// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * SR-IPv6 implementation
 *
ipv6: sr: Add seg6local action End.BPF

This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment; the BPF program
does not have to take care of this.

Since the BPF program must not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track of
whether the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.

Performance profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed invalid, the packet
is dropped.

This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.

The BPF program may return 3 types of return codes:
 - BPF_OK: the End.BPF action will look up the next destination through
   seg6_lookup_nexthop.
 - BPF_REDIRECT: if an action has been executed through the
   bpf_lwt_seg6_action helper, the BPF program should return this
   value, as the skb's destination is already set and the default
   lookup should not be performed.
 - BPF_DROP: the packet will be dropped.

Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
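
The return-code handling described in this commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `sketch_*` names and enum values are hypothetical stand-ins, not the kernel's definitions, and the re-validation result is passed in as a flag instead of being computed by seg6_validate_srh.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three return codes listed above. */
enum sketch_bpf_ret {
	SKETCH_BPF_OK,
	SKETCH_BPF_REDIRECT,
	SKETCH_BPF_DROP,
};

/* Hypothetical outcomes of the hook for a single packet. */
enum sketch_verdict {
	SKETCH_LOOKUP_NEXTHOP,	/* perform the default next-hop lookup */
	SKETCH_KEEP_DST,	/* destination was already set by an action */
	SKETCH_DROP,
};

/*
 * srh_altered: a helper modified the SRH while the program ran.
 * srh_valid:   what a re-validation would report; only consulted when
 *              srh_altered is true.
 */
static enum sketch_verdict sketch_end_bpf_verdict(enum sketch_bpf_ret ret,
						   bool srh_altered,
						   bool srh_valid)
{
	if (ret == SKETCH_BPF_DROP)
		return SKETCH_DROP;

	/* An altered SRH that fails re-validation drops the packet. */
	if (srh_altered && !srh_valid)
		return SKETCH_DROP;

	/* After a redirect the default lookup is skipped. */
	if (ret == SKETCH_BPF_REDIRECT)
		return SKETCH_KEEP_DST;

	return SKETCH_LOOKUP_NEXTHOP;
}
```

For instance, `sketch_end_bpf_verdict(SKETCH_BPF_OK, true, false)` yields `SKETCH_DROP`, because the program reported success but left an invalid SRH behind.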
 * Authors:
 * David Lebrun <david.lebrun@uclouvain.be>
 * eBPF support: Mathieu Xhonneux <m.xhonneux@gmail.com>
 */

#include <linux/filter.h>
#include <linux/types.h>
#include <linux/skbuff.h>
#include <linux/net.h>
#include <linux/module.h>
#include <net/ip.h>
#include <net/lwtunnel.h>
#include <net/netevent.h>
#include <net/netns/generic.h>
#include <net/ip6_fib.h>
#include <net/route.h>
#include <net/seg6.h>
#include <linux/seg6.h>
#include <linux/seg6_local.h>
#include <net/addrconf.h>
#include <net/ip6_route.h>
#include <net/dst_cache.h>
#include <net/ip_tunnels.h>
#ifdef CONFIG_IPV6_SEG6_HMAC
#include <net/seg6_hmac.h>
#endif
#include <net/seg6_local.h>
#include <linux/etherdevice.h>
#include <linux/bpf.h>
#include <linux/netfilter.h>

#define SEG6_F_ATTR(i)		BIT(i)

struct seg6_local_lwt;

/* callbacks used for customizing the creation and destruction of a behavior */
struct seg6_local_lwtunnel_ops {
	int (*build_state)(struct seg6_local_lwt *slwt, const void *cfg,
			   struct netlink_ext_ack *extack);
	void (*destroy_state)(struct seg6_local_lwt *slwt);
};

struct seg6_action_desc {
	int action;
	unsigned long attrs;

	/* The optattrs field is used for specifying all the optional
	 * attributes supported by a specific behavior.
	 * It means that if one of these attributes is not provided in the
	 * netlink message during the behavior creation, no errors will be
	 * returned to the userspace.
	 *
	 * Each attribute can be only of two types (mutually exclusive):
	 * 1) required or 2) optional.
	 * Every user MUST obey to this rule! If you set an attribute as
	 * required the same attribute CANNOT be set as optional and vice
	 * versa.
	 */
	unsigned long optattrs;

	int (*input)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
	int static_headroom;

	struct seg6_local_lwtunnel_ops slwt_ops;
};
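
The required/optional rule stated in the optattrs comment can be illustrated with two bitmasks. The following is a minimal userspace sketch, not the kernel's parsing code: the `sketch_*` names and the attribute indexes are hypothetical, and only the two properties mentioned in the comment are checked (the masks must not overlap, and only missing required attributes are an error).

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ATTR(i)	(1UL << (i))

/* Hypothetical attribute indexes, for illustration only. */
enum {
	SKETCH_A_SRH,
	SKETCH_A_TABLE,
	SKETCH_A_COUNTERS,
};

struct sketch_desc {
	unsigned long attrs;	/* required attributes */
	unsigned long optattrs;	/* optional attributes */
};

/* A descriptor is well formed when no attribute is both required and optional. */
static bool sketch_desc_is_consistent(const struct sketch_desc *d)
{
	return (d->attrs & d->optattrs) == 0;
}

/* Returns the mask of required attributes that the request did not provide. */
static unsigned long sketch_missing_required(const struct sketch_desc *d,
					     unsigned long provided)
{
	return d->attrs & ~provided;
}
```

A request that omits an optional attribute yields an empty "missing" mask, which mirrors the statement that no error is returned to userspace in that case.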

struct bpf_lwt_prog {
	struct bpf_prog *prog;
	char *name;
};

/* default length values (expressed in bits) for both Locator-Block and
 * Locator-Node Function.
 *
 * Both SEG6_LOCAL_LCBLOCK_DBITS and SEG6_LOCAL_LCNODE_FN_DBITS *must* be:
 *    i) greater than 0;
 *   ii) evenly divisible by 8. In other terms, the lengths of the
 *	 Locator-Block and Locator-Node Function must be byte-aligned (we can
 *	 relax this constraint in the future if really needed).
 *
 * Moreover, a third condition must hold:
 *  iii) SEG6_LOCAL_LCBLOCK_DBITS + SEG6_LOCAL_LCNODE_FN_DBITS <= 128.
 *
 * The correctness of the SEG6_LOCAL_LCBLOCK_DBITS and
 * SEG6_LOCAL_LCNODE_FN_DBITS values is checked during the kernel compilation.
 * If the compilation stops, check the value of these parameters to see if
 * they meet conditions (i), (ii) and (iii).
 */
#define SEG6_LOCAL_LCBLOCK_DBITS	32
#define SEG6_LOCAL_LCNODE_FN_DBITS	16

/* The following next_csid_chk_{cntr,lcblock,lcnode_fn}_bits macros can be
 * used directly to check whether the lengths (in bits) of Locator-Block and
 * Locator-Node Function are valid according to (i), (ii), (iii).
 * Each macro evaluates to a non-zero value when the corresponding check
 * fails.
 */
#define next_csid_chk_cntr_bits(blen, flen)		\
	((blen) + (flen) > 128)

#define next_csid_chk_lcblock_bits(blen)		\
({							\
	typeof(blen) __tmp = blen;			\
	(!__tmp || __tmp > 120 || (__tmp & 0x07));	\
})

#define next_csid_chk_lcnode_fn_bits(flen)		\
	next_csid_chk_lcblock_bits(flen)
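
Conditions (i), (ii) and (iii) can be tried out in userspace with a portable rewrite of the three checks. This is a sketch under assumptions: the `sketch_*` helpers are hypothetical and replace the GNU C statement-expression macros with plain functions, keeping the same tests (non-zero, at most 120, multiple of 8, sum at most 128).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors next_csid_chk_lcblock_bits(): true when the length is INVALID. */
static bool sketch_len_invalid(unsigned int bits)
{
	return !bits || bits > 120 || (bits & 0x07);
}

/* Mirrors next_csid_chk_cntr_bits(): true when the sum is INVALID. */
static bool sketch_sum_invalid(unsigned int blen, unsigned int flen)
{
	return blen + flen > 128;
}

/* Hypothetical helper: true when both lengths satisfy (i), (ii) and (iii). */
static bool sketch_csid_lengths_ok(unsigned int blen, unsigned int flen)
{
	return !sketch_len_invalid(blen) &&
	       !sketch_len_invalid(flen) &&
	       !sketch_sum_invalid(blen, flen);
}
```

With the default values from the source, `sketch_csid_lengths_ok(32, 16)` holds, while a 12-bit function length fails the byte-alignment condition (ii).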

/* flag indicating that flavors are set up for a given End* behavior */
#define SEG6_F_LOCAL_FLAVORS		SEG6_F_ATTR(SEG6_LOCAL_FLAVORS)

seg6: add PSP flavor support for SRv6 End behavior

The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
 - Penultimate Segment Pop (PSP);
 - Ultimate Segment Pop (USP);
 - Ultimate Segment Decapsulation (USD).

Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combination.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.

A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.

Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
   i) decreasing the Segments Left (SL) from 1 to 0;
  ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
      Address (DA);
 iii) removing (i.e., popping) the outer SRH from the extension headers
      following the IPv6 header.

It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.

                                                 SL=2 SL=1 SL=0
                                                   |    |    |
For example, given the SRv6 policy (SID List := <  X,   Y,   Z  >):
 - a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
   operation as Segments Left (SL) is 1, corresponding to the Penultimate
   Segment of the SID List;
 - a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
   PSP operation as the Segments Left is 2. This behavior instance will
   apply the "standard" End packet processing, ignoring the configured PSP
   flavor altogether.

[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
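
Steps (i) to (iii) and the < X, Y, Z > example from this commit message can be replayed on a toy packet model. This is a simplified userspace sketch, not the kernel's data path: `struct sketch_pkt` and `sketch_end_psp()` are hypothetical, addresses are short labels, and the SID array is assumed to be stored with the last segment at index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_SIDS	4

/* Simplified, hypothetical packet model (not the kernel's ipv6_sr_hdr). */
struct sketch_pkt {
	char da[8];			/* IPv6 DA, as a short label */
	bool has_srh;
	unsigned int segments_left;	/* SL */
	char sids[SKETCH_MAX_SIDS][8];	/* sids[0] is the last segment */
};

/* End processing with the PSP flavor; returns true when the SRH was popped. */
static bool sketch_end_psp(struct sketch_pkt *p)
{
	if (!p->has_srh || p->segments_left == 0 ||
	    p->segments_left > SKETCH_MAX_SIDS)
		return false;	/* nothing this sketch can process */

	/* Standard End steps: decrement SL, copy the active SID into the DA. */
	p->segments_left--;
	memcpy(p->da, p->sids[p->segments_left], sizeof(p->da));

	/* Non-penultimate node: SL did not go from 1 to 0, so no pop. */
	if (p->segments_left != 0)
		return false;

	/* Penultimate node: pop the SRH. */
	p->has_srh = false;
	return true;
}
```

Running the function twice on a packet that starts at X with SL=2 first moves the DA to Y without popping, then moves the DA to Z and removes the SRH.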
#define SEG6_F_LOCAL_FLV_OP(flvname)	BIT(SEG6_LOCAL_FLV_OP_##flvname)
#define SEG6_F_LOCAL_FLV_NEXT_CSID	SEG6_F_LOCAL_FLV_OP(NEXT_CSID)
#define SEG6_F_LOCAL_FLV_PSP		SEG6_F_LOCAL_FLV_OP(PSP)

/* Supported RFC8986 Flavor operations are reported in this bitmask */
#define SEG6_LOCAL_FLV8986_SUPP_OPS	SEG6_F_LOCAL_FLV_PSP

#define SEG6_LOCAL_END_FLV_SUPP_OPS	(SEG6_F_LOCAL_FLV_NEXT_CSID | \
					 SEG6_LOCAL_FLV8986_SUPP_OPS)
#define SEG6_LOCAL_END_X_FLV_SUPP_OPS	SEG6_F_LOCAL_FLV_NEXT_CSID

struct seg6_flavors_info {
	/* Flavor operations */
	__u32 flv_ops;

	/* Locator-Block length, expressed in bits */
	__u8 lcblock_bits;
	/* Locator-Node Function length, expressed in bits */
	__u8 lcnode_func_bits;
};

enum seg6_end_dt_mode {
	DT_INVALID_MODE	= -EINVAL,
	DT_LEGACY_MODE	= 0,
	DT_VRF_MODE	= 1,
};

struct seg6_end_dt_info {
	enum seg6_end_dt_mode mode;

	struct net *net;
	/* VRF device associated to the routing table used by the SRv6
	 * End.DT4/DT6 behavior for routing IPv4/IPv6 packets.
	 */
	int vrf_ifindex;
	int vrf_table;

seg6: add support for SRv6 End.DT46 Behavior
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In detail, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. Such very minimal performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
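
The "around 0.9% and 1.5%" figures quoted in this commit message follow directly from the reported averages for scenarios (1.a)/(2.a) and (1.b)/(2.b). The helper below is a small arithmetic sketch; `sketch_degradation_pct()` is a hypothetical name and the inputs are the kpps values listed above.

```c
#include <assert.h>

/* Relative throughput loss of `measured` with respect to `baseline`, in percent. */
static double sketch_degradation_pct(double baseline_kpps, double measured_kpps)
{
	return (baseline_kpps - measured_kpps) / baseline_kpps * 100.0;
}
```

For IPv6, `sketch_degradation_pct(690.70, 684.70)` is roughly 0.87; for IPv4, `sketch_degradation_pct(722.22, 711.69)` is roughly 1.46. Both agree with the rounded values in the text.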
	/* tunneled packet family (IPv4 or IPv6).
	 * Protocol and header length are inferred from family.
	 */
	u16 family;
};

seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (e.g.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify whether packets are actually processed by such behavior and what the
outcome of the processing is. Therefore, the counters for SRv6 Behaviors offer
a non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could have different causes, such as wrong route
settings on the node or an invalid SID List carried by the SRv6
packet. In any case, the error counter rules out two possibilities: that
the packet never arrived at the node, and that it was never processed by
the behavior.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev 2956.00 pps
As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
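
The three counters and the SL=0 troubleshooting example from this commit message can be mimicked with per-CPU slots that are summed on read. This is a userspace sketch under assumptions: the `sketch_*` names, the fixed CPU count and the plain `uint64_t` fields are hypothetical and stand in for the kernel's per-CPU statistics machinery.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS	4

struct sketch_counters {
	uint64_t packets;
	uint64_t bytes;
	uint64_t errors;
};

/* Hypothetical End check: the End behavior only accepts packets with SL >= 1. */
static int sketch_end_input(unsigned int segments_left)
{
	return segments_left == 0 ? -1 : 0;
}

/* Account for one packet in the slot of the CPU that processed it. */
static void sketch_count(struct sketch_counters *pcpu, unsigned int cpu,
			 unsigned int len, int err)
{
	if (cpu >= SKETCH_NR_CPUS)
		return;

	if (err) {
		pcpu[cpu].errors++;
		return;
	}

	pcpu[cpu].packets++;
	pcpu[cpu].bytes += len;
}

/* Fold all per-CPU slots into the totals that would be reported to the user. */
static struct sketch_counters sketch_fold(const struct sketch_counters *pcpu)
{
	struct sketch_counters total = { 0, 0, 0 };
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		total.packets += pcpu[cpu].packets;
		total.bytes += pcpu[cpu].bytes;
		total.errors += pcpu[cpu].errors;
	}

	return total;
}
```

A packet with SL=0 shows up only in `errors`, which is the signal the message describes: the packet reached the node and the behavior, but processing failed.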
struct pcpu_seg6_local_counters {
	u64_stats_t packets;
	u64_stats_t bytes;
	u64_stats_t errors;

	struct u64_stats_sync syncp;
};

/* This struct groups all the SRv6 Behavior counters supported so far.
 *
 * put_nla_counters() makes use of this data structure to collect all counter
 * values after the per-CPU counter evaluation has been performed.
 * Finally, each counter value (in seg6_local_counters) is stored in the
 * corresponding netlink attribute and sent to user space.
 *
 * NB: we don't want to expose this structure to user space!
 */
struct seg6_local_counters {
	__u64 packets;
	__u64 bytes;
	__u64 errors;
};

#define seg6_local_alloc_pcpu_counters(__gfp)				\
	__netdev_alloc_pcpu_stats(struct pcpu_seg6_local_counters,	\
				  ((__gfp) | __GFP_ZERO))

#define SEG6_F_LOCAL_COUNTERS	SEG6_F_ATTR(SEG6_LOCAL_COUNTERS)

struct seg6_local_lwt {
	int action;
	struct ipv6_sr_hdr *srh;
	int table;
	struct in_addr nh4;
	struct in6_addr nh6;
	int iif;
	int oif;
	struct bpf_lwt_prog bpf;
#ifdef CONFIG_NET_L3_MASTER_DEV
	struct seg6_end_dt_info dt_info;
#endif
	struct seg6_flavors_info flv_info;

seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
a non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could have different causes, such as wrong route
settings on the node or an invalid SID List carried by the SRv6
packet. In any case, the error counter rules out the possibility that the
packet did not arrive at the node or was not processed by the behavior at
all.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev. 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev. 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev. 2956.00 pps
As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
|
|
|
struct pcpu_seg6_local_counters __percpu *pcpu_counters;
|
2017-08-05 18:38:26 +08:00
|
|
|
|
|
|
|
int headroom;
|
|
|
|
struct seg6_action_desc *desc;
|
2020-12-02 21:05:12 +08:00
|
|
|
/* unlike the required attrs, we have to track the optional attributes
|
|
|
|
* that have been effectively parsed.
|
|
|
|
*/
|
|
|
|
unsigned long parsed_optattrs;
|
2017-08-05 18:38:26 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static struct seg6_local_lwt *seg6_local_lwtunnel(struct lwtunnel_state *lwt)
|
|
|
|
{
|
|
|
|
return (struct seg6_local_lwt *)lwt->data;
|
|
|
|
}
|
|
|
|
|
2017-08-05 18:39:48 +08:00
|
|
|
static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
|
2022-01-04 01:11:30 +08:00
|
|
|
srh = seg6_get_srh(skb, IP6_FH_F_SKIP_RH);
|
2017-08-05 18:39:48 +08:00
|
|
|
if (!srh)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
#ifdef CONFIG_IPV6_SEG6_HMAC
|
|
|
|
if (!seg6_hmac_validate_skb(skb))
|
|
|
|
return NULL;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
return srh;
|
|
|
|
}
|
|
|
|
|
2017-08-25 15:56:47 +08:00
|
|
|
static bool decap_and_validate(struct sk_buff *skb, int proto)
|
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
unsigned int off = 0;
|
|
|
|
|
2022-01-04 01:11:30 +08:00
|
|
|
srh = seg6_get_srh(skb, 0);
|
2017-08-25 15:56:47 +08:00
|
|
|
if (srh && srh->segments_left > 0)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
#ifdef CONFIG_IPV6_SEG6_HMAC
|
|
|
|
if (srh && !seg6_hmac_validate_skb(skb))
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
if (ipv6_find_hdr(skb, &off, proto, NULL, NULL) < 0)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (!pskb_pull(skb, off))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
skb_postpull_rcsum(skb, skb_network_header(skb), off);
|
|
|
|
|
|
|
|
skb_reset_network_header(skb);
|
|
|
|
skb_reset_transport_header(skb);
|
2020-01-20 12:48:37 +08:00
|
|
|
if (iptunnel_pull_offloads(skb))
|
|
|
|
return false;
|
2017-08-25 15:56:47 +08:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
|
|
|
|
{
|
|
|
|
struct in6_addr *addr;
|
|
|
|
|
|
|
|
srh->segments_left--;
|
|
|
|
addr = srh->segments + srh->segments_left;
|
|
|
|
*daddr = *addr;
|
|
|
|
}
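A minimal userspace sketch of the step performed by advance_nextseg(), assuming a simplified header layout. The types addr128 and mini_srh and the function mini_advance_nextseg are hypothetical; they keep only the fields the step needs. As in the real SRH, the segment list is stored in reverse order, so index 0 holds the last segment of the path.

```c
#include <stdint.h>

/* Hypothetical 128-bit address; stands in for struct in6_addr. */
struct addr128 {
	uint8_t b[16];
};

/* Hypothetical, simplified SRH: only the fields the sketch needs. */
struct mini_srh {
	uint8_t segments_left;
	struct addr128 segments[3];
};

/* Same steps as advance_nextseg(): decrement SL, then copy the newly
 * active segment into the destination address. The caller must ensure
 * that segments_left is greater than zero.
 */
static void mini_advance_nextseg(struct mini_srh *srh, struct addr128 *daddr)
{
	srh->segments_left--;
	*daddr = srh->segments[srh->segments_left];
}
```

With three segments and SL = 2, the first call selects segments[1] and the second call selects segments[0], after which no further advance is allowed.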
|
|
|
|
|
2019-11-23 00:22:42 +08:00
|
|
|
static int
|
|
|
|
seg6_lookup_any_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
|
|
|
|
u32 tbl_id, bool local_delivery)
|
2017-08-25 15:56:47 +08:00
|
|
|
{
|
|
|
|
struct net *net = dev_net(skb->dev);
|
|
|
|
struct ipv6hdr *hdr = ipv6_hdr(skb);
|
|
|
|
int flags = RT6_LOOKUP_F_HAS_SADDR;
|
|
|
|
struct dst_entry *dst = NULL;
|
|
|
|
struct rt6_info *rt;
|
|
|
|
struct flowi6 fl6;
|
2019-11-23 00:22:42 +08:00
|
|
|
int dev_flags = 0;
|
2017-08-25 15:56:47 +08:00
|
|
|
|
net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdev
Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif
reset for port devices") adds a new entry (flowi_l3mdev) in the common
flow struct used for indicating the l3mdev index for later rule and
table matching.
The l3mdev_update_flow() has been adapted to properly set the
flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid
flowi_iif is supplied to the l3mdev_update_flow(), this function can
update the flowi_l3mdev entry only if it has not yet been set (i.e., the
flowi_l3mdev entry is equal to 0).
The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to
force the routing lookup into the associated routing table. This routing
operation is performed by seg6_lookup_any_nexthop() preparing a flowi6
data structure used by ip6_route_input_lookup() which, in turn,
(indirectly) invokes l3mdev_update_flow().
However, seg6_lookup_any_nexthop() does not initialize the new
flowi_l3mdev entry which is filled with random garbage data. This
prevents l3mdev_update_flow() from properly updating the flowi_l3mdev
with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are
broken.
This patch correctly initializes the flowi6 instance allocated and used
by seg6_lookup_any_nexthop(). Specifically, the entire flowi6 instance
is wiped out: in case new entries are added to flowi/flowi6 (as happened
with the flowi_l3mdev entry), we should no longer have incorrectly
initialized values. As a result of this operation, the value of
flowi_l3mdev is also set to 0.
The proposed fix can be tested easily. Starting from the commit
referenced in the Fixes, selftests [1],[2] indicate that the SRv6
End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. With this patch
applied, those behaviors work properly again.
[1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
[2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Anton Makarov <am@3a-alliance.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 17:19:17 +08:00
|
|
|
memset(&fl6, 0, sizeof(fl6));
|
2017-08-25 15:56:47 +08:00
|
|
|
fl6.flowi6_iif = skb->dev->ifindex;
|
|
|
|
fl6.daddr = nhaddr ? *nhaddr : hdr->daddr;
|
|
|
|
fl6.saddr = hdr->saddr;
|
|
|
|
fl6.flowlabel = ip6_flowinfo(hdr);
|
|
|
|
fl6.flowi6_mark = skb->mark;
|
|
|
|
fl6.flowi6_proto = hdr->nexthdr;
|
|
|
|
|
|
|
|
if (nhaddr)
|
|
|
|
fl6.flowi6_flags = FLOWI_FLAG_KNOWN_NH;
|
|
|
|
|
|
|
|
if (!tbl_id) {
|
2018-03-03 00:32:17 +08:00
|
|
|
dst = ip6_route_input_lookup(net, skb->dev, &fl6, skb, flags);
|
2017-08-25 15:56:47 +08:00
|
|
|
} else {
|
|
|
|
struct fib6_table *table;
|
|
|
|
|
|
|
|
table = fib6_get_table(net, tbl_id);
|
|
|
|
if (!table)
|
|
|
|
goto out;
|
|
|
|
|
2018-03-03 00:32:17 +08:00
|
|
|
rt = ip6_pol_route(net, table, 0, &fl6, skb, flags);
|
2017-08-25 15:56:47 +08:00
|
|
|
dst = &rt->dst;
|
|
|
|
}
|
|
|
|
|
2019-11-23 00:22:42 +08:00
|
|
|
/* we want to discard traffic destined for local packet processing,
|
|
|
|
* if @local_delivery is set to false.
|
|
|
|
*/
|
|
|
|
if (!local_delivery)
|
|
|
|
dev_flags |= IFF_LOOPBACK;
|
|
|
|
|
|
|
|
if (dst && (dst->dev->flags & dev_flags) && !dst->error) {
|
2017-08-25 15:56:47 +08:00
|
|
|
dst_release(dst);
|
|
|
|
dst = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (!dst) {
|
|
|
|
rt = net->ipv6.ip6_blk_hole_entry;
|
|
|
|
dst = &rt->dst;
|
|
|
|
dst_hold(dst);
|
|
|
|
}
|
|
|
|
|
|
|
|
skb_dst_drop(skb);
|
|
|
|
skb_dst_set(skb, dst);
|
2018-05-20 21:58:13 +08:00
|
|
|
return dst->error;
|
2017-08-25 15:56:47 +08:00
|
|
|
}
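The reason for wiping the whole flow key before filling it, as seg6_lookup_any_nexthop() does with memset(), can be illustrated with a small userspace sketch. The struct mini_flow and the function mini_flow_init are hypothetical; the l3mdev member plays the role of a field added to the key after the initialization code was written.

```c
#include <string.h>

/* Hypothetical flow key; l3mdev plays the role of a field added later. */
struct mini_flow {
	int iif;
	int mark;
	int l3mdev;
};

/* Wipe the whole key first so that fields the caller does not set
 * explicitly (here: l3mdev) are guaranteed to read as 0.
 */
static void mini_flow_init(struct mini_flow *fl, int iif, int mark)
{
	memset(fl, 0, sizeof(*fl));
	fl->iif = iif;
	fl->mark = mark;
}
```

Without the memset(), a key allocated on the stack would carry whatever bytes happened to be there, and any consumer that treats 0 as "not set yet" would misbehave.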
|
|
|
|
|
2019-11-23 00:22:42 +08:00
|
|
|
int seg6_lookup_nexthop(struct sk_buff *skb,
|
|
|
|
struct in6_addr *nhaddr, u32 tbl_id)
|
|
|
|
{
|
|
|
|
return seg6_lookup_any_nexthop(skb, nhaddr, tbl_id, false);
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
static __u8 seg6_flv_lcblock_octects(const struct seg6_flavors_info *finfo)
|
|
|
|
{
|
|
|
|
return finfo->lcblock_bits >> 3;
|
|
|
|
}
|
|
|
|
|
|
|
|
static __u8 seg6_flv_lcnode_func_octects(const struct seg6_flavors_info *finfo)
|
|
|
|
{
|
|
|
|
return finfo->lcnode_func_bits >> 3;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool seg6_next_csid_is_arg_zero(const struct in6_addr *addr,
|
|
|
|
const struct seg6_flavors_info *finfo)
|
|
|
|
{
|
|
|
|
__u8 fnc_octects = seg6_flv_lcnode_func_octects(finfo);
|
|
|
|
__u8 blk_octects = seg6_flv_lcblock_octects(finfo);
|
|
|
|
__u8 arg_octects;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
arg_octects = 16 - blk_octects - fnc_octects;
|
|
|
|
for (i = 0; i < arg_octects; ++i) {
|
|
|
|
if (addr->s6_addr[blk_octects + fnc_octects + i] != 0x00)
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* assume that DA.Argument length > 0 */
|
|
|
|
static void seg6_next_csid_advance_arg(struct in6_addr *addr,
|
|
|
|
const struct seg6_flavors_info *finfo)
|
|
|
|
{
|
|
|
|
__u8 fnc_octects = seg6_flv_lcnode_func_octects(finfo);
|
|
|
|
__u8 blk_octects = seg6_flv_lcblock_octects(finfo);
|
|
|
|
|
|
|
|
/* advance DA.Argument */
|
|
|
|
memmove(&addr->s6_addr[blk_octects],
|
|
|
|
&addr->s6_addr[blk_octects + fnc_octects],
|
|
|
|
16 - blk_octects - fnc_octects);
|
|
|
|
|
|
|
|
memset(&addr->s6_addr[16 - fnc_octects], 0x00, fnc_octects);
|
|
|
|
}
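The two NEXT-C-SID helpers above can be exercised in userspace with the following sketch. It works on a raw 16-octet array instead of struct in6_addr and takes the Locator-Block and Locator-Node Function lengths directly in octets; the names csid_arg_is_zero and csid_advance_arg are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when every octet after the Locator-Block and the
 * Locator-Node Function is zero, i.e. the Argument is exhausted.
 */
static bool csid_arg_is_zero(const uint8_t addr[16], uint8_t blk, uint8_t fnc)
{
	int i;

	for (i = blk + fnc; i < 16; i++) {
		if (addr[i] != 0x00)
			return false;
	}

	return true;
}

/* Shift the Argument left over the current Function and zero the tail. */
static void csid_advance_arg(uint8_t addr[16], uint8_t blk, uint8_t fnc)
{
	/* the Argument must be at least one octet long */
	assert(blk + fnc < 16);

	memmove(&addr[blk], &addr[blk + fnc], 16 - blk - fnc);
	memset(&addr[16 - fnc], 0x00, fnc);
}
```

For instance, with a 4-octet block and 2-octet functions, an address whose octets after the block are 01 00 | 02 00 | 03 00 becomes 02 00 | 03 00 | 00 00 after one advance. Once only zeros remain after the function, csid_arg_is_zero() returns true and processing falls back to the SRH.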
|
|
|
|
|
2023-02-15 21:46:57 +08:00
|
|
|
static int input_action_end_finish(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
seg6_lookup_nexthop(skb, NULL, 0);
|
|
|
|
|
|
|
|
return dst_input(skb);
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
static int input_action_end_core(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
2017-08-05 18:39:48 +08:00
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
|
|
|
|
srh = get_and_validate_srh(skb);
|
|
|
|
if (!srh)
|
|
|
|
goto drop;
|
|
|
|
|
2017-08-25 15:56:47 +08:00
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
2017-08-05 18:39:48 +08:00
|
|
|
|
2023-02-15 21:46:57 +08:00
|
|
|
return input_action_end_finish(skb, slwt);
|
2017-08-05 18:39:48 +08:00
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
static int end_next_csid_core(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
const struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
struct in6_addr *daddr = &ipv6_hdr(skb)->daddr;
|
|
|
|
|
|
|
|
if (seg6_next_csid_is_arg_zero(daddr, finfo))
|
|
|
|
return input_action_end_core(skb, slwt);
|
|
|
|
|
|
|
|
/* update DA */
|
|
|
|
seg6_next_csid_advance_arg(daddr, finfo);
|
|
|
|
|
2023-02-15 21:46:57 +08:00
|
|
|
return input_action_end_finish(skb, slwt);
|
2022-09-13 01:16:18 +08:00
|
|
|
}
|
|
|
|
|
2023-08-13 02:09:25 +08:00
|
|
|
static int input_action_end_x_finish(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
seg6_lookup_nexthop(skb, &slwt->nh6, 0);
|
|
|
|
|
|
|
|
return dst_input(skb);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int input_action_end_x_core(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
|
|
|
|
srh = get_and_validate_srh(skb);
|
|
|
|
if (!srh)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
|
|
|
|
|
|
|
return input_action_end_x_finish(skb, slwt);
|
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int end_x_next_csid_core(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
const struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
struct in6_addr *daddr = &ipv6_hdr(skb)->daddr;
|
|
|
|
|
|
|
|
if (seg6_next_csid_is_arg_zero(daddr, finfo))
|
|
|
|
return input_action_end_x_core(skb, slwt);
|
|
|
|
|
|
|
|
/* update DA */
|
|
|
|
seg6_next_csid_advance_arg(daddr, finfo);
|
|
|
|
|
|
|
|
return input_action_end_x_finish(skb, slwt);
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
static bool seg6_next_csid_enabled(__u32 fops)
|
|
|
|
{
|
2023-08-13 02:09:25 +08:00
|
|
|
return fops & SEG6_F_LOCAL_FLV_NEXT_CSID;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Processing of SRv6 End, End.X, and End.T behaviors can be extended through
|
|
|
|
* the flavors framework. These behaviors must report the subset of (flavor)
|
|
|
|
* operations they currently implement. In this way, if a user specifies a
|
|
|
|
* flavor combination that is not supported by a given End* behavior, the
|
|
|
|
* kernel refuses to instantiate the tunnel reporting the error.
|
|
|
|
*/
|
|
|
|
static int seg6_flv_supp_ops_by_action(int action, __u32 *fops)
|
|
|
|
{
|
|
|
|
switch (action) {
|
|
|
|
case SEG6_LOCAL_ACTION_END:
|
|
|
|
*fops = SEG6_LOCAL_END_FLV_SUPP_OPS;
|
|
|
|
break;
|
|
|
|
case SEG6_LOCAL_ACTION_END_X:
|
|
|
|
*fops = SEG6_LOCAL_END_X_FLV_SUPP_OPS;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2022-09-13 01:16:18 +08:00
|
|
|
}
|
|
|
|
|
seg6: add PSP flavor support for SRv6 End behavior
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.
A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.
Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
i) decreasing the Segment Left (SL) from 1 to 0;
ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
following the IPv6 header.
It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.
For example, given the SRv6 policy (SID List := < X, Y, Z >), where the
packet reaches X, Y and Z with SL=2, SL=1 and SL=0, respectively:
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
operation as Segment Left (SL) is 1, corresponding to the Penultimate
Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
PSP operation as the Segment Left is 2. This behavior instance will
apply the "standard" End packet processing, ignoring the configured PSP
flavor altogether.
[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-15 21:46:58 +08:00
|
|
|
/* We describe the packet state in relation to the absence/presence of the SRH
|
|
|
|
* and the Segment Left (SL) field.
|
|
|
|
* For our purposes, it is not necessary to record the exact value of the SL
|
|
|
|
* when the SID List consists of two or more segments.
|
|
|
|
*/
|
|
|
|
enum seg6_local_pktinfo {
|
|
|
|
/* the order really matters! */
|
|
|
|
SEG6_LOCAL_PKTINFO_NOHDR = 0,
|
|
|
|
SEG6_LOCAL_PKTINFO_SL_ZERO,
|
|
|
|
SEG6_LOCAL_PKTINFO_SL_ONE,
|
|
|
|
SEG6_LOCAL_PKTINFO_SL_MORE,
|
|
|
|
__SEG6_LOCAL_PKTINFO_MAX,
|
|
|
|
};
|
|
|
|
|
|
|
|
#define SEG6_LOCAL_PKTINFO_MAX (__SEG6_LOCAL_PKTINFO_MAX - 1)
|
|
|
|
|
|
|
|
static enum seg6_local_pktinfo seg6_get_srh_pktinfo(struct ipv6_sr_hdr *srh)
|
|
|
|
{
|
|
|
|
__u8 sgl;
|
|
|
|
|
|
|
|
if (!srh)
|
|
|
|
return SEG6_LOCAL_PKTINFO_NOHDR;
|
|
|
|
|
|
|
|
sgl = srh->segments_left;
|
|
|
|
if (sgl < 2)
|
|
|
|
return SEG6_LOCAL_PKTINFO_SL_ZERO + sgl;
|
|
|
|
|
|
|
|
return SEG6_LOCAL_PKTINFO_SL_MORE;
|
|
|
|
}
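The classification above relies on SL_ZERO and SL_ONE being consecutive enum values, which is why the order of the enumerators matters. A minimal userspace sketch, with hypothetical names (mini_pktinfo, mini_get_pktinfo) and a has_srh flag standing in for the NULL check on the SRH pointer:

```c
#include <stdint.h>

enum mini_pktinfo {
	MINI_PKTINFO_NOHDR = 0,
	MINI_PKTINFO_SL_ZERO,
	MINI_PKTINFO_SL_ONE,
	MINI_PKTINFO_SL_MORE,
};

/* has_srh models the NULL check on the SRH pointer. */
static enum mini_pktinfo mini_get_pktinfo(int has_srh, uint8_t segments_left)
{
	if (!has_srh)
		return MINI_PKTINFO_NOHDR;

	/* SL_ZERO and SL_ONE are consecutive, so SL can be added directly. */
	if (segments_left < 2)
		return (enum mini_pktinfo)(MINI_PKTINFO_SL_ZERO + segments_left);

	return MINI_PKTINFO_SL_MORE;
}
```

Any SL value of 2 or more collapses into the same class, since the flavor logic does not need to distinguish between them.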
|
|
|
|
|
|
|
|
enum seg6_local_flv_action {
|
|
|
|
SEG6_LOCAL_FLV_ACT_UNSPEC = 0,
|
|
|
|
SEG6_LOCAL_FLV_ACT_END,
|
|
|
|
SEG6_LOCAL_FLV_ACT_PSP,
|
|
|
|
SEG6_LOCAL_FLV_ACT_USP,
|
|
|
|
SEG6_LOCAL_FLV_ACT_USD,
|
|
|
|
__SEG6_LOCAL_FLV_ACT_MAX
|
|
|
|
};
|
|
|
|
|
|
|
|
#define SEG6_LOCAL_FLV_ACT_MAX (__SEG6_LOCAL_FLV_ACT_MAX - 1)
|
|
|
|
|
|
|
|
/* The action table for RFC8986 flavors (see the flv8986_act_tbl below)
|
|
|
|
* contains the actions (i.e. processing operations) to be applied on packets
|
|
|
|
* when flavors are configured for an End* behavior.
|
|
|
|
* By combining the pktinfo data with the flavors mask, the macro
|
|
|
|
* computes the index used to access the elements (actions) stored in the
|
|
|
|
* action table. The index is structured as follows:
|
|
|
|
*
|
|
|
|
* index
|
|
|
|
* _______________/\________________
|
|
|
|
* / \
|
|
|
|
* +----------------+----------------+
|
|
|
|
* | pf | afm |
|
|
|
|
* +----------------+----------------+
|
|
|
|
* ph-1 ... p1 p0 fk-1 ... f1 f0
|
|
|
|
* MSB LSB
|
|
|
|
*
|
|
|
|
* where:
|
|
|
|
* - 'afm' (adjusted flavor mask) is the mask containing a combination of the
|
|
|
|
* RFC8986 flavors currently supported. 'afm' corresponds to the @fm
|
|
|
|
* argument of the macro whose value is right-shifted by 1 bit. By doing so,
|
|
|
|
* we discard the SEG6_LOCAL_FLV_OP_UNSPEC flag (bit 0 in @fm) which is
|
|
|
|
* never used here;
|
|
|
|
* - 'pf' encodes the packet info (pktinfo) regarding the presence/absence of
|
|
|
|
* the SRH, SL = 0, etc. 'pf' is set with the value of @pf provided as
|
|
|
|
* argument to the macro.
|
|
|
|
*/
|
|
|
|
#define flv8986_act_tbl_idx(pf, fm) \
|
|
|
|
((((pf) << bits_per(SEG6_LOCAL_FLV8986_SUPP_OPS)) | \
|
|
|
|
((fm) & SEG6_LOCAL_FLV8986_SUPP_OPS)) >> SEG6_LOCAL_FLV_OP_PSP)
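A worked example of the index computation may help. The sketch below is a userspace approximation that assumes PSP is the only supported RFC8986 flavor and that its flag occupies bit 1; these values are assumptions made for illustration and are not taken from the listing above. The helper mini_bits_per is a hypothetical stand-in for the kernel's bits_per(), and all MINI_* names are hypothetical.

```c
/* Assumed values for illustration: PSP is the only supported flavor and
 * its flag occupies bit 1 (bit 0 being the unused UNSPEC flag).
 */
#define MINI_FLV_OP_PSP		1
#define MINI_F_FLV_PSP		(1u << MINI_FLV_OP_PSP)
#define MINI_FLV_SUPP_OPS	MINI_F_FLV_PSP

/* Number of bits needed to represent n; hypothetical stand-in for the
 * kernel's bits_per() helper.
 */
static unsigned int mini_bits_per(unsigned int n)
{
	unsigned int bits = 1;

	while (n >>= 1)
		bits++;

	return bits;
}

/* pktinfo goes in the high bits, the flavor mask in the low bits; the
 * final shift drops the unused bit 0.
 */
static unsigned int mini_act_tbl_idx(unsigned int pf, unsigned int fm)
{
	unsigned int shift = mini_bits_per(MINI_FLV_SUPP_OPS);

	return ((pf << shift) | (fm & MINI_FLV_SUPP_OPS)) >> MINI_FLV_OP_PSP;
}
```

Under these assumptions, a packet with SL = 1 (pktinfo value 2) and the PSP flag set maps to index ((2 << 2) | 2) >> 1 = 5, while SL > 1 (pktinfo value 3) with PSP maps to ((3 << 2) | 2) >> 1 = 7.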
|
|
|
|
|
|
|
|
/* We compute the size of the action table by considering the RFC8986 flavors
|
|
|
|
* actually supported by the kernel. In this way, the size is automatically
|
|
|
|
* adjusted when new flavors are supported.
|
|
|
|
*/
|
|
|
|
#define FLV8986_ACT_TBL_SIZE \
|
|
|
|
roundup_pow_of_two(flv8986_act_tbl_idx(SEG6_LOCAL_PKTINFO_MAX, \
|
|
|
|
SEG6_LOCAL_FLV8986_SUPP_OPS))
|
|
|
|
|
|
|
|
/* tbl_cfg(act, pf, fm) macro is used to easily configure the action
|
|
|
|
* table; it accepts 3 arguments:
|
|
|
|
* i) @act, the suffix from SEG6_LOCAL_FLV_ACT_{act} representing
|
|
|
|
* the action that should be applied on the packet;
|
|
|
|
* ii) @pf, the suffix from SEG6_LOCAL_PKTINFO_{pf} reporting the packet
|
|
|
|
* info about the lack/presence of SRH, SRH with SL = 0, etc;
|
|
|
|
* iii) @fm, the mask of flavors.
|
|
|
|
*/
|
|
|
|
#define tbl_cfg(act, pf, fm) \
|
|
|
|
[flv8986_act_tbl_idx(SEG6_LOCAL_PKTINFO_##pf, \
|
|
|
|
(fm))] = SEG6_LOCAL_FLV_ACT_##act
|
|
|
|
|
|
|
|
/* shorthand for improving readability */
|
|
|
|
#define F_PSP SEG6_F_LOCAL_FLV_PSP
|
|
|
|
|
|
|
|
/* The table contains, for each combination of the pktinfo data and
|
|
|
|
* flavors, the action that should be taken on a packet (e.g.
|
|
|
|
* "standard" Endpoint processing, Penultimate Segment Pop, etc).
|
|
|
|
*
|
|
|
|
* By default, table entries not explicitly configured are initialized with the
|
|
|
|
* SEG6_LOCAL_FLV_ACT_UNSPEC action, which generally has the effect of
|
|
|
|
* discarding the processed packet.
|
|
|
|
*/
|
|
|
|
static const u8 flv8986_act_tbl[FLV8986_ACT_TBL_SIZE] = {
|
|
|
|
/* PSP variant for packets carrying an SRH with SL = 1 */
|
|
|
|
tbl_cfg(PSP, SL_ONE, F_PSP),
|
|
|
|
/* End for packets carrying an SRH with SL > 1 */
|
|
|
|
tbl_cfg(END, SL_MORE, F_PSP),
|
|
|
|
};
|
|
|
|
|
|
|
|
#undef F_PSP
|
|
|
|
#undef tbl_cfg
|
|
|
|
|
|
|
|
/* For each flavor defined in RFC8986 (or a combination of them) an action is
|
|
|
|
* performed on the packet. The specific action depends on:
|
|
|
|
* - info extracted from the packet (i.e. pktinfo data) regarding the
|
|
|
|
* lack/presence of the SRH, and if the SRH is available, on the value of
|
|
|
|
* Segment Left field;
|
|
|
|
* - the mask of flavors configured for the specific SRv6 End* behavior.
|
|
|
|
*
|
|
|
|
* The function combines both the pktinfo and the flavors mask to evaluate the
|
|
|
|
* corresponding action to be taken on the packet.
|
|
|
|
*/
|
|
|
|
static enum seg6_local_flv_action
|
|
|
|
seg6_local_flv8986_act_lookup(enum seg6_local_pktinfo pinfo, __u32 flvmask)
|
|
|
|
{
|
|
|
|
unsigned long index;
|
|
|
|
|
|
|
|
/* check if the provided mask of flavors is supported */
|
|
|
|
if (unlikely(flvmask & ~SEG6_LOCAL_FLV8986_SUPP_OPS))
|
|
|
|
return SEG6_LOCAL_FLV_ACT_UNSPEC;
|
|
|
|
|
|
|
|
index = flv8986_act_tbl_idx(pinfo, flvmask);
|
|
|
|
if (unlikely(index >= FLV8986_ACT_TBL_SIZE))
|
|
|
|
return SEG6_LOCAL_FLV_ACT_UNSPEC;
|
|
|
|
|
|
|
|
return flv8986_act_tbl[index];
|
|
|
|
}
|
|
|
|
|
|
|
|
/* skb->data must be aligned with skb->network_header */
|
|
|
|
static bool seg6_pop_srh(struct sk_buff *skb, int srhoff)
|
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
struct ipv6hdr *iph;
|
|
|
|
__u8 srh_nexthdr;
|
|
|
|
int thoff = -1;
|
|
|
|
int srhlen;
|
|
|
|
int nhlen;
|
|
|
|
|
|
|
|
if (unlikely(srhoff < sizeof(*iph) ||
|
|
|
|
!pskb_may_pull(skb, srhoff + sizeof(*srh))))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
|
|
|
|
srhlen = ipv6_optlen(srh);
|
|
|
|
|
|
|
|
/* we are about to mangle the pkt, let's check if we can write on it */
|
|
|
|
if (unlikely(skb_ensure_writable(skb, srhoff + srhlen)))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* skb_ensure_writable() may change skb pointers; evaluate srh again */
|
|
|
|
srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
|
|
|
|
srh_nexthdr = srh->nexthdr;
|
|
|
|
|
|
|
|
if (unlikely(!skb_transport_header_was_set(skb)))
|
|
|
|
goto pull;
|
|
|
|
|
|
|
|
nhlen = skb_network_header_len(skb);
|
|
|
|
/* we have to deal with the transport header: it could be set before
|
|
|
|
* the SRH, after the SRH, or within it (which is considered wrong,
|
|
|
|
* however).
|
|
|
|
*/
|
|
|
|
if (likely(nhlen <= srhoff))
|
|
|
|
thoff = nhlen;
|
|
|
|
else if (nhlen >= srhoff + srhlen)
|
|
|
|
/* transport_header is set after the SRH */
|
|
|
|
thoff = nhlen - srhlen;
|
|
|
|
else
|
|
|
|
/* transport_header falls inside the SRH; hence, we can't
|
|
|
|
* restore the transport_header pointer properly after
|
|
|
|
* SRH removing operation.
|
|
|
|
*/
|
|
|
|
return false;
|
|
|
|
pull:
|
|
|
|
/* we need to pop the SRH:
|
|
|
|
* 1) first of all, we pull out everything from IPv6 header up to SRH
|
|
|
|
* (included) evaluating also the rcsum;
|
|
|
|
* 2) we overwrite (and then remove) the SRH by properly moving the
|
|
|
|
* IPv6 along with any extension header that precedes the SRH;
|
|
|
|
* 3) At the end, we push back the pulled headers (except for SRH,
|
|
|
|
* obviously).
|
|
|
|
*/
|
|
|
|
skb_pull_rcsum(skb, srhoff + srhlen);
|
|
|
|
memmove(skb_network_header(skb) + srhlen, skb_network_header(skb),
|
|
|
|
srhoff);
|
|
|
|
skb_push(skb, srhoff);
|
|
|
|
|
|
|
|
skb_reset_network_header(skb);
|
|
|
|
skb_mac_header_rebuild(skb);
|
|
|
|
if (likely(thoff >= 0))
|
|
|
|
skb_set_transport_header(skb, thoff);
|
|
|
|
|
|
|
|
iph = ipv6_hdr(skb);
|
|
|
|
if (iph->nexthdr == NEXTHDR_ROUTING) {
|
|
|
|
iph->nexthdr = srh_nexthdr;
|
|
|
|
} else {
|
|
|
|
/* we must look for the extension header (EXTH, for short) that
|
|
|
|
* immediately precedes the SRH we have just removed.
|
|
|
|
* Then, we update the value of the EXTH nexthdr with the one
|
|
|
|
* contained in the SRH nexthdr.
|
|
|
|
*/
|
|
|
|
unsigned int off = sizeof(*iph);
|
|
|
|
struct ipv6_opt_hdr *hp, _hdr;
|
|
|
|
__u8 nexthdr = iph->nexthdr;
|
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
if (unlikely(!ipv6_ext_hdr(nexthdr) ||
|
|
|
|
nexthdr == NEXTHDR_NONE))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
hp = skb_header_pointer(skb, off, sizeof(_hdr), &_hdr);
|
|
|
|
if (unlikely(!hp))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (hp->nexthdr == NEXTHDR_ROUTING) {
|
|
|
|
hp->nexthdr = srh_nexthdr;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (nexthdr) {
|
|
|
|
case NEXTHDR_FRAGMENT:
|
|
|
|
fallthrough;
|
|
|
|
case NEXTHDR_AUTH:
|
|
|
|
/* we expect SRH before FRAG and AUTH */
|
|
|
|
return false;
|
|
|
|
default:
|
|
|
|
off += ipv6_optlen(hp);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
nexthdr = hp->nexthdr;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
|
|
|
|
|
|
|
|
skb_postpush_rcsum(skb, iph, srhoff);
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
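The core of the pop operation (slide the preceding headers over the SRH, then fix the next-header chain and the payload length) can be sketched on a flat byte buffer. This is a heavily simplified userspace illustration, not the kernel routine: it assumes the SRH immediately follows the fixed IPv6 header, it ignores checksums and skb bookkeeping, and the names mini_pop_srh and MINI_* are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MINI_IPV6_HLEN		40
#define MINI_NEXTHDR_ROUTING	43

/* Remove the routing header located right after the fixed IPv6 header.
 *
 * pkt points to the first byte of the IPv6 header and len is the total
 * packet length. Returns the offset of the new packet start within pkt,
 * or -1 if the packet does not have the assumed layout.
 */
static long mini_pop_srh(uint8_t *pkt, size_t len)
{
	size_t srhoff = MINI_IPV6_HLEN;
	size_t srhlen, newlen;
	uint8_t srh_nexthdr;

	if (len < srhoff + 8 || pkt[6] != MINI_NEXTHDR_ROUTING)
		return -1;

	/* Hdr Ext Len counts 8-octet units beyond the first 8 octets. */
	srhlen = ((size_t)pkt[srhoff + 1] + 1) << 3;
	if (len < srhoff + srhlen)
		return -1;

	srh_nexthdr = pkt[srhoff];

	/* Slide the IPv6 header forward so that it overwrites the SRH. */
	memmove(pkt + srhlen, pkt, srhoff);
	pkt += srhlen;
	newlen = len - srhlen;

	/* The IPv6 header now points to whatever followed the SRH. */
	pkt[6] = srh_nexthdr;

	/* Payload length (big endian) excludes the fixed IPv6 header. */
	pkt[4] = (uint8_t)((newlen - MINI_IPV6_HLEN) >> 8);
	pkt[5] = (uint8_t)((newlen - MINI_IPV6_HLEN) & 0xff);

	return (long)srhlen;
}
```

Moving the (short) headers that precede the SRH, rather than the payload that follows it, keeps the amount of copied data small; the packet simply starts srhlen octets later in the buffer.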
|
|
|
|
|
|
|
|
/* process the packet on the basis of the RFC8986 flavors set for the given
|
|
|
|
* SRv6 End behavior instance.
|
|
|
|
*/
|
|
|
|
static int end_flv8986_core(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
const struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
enum seg6_local_flv_action action;
|
|
|
|
enum seg6_local_pktinfo pinfo;
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
__u32 flvmask;
|
|
|
|
int srhoff;
|
|
|
|
|
|
|
|
srh = seg6_get_srh(skb, 0);
|
|
|
|
srhoff = srh ? ((unsigned char *)srh - skb->data) : 0;
|
|
|
|
pinfo = seg6_get_srh_pktinfo(srh);
|
|
|
|
#ifdef CONFIG_IPV6_SEG6_HMAC
|
|
|
|
if (srh && !seg6_hmac_validate_skb(skb))
|
|
|
|
goto drop;
|
|
|
|
#endif
|
|
|
|
flvmask = finfo->flv_ops;
|
|
|
|
if (unlikely(flvmask & ~SEG6_LOCAL_FLV8986_SUPP_OPS)) {
|
|
|
|
pr_warn_once("seg6local: invalid RFC8986 flavors\n");
|
|
|
|
goto drop;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* retrieve the action triggered by the combination of pktinfo data and
|
|
|
|
* the flavors mask.
|
|
|
|
*/
|
|
|
|
action = seg6_local_flv8986_act_lookup(pinfo, flvmask);
|
|
|
|
switch (action) {
|
|
|
|
case SEG6_LOCAL_FLV_ACT_END:
|
|
|
|
/* process the packet as the "standard" End behavior */
|
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
|
|
|
break;
|
|
|
|
case SEG6_LOCAL_FLV_ACT_PSP:
|
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
|
|
|
|
|
|
|
if (unlikely(!seg6_pop_srh(skb, srhoff)))
|
|
|
|
goto drop;
|
|
|
|
break;
|
|
|
|
case SEG6_LOCAL_FLV_ACT_UNSPEC:
|
|
|
|
fallthrough;
|
|
|
|
default:
|
|
|
|
/* by default, we drop the packet since we could not find a
|
|
|
|
* suitable action.
|
|
|
|
*/
|
|
|
|
goto drop;
|
|
|
|
}
|
|
|
|
|
|
|
|
return input_action_end_finish(skb, slwt);
|
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
/* regular endpoint function */
|
|
|
|
static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
const struct seg6_flavors_info *finfo = &slwt->flv_info;
|
seg6: add PSP flavor support for SRv6 End behavior
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. The flavors are:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.
A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.
Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
i) decreasing the Segment Left (SL) from 1 to 0;
ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
following the IPv6 header.
Note that the PSP operation (steps i, ii, iii) takes place only at a
penultimate SR Segment Endpoint node (i.e., when SL=1) and does not happen
at non-penultimate Endpoint nodes. Indeed, when a SID of PSP flavor is
processed at a non-penultimate SR Segment Endpoint node, the PSP operation
is not performed because it would not be possible to decrease the SL from
1 to 0.
                                                 SL=2 SL=1 SL=0
                                                   |    |    |
For example, given the SRv6 policy (SID List := <  X,   Y,   Z  >):
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
operation as Segment Left (SL) is 1, corresponding to the Penultimate
Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
PSP operation as the Segment Left is 2. This behavior instance will
apply the "standard" End packet processing, ignoring the configured PSP
flavor altogether.
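The X/Y example above can be sketched in plain C. This is a minimal
userspace illustration, not the kernel code: struct toy_pkt,
toy_advance_nextseg() and toy_end_psp() are hypothetical names, and SIDs are
modeled as short strings rather than IPv6 addresses. As in an SRH, the SID
list is stored in reverse order, so index 0 holds the last segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified packet model; NOT the kernel's sk_buff/ipv6_sr_hdr. */
#define TOY_MAX_SIDS 8

struct toy_pkt {
	char da[16];                    /* stands in for the IPv6 Destination Address */
	bool has_srh;                   /* true while an SRH is attached */
	unsigned int segments_left;     /* SL field of the SRH */
	char sids[TOY_MAX_SIDS][16];    /* SID list; sids[0] is the last segment */
};

/* Standard End step: decrement SL and copy the new active SID into the DA. */
static bool toy_advance_nextseg(struct toy_pkt *p)
{
	if (!p->has_srh || p->segments_left == 0 ||
	    p->segments_left > TOY_MAX_SIDS)
		return false;

	p->segments_left--;
	memcpy(p->da, p->sids[p->segments_left], sizeof(p->da));
	return true;
}

/* End with the PSP flavor: the SRH is popped only when SL goes from 1 to 0. */
static bool toy_end_psp(struct toy_pkt *p)
{
	/* Record the penultimate condition before SL is modified. */
	bool penultimate = p->has_srh && p->segments_left == 1;

	/* Steps i) and ii): decrease SL, copy the next SID into the DA. */
	if (!toy_advance_nextseg(p))
		return false;

	/* Step iii): pop the SRH, at the penultimate segment only. */
	if (penultimate)
		p->has_srh = false;

	return true;
}
```

With SID List < X, Y, Z >, a packet reaching X with SL=2 leaves with SL=1,
DA=Y and the SRH intact; calling toy_end_psp() again at Y sets DA=Z and
drops the SRH.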
[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-15 21:46:58 +08:00
|
|
|
__u32 fops = finfo->flv_ops;
|
2022-09-13 01:16:18 +08:00
|
|
|
|
2023-02-15 21:46:58 +08:00
|
|
|
if (!fops)
|
|
|
|
return input_action_end_core(skb, slwt);
|
|
|
|
|
|
|
|
/* check for the presence of NEXT-C-SID since it applies first */
|
|
|
|
if (seg6_next_csid_enabled(fops))
|
2022-09-13 01:16:18 +08:00
|
|
|
return end_next_csid_core(skb, slwt);
|
|
|
|
|
2023-02-15 21:46:58 +08:00
|
|
|
/* the specific processing function to be performed on the packet
|
|
|
|
* depends on the combination of flavors defined in RFC8986 and some
|
|
|
|
* information extracted from the packet, e.g. presence/absence of SRH,
|
|
|
|
* Segment Left = 0, etc.
|
|
|
|
*/
|
|
|
|
return end_flv8986_core(skb, slwt);
|
2022-09-13 01:16:18 +08:00
|
|
|
}
|
|
|
|
|
2017-08-05 18:39:48 +08:00
|
|
|
/* regular endpoint, and forward to specified nexthop */
|
|
|
|
static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
2023-08-13 02:09:25 +08:00
|
|
|
const struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
__u32 fops = finfo->flv_ops;
|
2017-08-05 18:39:48 +08:00
|
|
|
|
2023-08-13 02:09:25 +08:00
|
|
|
/* check for the presence of NEXT-C-SID since it applies first */
|
|
|
|
if (seg6_next_csid_enabled(fops))
|
|
|
|
return end_x_next_csid_core(skb, slwt);
|
2017-08-05 18:39:48 +08:00
|
|
|
|
2023-08-13 02:09:25 +08:00
|
|
|
return input_action_end_x_core(skb, slwt);
|
2017-08-05 18:39:48 +08:00
|
|
|
}
|
|
|
|
|
2017-08-25 15:58:17 +08:00
|
|
|
static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
|
|
|
|
srh = get_and_validate_srh(skb);
|
|
|
|
if (!srh)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
|
|
|
|
2018-05-20 21:58:13 +08:00
|
|
|
seg6_lookup_nexthop(skb, NULL, slwt->table);
|
2017-08-25 15:58:17 +08:00
|
|
|
|
|
|
|
return dst_input(skb);
|
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* decapsulate and forward inner L2 frame on specified interface */
|
|
|
|
static int input_action_end_dx2(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct net *net = dev_net(skb->dev);
|
|
|
|
struct net_device *odev;
|
|
|
|
struct ethhdr *eth;
|
|
|
|
|
2020-03-12 00:54:06 +08:00
|
|
|
if (!decap_and_validate(skb, IPPROTO_ETHERNET))
|
2017-08-25 15:58:17 +08:00
|
|
|
goto drop;
|
|
|
|
|
|
|
|
if (!pskb_may_pull(skb, ETH_HLEN))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
skb_reset_mac_header(skb);
|
|
|
|
eth = (struct ethhdr *)skb->data;
|
|
|
|
|
|
|
|
/* To determine the frame's protocol, we assume it is 802.3. This avoids
|
|
|
|
* a call to eth_type_trans(), which is not really relevant for our
|
|
|
|
* use case.
|
|
|
|
*/
|
|
|
|
if (!eth_proto_is_802_3(eth->h_proto))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
odev = dev_get_by_index_rcu(net, slwt->oif);
|
|
|
|
if (!odev)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
/* As we accept Ethernet frames, make sure the egress device is of
|
|
|
|
* the correct type.
|
|
|
|
*/
|
|
|
|
if (odev->type != ARPHRD_ETHER)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
if (!(odev->flags & IFF_UP) || !netif_carrier_ok(odev))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
skb_orphan(skb);
|
|
|
|
|
|
|
|
if (skb_warn_if_lro(skb))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
skb_forward_csum(skb);
|
|
|
|
|
|
|
|
if (skb->len - ETH_HLEN > odev->mtu)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
skb->dev = odev;
|
|
|
|
skb->protocol = eth->h_proto;
|
|
|
|
|
|
|
|
return dev_queue_xmit(skb);
|
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
static int input_action_end_dx6_finish(struct net *net, struct sock *sk,
|
|
|
|
struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct dst_entry *orig_dst = skb_dst(skb);
|
|
|
|
struct in6_addr *nhaddr = NULL;
|
|
|
|
struct seg6_local_lwt *slwt;
|
|
|
|
|
|
|
|
slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
|
|
|
|
|
|
|
|
/* The inner packet is not associated with any local interface,
|
|
|
|
* so we do not call netif_rx().
|
|
|
|
*
|
|
|
|
* If slwt->nh6 is set to ::, then look up the nexthop for the
|
|
|
|
* inner packet's DA. Otherwise, use the specified nexthop.
|
|
|
|
*/
|
|
|
|
if (!ipv6_addr_any(&slwt->nh6))
|
|
|
|
nhaddr = &slwt->nh6;
|
|
|
|
|
|
|
|
seg6_lookup_nexthop(skb, nhaddr, 0);
|
|
|
|
|
|
|
|
return dst_input(skb);
|
|
|
|
}
|
|
|
|
|
2017-08-05 18:39:48 +08:00
|
|
|
/* decapsulate and forward to specified nexthop */
|
|
|
|
static int input_action_end_dx6(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
/* this function accepts IPv6 encapsulated packets, with either
|
|
|
|
* an SRH with SL=0, or no SRH.
|
|
|
|
*/
|
|
|
|
|
2017-08-25 15:56:47 +08:00
|
|
|
if (!decap_and_validate(skb, IPPROTO_IPV6))
|
2017-08-05 18:39:48 +08:00
|
|
|
goto drop;
|
|
|
|
|
2017-08-25 15:56:47 +08:00
|
|
|
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
|
2017-08-05 18:39:48 +08:00
|
|
|
goto drop;
|
|
|
|
|
2019-11-16 23:05:53 +08:00
|
|
|
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
|
2021-08-17 16:39:37 +08:00
|
|
|
nf_reset_ct(skb);
|
2019-11-16 23:05:53 +08:00
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
|
|
|
|
return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
|
2024-06-13 17:42:46 +08:00
|
|
|
dev_net(skb->dev), NULL, skb, skb->dev,
|
|
|
|
NULL, input_action_end_dx6_finish);
|
2017-08-05 18:39:48 +08:00
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
return input_action_end_dx6_finish(dev_net(skb->dev), NULL, skb);
|
2017-08-05 18:39:48 +08:00
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
static int input_action_end_dx4_finish(struct net *net, struct sock *sk,
|
|
|
|
struct sk_buff *skb)
|
2017-08-25 15:58:17 +08:00
|
|
|
{
|
2021-08-17 16:39:37 +08:00
|
|
|
struct dst_entry *orig_dst = skb_dst(skb);
|
|
|
|
struct seg6_local_lwt *slwt;
|
2017-08-25 15:58:17 +08:00
|
|
|
struct iphdr *iph;
|
|
|
|
__be32 nhaddr;
|
|
|
|
int err;
|
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
|
2017-08-25 15:58:17 +08:00
|
|
|
|
|
|
|
iph = ip_hdr(skb);
|
|
|
|
|
|
|
|
nhaddr = slwt->nh4.s_addr ?: iph->daddr;
|
|
|
|
|
|
|
|
skb_dst_drop(skb);
|
|
|
|
|
|
|
|
err = ip_route_input(skb, nhaddr, iph->saddr, 0, skb->dev);
|
2021-08-17 16:39:37 +08:00
|
|
|
if (err) {
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-08-25 15:58:17 +08:00
|
|
|
|
|
|
|
return dst_input(skb);
|
2021-08-17 16:39:37 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int input_action_end_dx4(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
if (!decap_and_validate(skb, IPPROTO_IPIP))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
if (!pskb_may_pull(skb, sizeof(struct iphdr)))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
skb->protocol = htons(ETH_P_IP);
|
|
|
|
skb_set_transport_header(skb, sizeof(struct iphdr));
|
|
|
|
nf_reset_ct(skb);
|
|
|
|
|
|
|
|
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
|
|
|
|
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
|
2024-06-13 17:42:46 +08:00
|
|
|
dev_net(skb->dev), NULL, skb, skb->dev,
|
|
|
|
NULL, input_action_end_dx4_finish);
|
2017-08-25 15:58:17 +08:00
|
|
|
|
2021-08-17 16:39:37 +08:00
|
|
|
return input_action_end_dx4_finish(dev_net(skb->dev), NULL, skb);
|
2017-08-25 15:58:17 +08:00
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2020-12-02 21:05:14 +08:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
|
|
|
static struct net *fib6_config_get_net(const struct fib6_config *fib6_cfg)
|
|
|
|
{
|
|
|
|
const struct nl_info *nli = &fib6_cfg->fc_nlinfo;
|
|
|
|
|
|
|
|
return nli->nl_net;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __seg6_end_dt_vrf_build(struct seg6_local_lwt *slwt, const void *cfg,
|
|
|
|
u16 family, struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
struct seg6_end_dt_info *info = &slwt->dt_info;
|
|
|
|
int vrf_ifindex;
|
|
|
|
struct net *net;
|
|
|
|
|
|
|
|
net = fib6_config_get_net(cfg);
|
|
|
|
|
|
|
|
/* note that vrf_table was already set by parse_nla_vrftable() */
|
|
|
|
vrf_ifindex = l3mdev_ifindex_lookup_by_table_id(L3MDEV_TYPE_VRF, net,
|
|
|
|
info->vrf_table);
|
|
|
|
if (vrf_ifindex < 0) {
|
|
|
|
if (vrf_ifindex == -EPERM) {
|
|
|
|
NL_SET_ERR_MSG(extack,
|
|
|
|
"Strict mode for VRF is disabled");
|
|
|
|
} else if (vrf_ifindex == -ENODEV) {
|
|
|
|
NL_SET_ERR_MSG(extack,
|
|
|
|
"Table has no associated VRF device");
|
|
|
|
} else {
|
|
|
|
pr_debug("seg6local: SRv6 End.DT* creation error=%d\n",
|
|
|
|
vrf_ifindex);
|
|
|
|
}
|
|
|
|
|
|
|
|
return vrf_ifindex;
|
|
|
|
}
|
|
|
|
|
|
|
|
info->net = net;
|
|
|
|
info->vrf_ifindex = vrf_ifindex;
|
|
|
|
|
|
|
|
info->family = family;
|
|
|
|
info->mode = DT_VRF_MODE;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* The SRv6 End.DT4/DT6 behavior extracts the inner (IPv4/IPv6) packet and
|
|
|
|
* routes the IPv4/IPv6 packet by looking at the configured routing table.
|
|
|
|
*
|
|
|
|
* In the SRv6 End.DT4/DT6 use case, we can receive traffic (IPv6+Segment
|
|
|
|
* Routing Header packets) from several interfaces and the outer IPv6
|
|
|
|
* destination address (DA) is used for retrieving the specific instance of the
|
|
|
|
* End.DT4/DT6 behavior that should process the packets.
|
|
|
|
*
|
|
|
|
* However, the inner IPv4/IPv6 packet is not really bound to any receiving
|
|
|
|
* interface and thus the End.DT4/DT6 sets the VRF (associated with the
|
|
|
|
* corresponding routing table) as the *receiving* interface.
|
|
|
|
* In other words, the End.DT4/DT6 processes a packet as if it has been received
|
|
|
|
* directly by the VRF (and not by one of its slave devices, if any).
|
|
|
|
* In this way, the VRF interface is used for routing the IPv4/IPv6 packet
|
|
|
|
* according to the routing table configured by the End.DT4/DT6 instance.
|
|
|
|
*
|
|
|
|
* This design allows you to get some interesting features like:
|
|
|
|
* 1) the statistics on rx packets;
|
|
|
|
* 2) the possibility to install a packet sniffer on the receiving interface
|
|
|
|
* (the VRF one) for looking at the incoming packets;
|
|
|
|
* 3) the possibility to leverage the netfilter prerouting hook for the inner
|
|
|
|
* IPv4 packet.
|
|
|
|
*
|
|
|
|
* This function returns:
|
|
|
|
* - the sk_buff* when the VRF rcv handler has processed the packet correctly;
|
|
|
|
* - NULL when the skb is consumed by the VRF rcv handler;
|
|
|
|
* - a pointer which encodes a negative error number in case of error.
|
|
|
|
* Note that in this case, the function takes care of freeing the skb.
|
|
|
|
*/
|
|
|
|
static struct sk_buff *end_dt_vrf_rcv(struct sk_buff *skb, u16 family,
|
|
|
|
struct net_device *dev)
|
|
|
|
{
|
|
|
|
/* based on l3mdev_ip_rcv; we are only interested in the master */
|
|
|
|
if (unlikely(!netif_is_l3_master(dev) && !netif_has_l3_rx_handler(dev)))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
if (unlikely(!dev->l3mdev_ops->l3mdev_l3_rcv))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
/* the decapsulated IPv4/IPv6 packet does not come with any mac header info.
|
|
|
|
* We must unset the mac header to allow the VRF device to rebuild it,
|
|
|
|
* just in case there is a sniffer attached on the device.
|
|
|
|
*/
|
|
|
|
skb_unset_mac_header(skb);
|
|
|
|
|
|
|
|
skb = dev->l3mdev_ops->l3mdev_l3_rcv(dev, skb, family);
|
|
|
|
if (!skb)
|
|
|
|
/* the skb buffer was consumed by the handler */
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/* when a packet is received by a VRF or by one of its slaves, the
|
|
|
|
* master device reference is set into the skb.
|
|
|
|
*/
|
|
|
|
if (unlikely(skb->dev != dev || skb->skb_iif != dev->ifindex))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
return skb;
|
|
|
|
|
|
|
|
drop:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct net_device *end_dt_get_vrf_rcu(struct sk_buff *skb,
|
|
|
|
struct seg6_end_dt_info *info)
|
|
|
|
{
|
|
|
|
int vrf_ifindex = info->vrf_ifindex;
|
|
|
|
struct net *net = info->net;
|
|
|
|
|
|
|
|
if (unlikely(vrf_ifindex < 0))
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
if (unlikely(!net_eq(dev_net(skb->dev), net)))
|
|
|
|
goto error;
|
|
|
|
|
|
|
|
return dev_get_by_index_rcu(net, vrf_ifindex);
|
|
|
|
|
|
|
|
error:
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct sk_buff *end_dt_vrf_core(struct sk_buff *skb,
|
seg6: add support for SRv6 End.DT46 Behavior
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
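One way to picture a single behavior handling both address families is a
dispatch on the protocol number that follows the outer headers. The sketch
below is an assumption made for illustration, not the code added by this
patch: toy_dt46_classify() and the TOY_* names are hypothetical, while 4
(IPv4-in-IP) and 41 (IPv6) are the IANA protocol numbers, and 20 and 40
bytes are the minimum IPv4 and the fixed IPv6 header sizes.

```c
#include <assert.h>
#include <stddef.h>

/* IANA protocol numbers found in the Next Header field (hypothetical names). */
#define TOY_PROTO_IPIP 4        /* inner packet is IPv4 */
#define TOY_PROTO_IPV6 41       /* inner packet is IPv6 */

enum toy_family { TOY_AF_UNSPEC, TOY_AF_INET, TOY_AF_INET6 };

struct toy_inner {
	enum toy_family family; /* address family of the inner packet */
	size_t min_hdrlen;      /* bytes that must be present before parsing it */
};

/* Map the protocol number after the outer headers to the inner family.
 * Anything other than IPv4 or IPv6 is reported as TOY_AF_UNSPEC, which a
 * caller would treat as "drop".
 */
static struct toy_inner toy_dt46_classify(int nexthdr)
{
	struct toy_inner in = { TOY_AF_UNSPEC, 0 };

	switch (nexthdr) {
	case TOY_PROTO_IPIP:
		in.family = TOY_AF_INET;
		in.min_hdrlen = 20;
		break;
	case TOY_PROTO_IPV6:
		in.family = TOY_AF_INET6;
		in.min_hdrlen = 40;
		break;
	default:
		break;
	}

	return in;
}
```

A caller would then hand the packet to the IPv4 or IPv6 decapsulation path
according to the returned family, and drop it when the family is
TOY_AF_UNSPEC.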
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
To make End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. This constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In detail, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. This small performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
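The two percentages can be reproduced from the averages listed above. The
helper below is only a worked-arithmetic sketch; toy_degradation_pct() is a
hypothetical name, not part of the patch.

```c
#include <assert.h>

/* Relative throughput loss of `measured_kpps` against `baseline_kpps`,
 * expressed in percent. Both arguments use the same unit (kpps here).
 */
static double toy_degradation_pct(double baseline_kpps, double measured_kpps)
{
	assert(baseline_kpps > 0.0);
	return (baseline_kpps - measured_kpps) / baseline_kpps * 100.0;
}
```

toy_degradation_pct(690.70, 684.70) gives about 0.87 (IPv6, scenario 2.a
against 1.a) and toy_degradation_pct(722.22, 711.69) gives about 1.46 (IPv4,
scenario 2.b against 1.b), which round to the 0.9% and 1.5% quoted.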
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 01:16:44 +08:00
|
|
|
struct seg6_local_lwt *slwt, u16 family)
|
2020-12-02 21:05:14 +08:00
|
|
|
{
|
|
|
|
struct seg6_end_dt_info *info = &slwt->dt_info;
|
|
|
|
struct net_device *vrf;
|
2021-06-18 01:16:44 +08:00
|
|
|
__be16 protocol;
|
|
|
|
int hdrlen;
|
2020-12-02 21:05:14 +08:00
|
|
|
|
|
|
|
vrf = end_dt_get_vrf_rcu(skb, info);
|
|
|
|
if (unlikely(!vrf))
|
|
|
|
goto drop;
|
|
|
|
|
2021-06-18 01:16:44 +08:00
	switch (family) {
	case AF_INET:
		protocol = htons(ETH_P_IP);
		hdrlen = sizeof(struct iphdr);
		break;
	case AF_INET6:
		protocol = htons(ETH_P_IPV6);
		hdrlen = sizeof(struct ipv6hdr);
		break;
	case AF_UNSPEC:
		fallthrough;
	default:
		goto drop;
	}

	if (unlikely(info->family != AF_UNSPEC && info->family != family)) {
		pr_warn_once("seg6local: SRv6 End.DT* family mismatch");
		goto drop;
	}

	skb->protocol = protocol;
2020-12-02 21:05:14 +08:00

	skb_dst_drop(skb);

seg6: add support for SRv6 End.DT46 Behavior
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
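The dispatch idea behind End.DT46 can be sketched in user-space C: once the
outer IPv6 header and SRH have been walked, the protocol number of the inner
packet selects the IPv4 or the IPv6 decapsulation path. This is only an
illustrative sketch, not the kernel code; dt46_select_family is a hypothetical
helper introduced here, and the protocol and family constants are the standard
ones from <netinet/in.h> and <sys/socket.h>.

```c
#include <netinet/in.h>   /* IPPROTO_IPIP, IPPROTO_IPV6 */
#include <sys/socket.h>   /* AF_INET, AF_INET6, AF_UNSPEC */

/*
 * Hypothetical helper (not part of the kernel): map the protocol number
 * found after the SRv6 headers to the address family of the inner packet.
 * AF_UNSPEC is returned when the payload is neither IPv4 nor IPv6; a
 * caller would be expected to drop such a packet.
 */
static int dt46_select_family(int nexthdr)
{
	switch (nexthdr) {
	case IPPROTO_IPIP:	/* IPv4 carried inside IPv6 */
		return AF_INET;
	case IPPROTO_IPV6:	/* IPv6 carried inside IPv6 */
		return AF_INET6;
	default:
		return AF_UNSPEC;
	}
}
```

A single tunnel endpoint can then hand the packet to the IPv4 or IPv6 handler
based on the returned family, instead of requiring one tunnel per family.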
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In detail, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b) we
observe that the performance degradation incurred in using End.DT46 rather
than End.DT6 and End.DT4 respectively for IPv6 and IPv4 traffic is minimal,
around 0.9% and 1.5%. Such very minimal performance degradation is the
price to be paid if one prefers to use a single tunnel capable of handling
both types of traffic (IPv4 and IPv6).
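As a check on the percentages quoted above, the relative loss can be
recomputed from the reported averages. The helper below is a small sketch;
degradation_pct is a name introduced here for illustration and the inputs are
the kpps averages listed for scenarios 1.a, 1.b, 2.a and 2.b.

```c
#include <assert.h>

/*
 * Relative throughput loss, in percent, of measured_kpps with respect to
 * baseline_kpps. Illustrative helper, not part of the patch.
 */
static double degradation_pct(double baseline_kpps, double measured_kpps)
{
	assert(baseline_kpps > 0.0);
	return (baseline_kpps - measured_kpps) / baseline_kpps * 100.0;
}
```

With the figures above, degradation_pct(690.70, 684.70) is about 0.87 (IPv6,
End.DT6 versus End.DT46) and degradation_pct(722.22, 711.69) is about 1.46
(IPv4, End.DT4 versus End.DT46), which round to the reported 0.9% and 1.5%.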
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 01:16:44 +08:00
	skb_set_transport_header(skb, hdrlen);
2021-08-17 16:39:37 +08:00
	nf_reset_ct(skb);
2020-12-02 21:05:14 +08:00

2021-06-18 01:16:44 +08:00
	return end_dt_vrf_rcv(skb, family, vrf);
2020-12-02 21:05:14 +08:00

drop:
	kfree_skb(skb);
	return ERR_PTR(-EINVAL);
}

static int input_action_end_dt4(struct sk_buff *skb,
				struct seg6_local_lwt *slwt)
{
	struct iphdr *iph;
	int err;

	if (!decap_and_validate(skb, IPPROTO_IPIP))
		goto drop;

	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto drop;

2021-06-18 01:16:44 +08:00
	skb = end_dt_vrf_core(skb, slwt, AF_INET);
2020-12-02 21:05:14 +08:00
	if (!skb)
		/* packet has been processed and consumed by the VRF */
		return 0;

	if (IS_ERR(skb))
		return PTR_ERR(skb);

	iph = ip_hdr(skb);

	err = ip_route_input(skb, iph->daddr, iph->saddr, 0, skb->dev);
	if (unlikely(err))
		goto drop;

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return -EINVAL;
}

static int seg6_end_dt4_build(struct seg6_local_lwt *slwt, const void *cfg,
			      struct netlink_ext_ack *extack)
{
	return __seg6_end_dt_vrf_build(slwt, cfg, AF_INET, extack);
}
2020-12-02 21:05:15 +08:00

static enum
seg6_end_dt_mode seg6_end_dt6_parse_mode(struct seg6_local_lwt *slwt)
{
	unsigned long parsed_optattrs = slwt->parsed_optattrs;
	bool legacy, vrfmode;

2021-02-07 01:09:34 +08:00
	legacy = !!(parsed_optattrs & SEG6_F_ATTR(SEG6_LOCAL_TABLE));
	vrfmode = !!(parsed_optattrs & SEG6_F_ATTR(SEG6_LOCAL_VRFTABLE));
2020-12-02 21:05:15 +08:00

	if (!(legacy ^ vrfmode))
		/* both are absent or present: invalid DT6 mode */
		return DT_INVALID_MODE;

	return legacy ? DT_LEGACY_MODE : DT_VRF_MODE;
}

static enum seg6_end_dt_mode seg6_end_dt6_get_mode(struct seg6_local_lwt *slwt)
{
	struct seg6_end_dt_info *info = &slwt->dt_info;

	return info->mode;
}

static int seg6_end_dt6_build(struct seg6_local_lwt *slwt, const void *cfg,
			      struct netlink_ext_ack *extack)
{
	enum seg6_end_dt_mode mode = seg6_end_dt6_parse_mode(slwt);
	struct seg6_end_dt_info *info = &slwt->dt_info;

	switch (mode) {
	case DT_LEGACY_MODE:
		info->mode = DT_LEGACY_MODE;
		return 0;
	case DT_VRF_MODE:
		return __seg6_end_dt_vrf_build(slwt, cfg, AF_INET6, extack);
	default:
		NL_SET_ERR_MSG(extack, "table or vrftable must be specified");
		return -EINVAL;
	}
}
2020-12-02 21:05:14 +08:00
#endif

2017-08-25 15:58:17 +08:00
static int input_action_end_dt6(struct sk_buff *skb,
				struct seg6_local_lwt *slwt)
{
	if (!decap_and_validate(skb, IPPROTO_IPV6))
		goto drop;

	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
		goto drop;

2020-12-02 21:05:15 +08:00
#ifdef CONFIG_NET_L3_MASTER_DEV
	if (seg6_end_dt6_get_mode(slwt) == DT_LEGACY_MODE)
		goto legacy_mode;

	/* DT6_VRF_MODE */
2021-06-18 01:16:44 +08:00
	skb = end_dt_vrf_core(skb, slwt, AF_INET6);
2020-12-02 21:05:15 +08:00
	if (!skb)
		/* packet has been processed and consumed by the VRF */
		return 0;

	if (IS_ERR(skb))
		return PTR_ERR(skb);

	/* note: this time we do not need to specify the table because the VRF
	 * takes care of selecting the correct table.
	 */
	seg6_lookup_any_nexthop(skb, NULL, 0, true);

	return dst_input(skb);

legacy_mode:
#endif
2019-11-16 23:05:53 +08:00
	skb_set_transport_header(skb, sizeof(struct ipv6hdr));

2019-11-23 00:22:42 +08:00
	seg6_lookup_any_nexthop(skb, NULL, slwt->table, true);
2017-08-25 15:58:17 +08:00

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return -EINVAL;
}

2021-06-18 01:16:44 +08:00
#ifdef CONFIG_NET_L3_MASTER_DEV
static int seg6_end_dt46_build(struct seg6_local_lwt *slwt, const void *cfg,
			       struct netlink_ext_ack *extack)
{
	return __seg6_end_dt_vrf_build(slwt, cfg, AF_UNSPEC, extack);
}

static int input_action_end_dt46(struct sk_buff *skb,
				 struct seg6_local_lwt *slwt)
{
	unsigned int off = 0;
	int nexthdr;

	nexthdr = ipv6_find_hdr(skb, &off, -1, NULL, NULL);
	if (unlikely(nexthdr < 0))
		goto drop;

	switch (nexthdr) {
	case IPPROTO_IPIP:
		return input_action_end_dt4(skb, slwt);
	case IPPROTO_IPV6:
		return input_action_end_dt6(skb, slwt);
	}

drop:
	kfree_skb(skb);
	return -EINVAL;
}
#endif

2017-08-05 18:39:48 +08:00
/* push an SRH on top of the current one */
static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct ipv6_sr_hdr *srh;
	int err = -EINVAL;

	srh = get_and_validate_srh(skb);
	if (!srh)
		goto drop;

	err = seg6_do_srh_inline(skb, slwt->srh);
	if (err)
		goto drop;

	skb_set_transport_header(skb, sizeof(struct ipv6hdr));

2018-05-20 21:58:13 +08:00
	seg6_lookup_nexthop(skb, NULL, 0);
2017-08-05 18:39:48 +08:00

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return err;
}

/* encapsulate within an outer IPv6 header and a specified SRH */
static int input_action_end_b6_encap(struct sk_buff *skb,
				     struct seg6_local_lwt *slwt)
{
	struct ipv6_sr_hdr *srh;
	int err = -EINVAL;

	srh = get_and_validate_srh(skb);
	if (!srh)
		goto drop;

2017-08-25 15:56:47 +08:00
	advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
2017-08-05 18:39:48 +08:00

	skb_reset_inner_headers(skb);
	skb->encapsulation = 1;

2017-08-25 15:56:44 +08:00
	err = seg6_do_srh_encap(skb, slwt->srh, IPPROTO_IPV6);
2017-08-05 18:39:48 +08:00
	if (err)
		goto drop;

	skb_set_transport_header(skb, sizeof(struct ipv6hdr));

2018-05-20 21:58:13 +08:00
	seg6_lookup_nexthop(skb, NULL, 0);
2017-08-05 18:39:48 +08:00

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return err;
}

2024-06-20 21:22:02 +08:00
DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states) = {
	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
};
bpf: Add IPv6 Segment Routing helpers
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
(to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
(specifically End.X, End.T, End.B6 and
End.B6.Encap)
The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).
The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
one should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.
Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.
Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporarily be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.
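The relation between the byte count kept on the side and the Header Ext.
Length field can be sketched as follows. This is a hedged user-space
illustration, not the kernel implementation: srh_len_to_field is a
hypothetical helper, and it assumes (as the validation code in this file
suggests) that the tracked byte count excludes the first 8 octets of the
header, so the field is the byte count divided by 8 and the total header size
is (field + 1) * 8.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the Header Ext. Length field and the total
 * header size from a length tracked in bytes (first 8 octets excluded).
 * Returns false when the length is not a multiple of 8 or does not fit in
 * the 8-bit field, i.e. when the header would be considered invalid.
 */
static bool srh_len_to_field(unsigned int len_bytes, uint8_t *field,
			     unsigned int *total_bytes)
{
	if ((len_bytes & 7) != 0)		/* breaks the 8-byte boundary */
		return false;
	if ((len_bytes >> 3) > UINT8_MAX)	/* field is only 8 bits wide */
		return false;

	*field = (uint8_t)(len_bytes >> 3);
	*total_bytes = ((unsigned int)*field + 1) << 3;
	return true;
}
```

For example, a tracked length of 16 bytes yields a field value of 2 and a
total header size of 24 bytes, while a length of 20 bytes is rejected because
it is not a multiple of 8.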
Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.
These helpers require CONFIG_IPV6=y (and not =m).
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-20 21:58:14 +08:00

2018-07-26 10:10:40 +08:00
bool seg6_bpf_has_valid_srh(struct sk_buff *skb)
{
	struct seg6_bpf_srh_state *srh_state =
		this_cpu_ptr(&seg6_bpf_srh_states);
	struct ipv6_sr_hdr *srh = srh_state->srh;

2024-06-20 21:22:02 +08:00
	lockdep_assert_held(&srh_state->bh_lock);
2018-07-26 10:10:40 +08:00
	if (unlikely(srh == NULL))
		return false;

	if (unlikely(!srh_state->valid)) {
		if ((srh_state->hdrlen & 7) != 0)
			return false;

		srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
2020-06-03 14:54:42 +08:00
		if (!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3, true))
2018-07-26 10:10:40 +08:00
			return false;

		srh_state->valid = true;
	}

	return true;
}

ipv6: sr: Add seg6local action End.BPF
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.
Since the BPF program must not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track of
whether the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.
Performance profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.
The BPF program may return one of three codes:
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb's destination is already set and the default
lookup should not be performed.
- BPF_DROP: the packet will be dropped.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-20 21:58:16 +08:00
|
|
|
static int input_action_end_bpf(struct sk_buff *skb,
|
|
|
|
struct seg6_local_lwt *slwt)
|
|
|
|
{
|
2024-06-20 21:22:02 +08:00
|
|
|
struct seg6_bpf_srh_state *srh_state;
|
2018-05-20 21:58:16 +08:00
|
|
|
struct ipv6_sr_hdr *srh;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
srh = get_and_validate_srh(skb);
|
2018-07-26 10:10:40 +08:00
|
|
|
if (!srh) {
|
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2018-05-20 21:58:16 +08:00
|
|
|
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
|
|
|
|
|
2024-06-20 21:22:02 +08:00
|
|
|
/* The access to the per-CPU buffer srh_state is protected by running
|
|
|
|
* always in softirq context (with disabled BH). On PREEMPT_RT the
|
|
|
|
* required locking is provided by the following local_lock_nested_bh()
|
|
|
|
* statement. It is also accessed by the bpf_lwt_seg6_* helpers via
|
|
|
|
* bpf_prog_run_save_cb().
|
2018-05-20 21:58:16 +08:00
|
|
|
*/
|
2024-06-20 21:22:02 +08:00
|
|
|
local_lock_nested_bh(&seg6_bpf_srh_states.bh_lock);
|
|
|
|
srh_state = this_cpu_ptr(&seg6_bpf_srh_states);
|
2018-07-26 10:10:40 +08:00
|
|
|
srh_state->srh = srh;
|
2018-05-20 21:58:16 +08:00
|
|
|
srh_state->hdrlen = srh->hdrlen << 3;
|
2018-07-26 10:10:40 +08:00
|
|
|
srh_state->valid = true;
|
2018-05-20 21:58:16 +08:00
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
bpf_compute_data_pointers(skb);
|
|
|
|
ret = bpf_prog_run_save_cb(slwt->bpf.prog, skb);
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
switch (ret) {
|
|
|
|
case BPF_OK:
|
|
|
|
case BPF_REDIRECT:
|
|
|
|
break;
|
|
|
|
case BPF_DROP:
|
|
|
|
goto drop;
|
|
|
|
default:
|
|
|
|
pr_warn_once("bpf-seg6local: Illegal return value %u\n", ret);
|
|
|
|
goto drop;
|
|
|
|
}
|
|
|
|
|
2018-07-26 10:10:40 +08:00
|
|
|
if (srh_state->srh && !seg6_bpf_has_valid_srh(skb))
|
2018-05-20 21:58:16 +08:00
|
|
|
goto drop;
|
2024-06-20 21:22:02 +08:00
|
|
|
local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock);
|
2018-05-20 21:58:16 +08:00
|
|
|
|
|
|
|
if (ret != BPF_REDIRECT)
|
|
|
|
seg6_lookup_nexthop(skb, NULL, 0);
|
|
|
|
|
|
|
|
return dst_input(skb);
|
|
|
|
|
|
|
|
drop:
|
2024-06-20 21:22:02 +08:00
|
|
|
local_unlock_nested_bh(&seg6_bpf_srh_states.bh_lock);
|
2018-05-20 21:58:16 +08:00
|
|
|
kfree_skb(skb);
|
|
|
|
return -EINVAL;
|
|
|
|
}
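The decision logic of input_action_end_bpf() after the program has run can be summarised by a small userspace model. This is a sketch only: the `verdict` and `outcome` enums and `end_bpf_outcome` are hypothetical names, the numeric values are assumed to match the kernel's BPF_OK, BPF_DROP and BPF_REDIRECT, and the single `srh_still_valid` flag stands in for the combined check that an SRH is still tracked and passes seg6_bpf_has_valid_srh().

```c
#include <stdbool.h>

/* Assumed to mirror BPF_OK, BPF_DROP and BPF_REDIRECT. */
enum verdict { V_OK = 0, V_DROP = 2, V_REDIRECT = 7 };

enum outcome {
	OUT_LOOKUP_AND_FORWARD,	/* default next-hop lookup, then forward */
	OUT_FORWARD_AS_SET,	/* destination already set by an action */
	OUT_DROPPED,
};

static enum outcome end_bpf_outcome(int ret, bool srh_still_valid)
{
	switch (ret) {
	case V_OK:
	case V_REDIRECT:
		break;
	case V_DROP:
	default:
		/* explicit drop, or an unknown return value */
		return OUT_DROPPED;
	}

	/* A program that left the SRH in an invalid state loses the packet. */
	if (!srh_still_valid)
		return OUT_DROPPED;

	return ret == V_REDIRECT ? OUT_FORWARD_AS_SET : OUT_LOOKUP_AND_FORWARD;
}
```

The model makes one property easy to see: returning the redirect code does not bypass the SRH re-validation, it only skips the default next-hop lookup.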
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
static struct seg6_action_desc seg6_action_table[] = {
|
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END,
|
|
|
|
.attrs = 0,
|
2022-09-13 01:16:18 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS |
|
2023-08-13 02:09:25 +08:00
|
|
|
SEG6_F_LOCAL_FLAVORS,
|
2017-08-05 18:39:48 +08:00
|
|
|
.input = input_action_end,
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_X,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_NH6),
|
2023-08-13 02:09:25 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS |
|
|
|
|
SEG6_F_LOCAL_FLAVORS,
|
2017-08-05 18:39:48 +08:00
|
|
|
.input = input_action_end_x,
|
2017-08-05 18:38:26 +08:00
|
|
|
},
|
2017-08-25 15:58:17 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_T,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_TABLE),
|
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify whether packets are actually processed by that behavior and what the
outcome of the processing is. Therefore, the counters for SRv6 Behaviors
offer a non-invasive observability point which can be leveraged for both
traffic monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
A brief example shows how helpful counters can be for SRv6 networks.
Consider a node where an SRv6 End Behavior receives an SRv6 packet whose
Segments Left (SL) is equal to 0. In this case, the End Behavior (which
accepts only packets with SL >= 1) discards the packet and increases the
error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could have different causes, such as wrong route settings
on the node or an invalid SID List carried by the SRv6 packet. In any case,
the error counter rules out that the packet never arrived at the node or
was not processed by the behavior at all.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps
Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps
Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps
As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2017-08-25 15:58:17 +08:00
|
|
|
.input = input_action_end_t,
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DX2,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_OIF),
|
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2017-08-25 15:58:17 +08:00
|
|
|
.input = input_action_end_dx2,
|
|
|
|
},
|
2017-08-05 18:39:48 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DX6,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_NH6),
|
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
a non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segments Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increments the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter tells the user that:
(i) the packet arrived at the node;
(ii) the packet was taken into account by the SRv6 End behavior;
(iii) an error occurred during the processing.
The error (iii) could have different causes, such as wrong route
settings on the node or an invalid SID List carried by the SRv6
packet. In any case, the error counter rules out the possibility that the
packet did not arrive at the node or was not processed by the behavior at
all.
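The troubleshooting example above can be sketched in plain userspace C. This is an illustration only, not the kernel code: the struct, its field names and both functions are hypothetical, and the End behavior is reduced to the single SL >= 1 check described in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instance counters; names are illustrative only. */
struct behavior_counters {
	uint64_t packets; /* packets processed correctly */
	uint64_t bytes;   /* bytes of correctly processed packets */
	uint64_t errors;  /* packets that failed processing */
};

/* Record the outcome of processing one packet of pkt_len bytes. */
static void counters_update(struct behavior_counters *c,
			    uint32_t pkt_len, bool ok)
{
	if (ok) {
		c->packets++;
		c->bytes += pkt_len;
	} else {
		c->errors++;
	}
}

/*
 * Simplified End behavior: accept only packets with Segments Left >= 1
 * and account for the outcome. Returns true if the packet is accepted.
 */
static bool end_behavior_process(struct behavior_counters *c,
				 uint8_t segments_left, uint32_t pkt_len)
{
	bool ok = segments_left >= 1;

	counters_update(c, pkt_len, ok);
	return ok;
}
```

With this model, a packet arriving with SL equal to 0 leaves the packet and byte counters untouched and bumps only the error counter, which is the signal the operator reads.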
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
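The idea of an optional attribute such as "count" sitting next to required ones can be sketched with two bitmasks. This is a userspace illustration under assumptions: the attribute IDs, the struct and the function are hypothetical, and rejecting attributes that are neither required nor optional is an assumed policy, not a statement about the kernel parser.

```c
#include <stdbool.h>

/* One bit per attribute ID, in the spirit of the SEG6_F_ATTR() usage. */
#define ATTR_BIT(i) (1UL << (i))

/* Hypothetical attribute IDs, for illustration only. */
enum { ATTR_NH6 = 1, ATTR_COUNTERS = 2, ATTR_TABLE = 3 };

/* Hypothetical behavior descriptor: required and optional attribute sets. */
struct behavior_desc {
	unsigned long required;
	unsigned long optional;
};

/*
 * A supplied set is valid when every required attribute is present and
 * nothing outside (required | optional) was supplied.
 */
static bool attrs_valid(const struct behavior_desc *d, unsigned long supplied)
{
	if ((supplied & d->required) != d->required)
		return false;
	if (supplied & ~(d->required | d->optional))
		return false;
	return true;
}
```

Under this model, a behavior that requires a next hop and optionally accepts counters is valid with or without the counters bit, but not when the next hop is missing.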
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of the tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev. 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev. 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev. 2956.00 pps
As can be observed, the throughput achieved in scenarios (2) and (3) did not
suffer any observable degradation compared to scenario (1).
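One rough way to read "no observable degradation" is to check whether each average lies within one standard deviation of the baseline average of scenario (1). The helper below is a hypothetical sketch of that comparison. It is a crude sanity check, not a statistical test and not the method used by the authors.

```c
#include <stdbool.h>

/*
 * Return true if `mean` differs from `baseline_mean` by no more than
 * `baseline_stddev` (all values in packets per second).
 */
static bool within_one_stddev(double baseline_mean, double baseline_stddev,
			      double mean)
{
	double diff = mean - baseline_mean;

	if (diff < 0)
		diff = -diff;
	return diff <= baseline_stddev;
}
```

With the figures reported above, scenarios (2) and (3) are about 3295 pps and 3450 pps below scenario (1), both smaller than the 3956.82 pps standard deviation of the baseline.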
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2017-08-05 18:39:48 +08:00
|
|
|
.input = input_action_end_dx6,
|
|
|
|
},
|
2017-08-25 15:58:17 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DX4,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_NH4),
|
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2017-08-25 15:58:17 +08:00
|
|
|
.input = input_action_end_dx4,
|
|
|
|
},
|
2020-12-02 21:05:14 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DT4,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_VRFTABLE),
|
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2020-12-02 21:05:14 +08:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
|
|
|
.input = input_action_end_dt4,
|
|
|
|
.slwt_ops = {
|
|
|
|
.build_state = seg6_end_dt4_build,
|
|
|
|
},
|
|
|
|
#endif
|
|
|
|
},
|
2017-08-25 15:58:17 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DT6,
|
2020-12-02 21:05:15 +08:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
|
|
|
.attrs = 0,
|
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS |
|
|
|
|
SEG6_F_ATTR(SEG6_LOCAL_TABLE) |
|
2021-02-07 01:09:34 +08:00
|
|
|
SEG6_F_ATTR(SEG6_LOCAL_VRFTABLE),
|
2020-12-02 21:05:15 +08:00
|
|
|
.slwt_ops = {
|
|
|
|
.build_state = seg6_end_dt6_build,
|
|
|
|
},
|
|
|
|
#else
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_TABLE),
|
2021-04-27 23:44:04 +08:00
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
2020-12-02 21:05:15 +08:00
|
|
|
#endif
|
2017-08-25 15:58:17 +08:00
|
|
|
.input = input_action_end_dt6,
|
|
|
|
},
|
seg6: add support for SRv6 End.DT46 Behavior
IETF RFC 8986 [1] includes the definition of SRv6 End.DT4, End.DT6, and
End.DT46 Behaviors.
The current SRv6 code in the Linux kernel only implements End.DT4 and
End.DT6 which can be used respectively to support IPv4-in-IPv6 and
IPv6-in-IPv6 VPNs. With End.DT4 and End.DT6 it is not possible to create a
single SRv6 VPN tunnel to carry both IPv4 and IPv6 traffic.
The proposed End.DT46 implementation is meant to support the decapsulation
of IPv4 and IPv6 traffic coming from a single SRv6 tunnel.
The implementation of the SRv6 End.DT46 Behavior in the Linux kernel
greatly simplifies the setup and operations of SRv6 VPNs.
The SRv6 End.DT46 Behavior leverages the infrastructure of SRv6 End.DT{4,6}
Behaviors implemented so far, because it makes use of a VRF device in
order to force the routing lookup into the associated routing table.
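Handling both address families in one behavior implies a dispatch step on the type of the inner packet. The sketch below shows one way such a dispatch could look, keyed on the protocol number found after the outer IPv6 headers. The enum and function names are hypothetical, and selecting the path this way is an assumption made for illustration; only the IANA protocol numbers (4 for IPv4 encapsulation, 41 for IPv6 encapsulation) are fixed facts.

```c
#include <stdint.h>

/* Hypothetical outcome of the dispatch step. */
enum dt46_path {
	DT46_PATH_IPV4, /* hand over to the IPv4 decapsulation path */
	DT46_PATH_IPV6, /* hand over to the IPv6 decapsulation path */
	DT46_PATH_DROP, /* inner protocol not supported */
};

/* Choose the decapsulation path from the inner protocol number. */
static enum dt46_path dt46_select_path(uint8_t inner_proto)
{
	switch (inner_proto) {
	case 4:  /* IANA protocol number: IPv4 encapsulation */
		return DT46_PATH_IPV4;
	case 41: /* IANA protocol number: IPv6 encapsulation */
		return DT46_PATH_IPV6;
	default:
		return DT46_PATH_DROP;
	}
}
```

After the path is chosen, the lookup in the VRF's routing table proceeds as it would for the corresponding single-family behavior.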
To make the End.DT46 work properly, it must be guaranteed that the routing
table used for routing lookup operations is bound to one and only one VRF
during the tunnel creation. Such constraint has to be enforced by enabling
the VRF strict_mode sysctl parameter, i.e.:
$ sysctl -wq net.vrf.strict_mode=1
Note that the same approach is used for the SRv6 End.DT4 Behavior and for
the End.DT6 Behavior in VRF mode.
The command used to instantiate an SRv6 End.DT46 Behavior is
straightforward, i.e.:
$ ip -6 route add 2001:db8::1 encap seg6local action End.DT46 vrftable 100 dev vrf100
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performance and impact of SRv6 End.DT46 Behavior on the SRv6 Networking
=======================================================================
This patch aims to add the SRv6 End.DT46 Behavior with minimal impact on
the performance of SRv6 End.DT4 and End.DT6 Behaviors.
In order to verify this, we tested the performance of the newly introduced
SRv6 End.DT46 Behavior and compared it with the performance of SRv6
End.DT{4,6} Behaviors, considering both the patched kernel and the kernel
before applying the End.DT46 patch (referred to as vanilla kernel).
In detail, the following decapsulation scenarios were considered:
1.a) IPv6 traffic in SRv6 End.DT46 Behavior on patched kernel;
1.b) IPv4 traffic in SRv6 End.DT46 Behavior on patched kernel;
2.a) SRv6 End.DT6 Behavior (VRF mode) on patched kernel;
2.b) SRv6 End.DT4 Behavior on patched kernel;
3.a) SRv6 End.DT6 Behavior (VRF mode) on vanilla kernel (without the
End.DT46 patch);
3.b) SRv6 End.DT4 Behavior on vanilla kernel (without the End.DT46 patch).
All tests were performed on a testbed deployed on the CloudLab [2]
facilities. We considered IPv{4,6} traffic handled by a single core (at 2.4
GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.13-rc1 using packets of size
~ 100 bytes.
Scenario (1.a): average 684.70 kpps; std. dev. 0.7 kpps;
Scenario (1.b): average 711.69 kpps; std. dev. 1.2 kpps;
Scenario (2.a): average 690.70 kpps; std. dev. 1.2 kpps;
Scenario (2.b): average 722.22 kpps; std. dev. 1.7 kpps;
Scenario (3.a): average 690.02 kpps; std. dev. 2.6 kpps;
Scenario (3.b): average 721.91 kpps; std. dev. 1.2 kpps;
Considering the results for the patched kernel (1.a, 1.b, 2.a, 2.b), we
observe that the performance degradation incurred by using End.DT46 rather
than End.DT6 and End.DT4 is small: around 0.9% for IPv6 traffic and 1.5%
for IPv4 traffic. This small degradation is the price to be paid if one
prefers to use a single tunnel capable of handling both types of traffic
(IPv4 and IPv6).
Comparing the results for End.DT4 and End.DT6 under the patched and the
vanilla kernel (2.a, 2.b, 3.a, 3.b) we observe that the introduction of the
End.DT46 patch has no impact on the performance of End.DT4 and End.DT6.
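The percentages quoted for End.DT46 can be reproduced from the averages listed above. The helper below is a hypothetical sketch of that arithmetic: the relative drop of a measured throughput with respect to a reference, in percent.

```c
/*
 * Relative throughput drop of `measured` with respect to `reference`,
 * in percent. Positive values mean `measured` is slower.
 */
static double degradation_pct(double reference, double measured)
{
	return (reference - measured) / reference * 100.0;
}
```

For IPv6 traffic, (690.70 - 684.70) / 690.70 gives roughly 0.87%. For IPv4 traffic, (722.22 - 711.69) / 722.22 gives roughly 1.46%. Both match the rounded figures of 0.9% and 1.5%.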
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 01:16:44 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_DT46,
|
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_VRFTABLE),
|
|
|
|
.optattrs = SEG6_F_LOCAL_COUNTERS,
|
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
|
|
|
.input = input_action_end_dt46,
|
|
|
|
.slwt_ops = {
|
|
|
|
.build_state = seg6_end_dt46_build,
|
|
|
|
},
|
|
|
|
#endif
|
|
|
|
},
|
2017-08-05 18:39:48 +08:00
|
|
|
{
|
|
|
|
.action = SEG6_LOCAL_ACTION_END_B6,
|
2021-02-07 01:09:34 +08:00
|
|
|
.attrs = SEG6_F_ATTR(SEG6_LOCAL_SRH),
|
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (e.g.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
a non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's walk through a brief example to see how helpful counters can be for
SRv6 networks. Consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segments Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that:
(i) the packet arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) an error occurred during the processing.
The error (iii) could have different causes, such as wrong route
settings on the node or an invalid SID List carried by the SRv6
packet. In any case, the error counter rules out that the packet did
not arrive at the node or was not processed by the behavior at
all.
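The troubleshooting scenario above can be sketched in a few lines of userspace C. This is a simplified model, not the kernel code: `demo_end_check` only reproduces the "SL >= 1" acceptance test of the End behavior (the real behavior also advances the SRH and updates the destination address), and all `demo_*` names are hypothetical.

```c
#include <stdint.h>

/* The three counters described in the commit message. */
struct demo_counters {
	uint64_t packets;	/* correctly processed packets */
	uint64_t bytes;		/* bytes of correctly processed packets */
	uint64_t errors;	/* packets NOT properly processed */
};

/* Simplified End acceptance test: 0 on success, -1 when SL == 0. */
static int demo_end_check(uint8_t segments_left)
{
	return segments_left >= 1 ? 0 : -1;
}

/* Account one packet of `len` bytes according to the processing result. */
static void demo_account(struct demo_counters *c, unsigned int len, int err)
{
	if (!err) {
		c->packets++;
		c->bytes += len;
	} else {
		c->errors++;
	}
}
```

Feeding one packet with SL = 2 and one with SL = 0 through `demo_end_check()` and `demo_account()` leaves one packet in the success counters and one in `errors`, which is the signal an operator would look for.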
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev. 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev. 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev. 2956.00 pps
As can be observed, throughputs achieved in scenarios (2) and (3) did not
suffer any observable degradation compared to scenario (1).
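One way to read "no observable degradation" is that the gap between the averages is smaller than the spread of the baseline measurements. The sketch below checks exactly that, using the figures from the table; the criterion (difference within one standard deviation of scenario 1) is an assumption made for illustration and is not a test stated by the authors. The `demo_*` names are hypothetical.

```c
#include <stdbool.h>

struct demo_run {
	double avg_pps;		/* average throughput */
	double stddev_pps;	/* standard deviation */
};

/* Figures copied from the results table above. */
static const struct demo_run demo_scenario[3] = {
	{ 1504764.81, 3956.82 },	/* (1) vanilla kernel */
	{ 1501469.78, 2979.85 },	/* (2) patched, counters off */
	{ 1501315.13, 2956.00 },	/* (3) patched, counters on */
};

/* True when |base - other| does not exceed the baseline std. dev. */
static bool demo_within_one_stddev(const struct demo_run *base,
				   const struct demo_run *other)
{
	double diff = base->avg_pps - other->avg_pps;

	if (diff < 0)
		diff = -diff;
	return diff <= base->stddev_pps;
}
```

With these inputs, both scenario (2) and scenario (3) fall within one baseline standard deviation of scenario (1); the relative difference is a little over 0.2% in both cases.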
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
		.optattrs	= SEG6_F_LOCAL_COUNTERS,
2017-08-05 18:39:48 +08:00
		.input		= input_action_end_b6,
	},
	{
		.action		= SEG6_LOCAL_ACTION_END_B6_ENCAP,
2021-02-07 01:09:34 +08:00
		.attrs		= SEG6_F_ATTR(SEG6_LOCAL_SRH),
2021-04-27 23:44:04 +08:00
		.optattrs	= SEG6_F_LOCAL_COUNTERS,
2017-08-05 18:39:48 +08:00
		.input		= input_action_end_b6_encap,
		.static_headroom	= sizeof(struct ipv6hdr),
2018-05-20 21:58:16 +08:00
	},
	{
		.action		= SEG6_LOCAL_ACTION_END_BPF,
2021-02-07 01:09:34 +08:00
		.attrs		= SEG6_F_ATTR(SEG6_LOCAL_BPF),
2021-04-27 23:44:04 +08:00
		.optattrs	= SEG6_F_LOCAL_COUNTERS,
2018-05-20 21:58:16 +08:00
		.input		= input_action_end_bpf,
	},

2017-08-05 18:38:26 +08:00
};

static struct seg6_action_desc *__get_action_desc(int action)
{
	struct seg6_action_desc *desc;
	int i, count;

2018-01-08 07:50:26 +08:00
	count = ARRAY_SIZE(seg6_action_table);
2017-08-05 18:38:26 +08:00
	for (i = 0; i < count; i++) {
		desc = &seg6_action_table[i];
		if (desc->action == action)
			return desc;
	}

	return NULL;
}

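The lookup in `__get_action_desc()` is a linear scan over a static descriptor table. A self-contained userspace sketch of the same pattern follows; the `demo_*` types, the table contents and the numeric action ids are hypothetical stand-ins, and `ARRAY_SIZE` is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative action ids, not the real SEG6_LOCAL_ACTION_* values. */
enum {
	DEMO_ACTION_END		= 1,
	DEMO_ACTION_END_B6	= 2,
	DEMO_ACTION_END_BPF	= 3,
};

struct demo_action_desc {
	int action;
	const char *name;
};

static struct demo_action_desc demo_action_table[] = {
	{ .action = DEMO_ACTION_END,	 .name = "End" },
	{ .action = DEMO_ACTION_END_B6,	 .name = "End.B6" },
	{ .action = DEMO_ACTION_END_BPF, .name = "End.BPF" },
};

/* Return the descriptor matching `action`, or NULL if there is none. */
static struct demo_action_desc *demo_get_action_desc(int action)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_action_table); i++) {
		if (demo_action_table[i].action == action)
			return &demo_action_table[i];
	}

	return NULL;
}
```

A linear scan is a reasonable choice for a table of this size; callers only need to handle the NULL return for unknown actions.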
2021-04-27 23:44:04 +08:00
static bool seg6_lwtunnel_counters_enabled(struct seg6_local_lwt *slwt)
{
	return slwt->parsed_optattrs & SEG6_F_LOCAL_COUNTERS;
}

static void seg6_local_update_counters(struct seg6_local_lwt *slwt,
				       unsigned int len, int err)
{
	struct pcpu_seg6_local_counters *pcounters;

	pcounters = this_cpu_ptr(slwt->pcpu_counters);
	u64_stats_update_begin(&pcounters->syncp);

	if (likely(!err)) {
		u64_stats_inc(&pcounters->packets);
		u64_stats_add(&pcounters->bytes, len);
	} else {
		u64_stats_inc(&pcounters->errors);
	}

	u64_stats_update_end(&pcounters->syncp);
}

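The two helpers above follow a common pattern: an optional-attribute bit decides whether counters are active, and each CPU updates its own copy so that the hot path never contends on shared state. A minimal userspace sketch of that idea is shown here. It is single-threaded and leaves out the `u64_stats_sync` protection the kernel code uses; the fixed CPU count, the `demo_*` names and the flag value are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPUS		4		/* assumed CPU count */
#define DEMO_F_COUNTERS		(1UL << 0)	/* hypothetical flag bit */

struct demo_counters {
	uint64_t packets;
	uint64_t bytes;
	uint64_t errors;
};

struct demo_behavior {
	unsigned long parsed_optattrs;
	struct demo_counters pcpu[DEMO_NR_CPUS];	/* one slot per CPU */
};

static bool demo_counters_enabled(const struct demo_behavior *b)
{
	return (b->parsed_optattrs & DEMO_F_COUNTERS) != 0;
}

/* Update only the slot owned by `cpu`; no cross-CPU sharing on this path. */
static void demo_update_counters(struct demo_behavior *b, int cpu,
				 unsigned int len, int err)
{
	struct demo_counters *c = &b->pcpu[cpu];

	if (!err) {
		c->packets++;
		c->bytes += len;
	} else {
		c->errors++;
	}
}

/* Reader side: fold all per-CPU slots into one total. */
static struct demo_counters demo_sum_counters(const struct demo_behavior *b)
{
	struct demo_counters total = { 0, 0, 0 };
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		total.packets += b->pcpu[cpu].packets;
		total.bytes += b->pcpu[cpu].bytes;
		total.errors += b->pcpu[cpu].errors;
	}

	return total;
}
```

The cost of this layout is paid by the reader, which has to walk every slot; the per-packet path stays a few increments on CPU-local memory.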
2021-08-17 16:39:37 +08:00
static int seg6_local_input_core(struct net *net, struct sock *sk,
				 struct sk_buff *skb)
2017-08-05 18:38:26 +08:00
{
	struct dst_entry *orig_dst = skb_dst(skb);
	struct seg6_action_desc *desc;
	struct seg6_local_lwt *slwt;
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
an non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could be caused by different reasons, such as wrong route
settings on the node or due to an invalid SID List carried by the SRv6
packet. Anyway, the error counter is used to exclude that the packet did
not arrive at the node or it has not been processed by the behavior at
all.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario   Average (pps)   Average (kpps)   Std. dev. (pps)
(1)        1504764.81      ~1504.76         3956.82
(2)        1501469.78      ~1501.47         2979.85
(3)        1501315.13      ~1501.32         2956.00
As can be observed, the throughput achieved in scenarios (2) and (3) did
not suffer any observable degradation compared to scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
	unsigned int len = skb->len;
	int rc;
2017-08-05 18:38:26 +08:00

	slwt = seg6_local_lwtunnel(orig_dst->lwtstate);
	desc = slwt->desc;

2021-04-27 23:44:04 +08:00
	rc = desc->input(skb, slwt);

	if (!seg6_lwtunnel_counters_enabled(slwt))
		return rc;

	seg6_local_update_counters(slwt, len, rc);

	return rc;
2017-08-05 18:38:26 +08:00
}
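
The counter handling at the end of the function above can be illustrated with a small userspace sketch. It assumes, following the commit message, that a successful run (rc == 0) adds to the packet and byte counters while any other return value adds to the error counter. The names struct behavior_counters and behavior_update_counters() are hypothetical and chosen for illustration only; this is not the kernel's per-CPU implementation.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the counters of one behavior instance. */
struct behavior_counters {
	uint64_t packets; /* packets processed successfully */
	uint64_t bytes;   /* bytes of the successfully processed packets */
	uint64_t errors;  /* packets whose processing failed */
};

/*
 * Account for one packet. Assumption: rc == 0 means the behavior processed
 * the packet correctly; any other value is counted as an error.
 */
static void behavior_update_counters(struct behavior_counters *c,
				     unsigned int len, int rc)
{
	if (rc == 0) {
		c->packets++;
		c->bytes += len;
	} else {
		c->errors++;
	}
}
```

The length is passed in as a separate argument because the function above reads skb->len before calling desc->input(), presumably since the handler may modify or consume the packet.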

2021-08-17 16:39:37 +08:00
static int seg6_local_input(struct sk_buff *skb)
{
	if (skb->protocol != htons(ETH_P_IPV6)) {
		kfree_skb(skb);
		return -EINVAL;
	}

	if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
		return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
			       dev_net(skb->dev), NULL, skb, skb->dev, NULL,
			       seg6_local_input_core);

	return seg6_local_input_core(dev_net(skb->dev), NULL, skb);
}

2017-08-05 18:38:26 +08:00
static const struct nla_policy seg6_local_policy[SEG6_LOCAL_MAX + 1] = {
	[SEG6_LOCAL_ACTION]	= { .type = NLA_U32 },
	[SEG6_LOCAL_SRH]	= { .type = NLA_BINARY },
	[SEG6_LOCAL_TABLE]	= { .type = NLA_U32 },
2020-12-02 21:05:14 +08:00
	[SEG6_LOCAL_VRFTABLE]	= { .type = NLA_U32 },
2017-08-05 18:38:26 +08:00
	[SEG6_LOCAL_NH4]	= { .type = NLA_BINARY,
				    .len = sizeof(struct in_addr) },
	[SEG6_LOCAL_NH6]	= { .type = NLA_BINARY,
				    .len = sizeof(struct in6_addr) },
	[SEG6_LOCAL_IIF]	= { .type = NLA_U32 },
	[SEG6_LOCAL_OIF]	= { .type = NLA_U32 },
2018-05-20 21:58:16 +08:00
	[SEG6_LOCAL_BPF]	= { .type = NLA_NESTED },
2021-04-27 23:44:04 +08:00
	[SEG6_LOCAL_COUNTERS]	= { .type = NLA_NESTED },
2022-09-13 01:16:18 +08:00
	[SEG6_LOCAL_FLAVORS]	= { .type = NLA_NESTED },
2017-08-05 18:38:26 +08:00
};

2022-09-13 01:16:17 +08:00
static int parse_nla_srh(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	struct ipv6_sr_hdr *srh;
	int len;

	srh = nla_data(attrs[SEG6_LOCAL_SRH]);
	len = nla_len(attrs[SEG6_LOCAL_SRH]);

	/* SRH must contain at least one segment */
	if (len < sizeof(*srh) + sizeof(struct in6_addr))
		return -EINVAL;

2020-06-03 14:54:42 +08:00
	if (!seg6_validate_srh(srh, len, false))
2017-08-05 18:38:27 +08:00
		return -EINVAL;

2018-07-23 16:33:19 +08:00
	slwt->srh = kmemdup(srh, len, GFP_KERNEL);
2017-08-05 18:38:27 +08:00
	if (!slwt->srh)
		return -ENOMEM;

	slwt->headroom += len;

	return 0;
}

static int put_nla_srh(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct ipv6_sr_hdr *srh;
	struct nlattr *nla;
	int len;

	srh = slwt->srh;
	len = (srh->hdrlen + 1) << 3;

	nla = nla_reserve(skb, SEG6_LOCAL_SRH, len);
	if (!nla)
		return -EMSGSIZE;

	memcpy(nla_data(nla), srh, len);

	return 0;
}

static int cmp_nla_srh(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	int len = (a->srh->hdrlen + 1) << 3;

	if (len != ((b->srh->hdrlen + 1) << 3))
		return 1;

	return memcmp(a->srh, b->srh, len);
}
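
The expression (srh->hdrlen + 1) << 3 used by put_nla_srh() and cmp_nla_srh() converts the header's length field into a byte count: the field is expressed in 8-octet units and does not count the first 8 octets. The sketch below spells this out in userspace C. The helper names srh_total_len() and srh_hdrlen_for_segments() are hypothetical, and the second helper assumes an SRH that carries only 16-byte segments and no TLVs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Total SRH size in bytes for a given hdrlen field. hdrlen counts 8-octet
 * units beyond the first 8 octets, hence the "+ 1" before the shift.
 */
static size_t srh_total_len(uint8_t hdrlen)
{
	return ((size_t)hdrlen + 1) << 3;
}

/*
 * hdrlen value for an SRH holding nsegs segments and no TLVs (assumption):
 * each 16-byte IPv6 address contributes two 8-octet units.
 */
static uint8_t srh_hdrlen_for_segments(unsigned int nsegs)
{
	return (uint8_t)(nsegs * 2);
}
```

With one segment the total comes to 8 + 16 = 24 bytes, which matches the minimum length that parse_nla_srh() accepts.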
seg6: improve management of behavior attributes
Depending on the attribute (i.e.: SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc),
the parse() callback performs some validity checks on the provided input
and updates the tunnel state (slwt) with the result of the parsing
operation. However, an attribute may also need to reserve some additional
resources (i.e.: memory or setting up an eBPF program) in the parse()
callback to complete the parsing operation.
The parse() callbacks are invoked by the parse_nla_action() for each
attribute belonging to a specific behavior. Given a behavior with N
attributes, if the parsing of the i-th attribute fails, the
parse_nla_action() returns immediately with an error. Nonetheless, the
resources acquired during the parsing of the i-1 attributes are not freed
by the parse_nla_action().
Attributes which acquire resources must release them *in an explicit way*
in both the seg6_local_{build/destroy}_state(). However, adding a new
attribute of this type requires changes to
seg6_local_{build/destroy}_state() to release the resources correctly.
The seg6local infrastructure still lacks a simple and structured way to
release the resources acquired in the parse() operations.
We introduced a new callback in the struct seg6_action_param named
destroy(). This callback releases any resource which may have been acquired
in the parse() counterpart. Each attribute may or may not implement the
destroy() callback depending on whether it needs to free some acquired
resources.
The destroy() callback comes with several advantages:
1) we can have as many attributes as we want for a given behavior with no
need to explicitly free the taken resources;
2) as in the case of seg6_local_build_state(),
seg6_local_destroy_state() does not need to handle the release of
resources directly. Indeed, it calls the destroy_attrs() function which
is in charge of calling the destroy() callback for every set attribute.
We do not need to patch seg6_local_{build/destroy}_state() anymore as
we add new attributes;
3) the code is more readable and better structured. Indeed, all the
information needed to handle a given attribute is contained in only
one place;
4) it facilitates the integration with new features introduced in further
patches.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02 21:05:11 +08:00
static void destroy_attr_srh(struct seg6_local_lwt *slwt)
{
	kfree(slwt->srh);
}
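
The parse()/destroy() pairing described in the commit message above can be sketched in plain userspace C. This is a simplified illustration under stated assumptions, not the kernel code: struct tunnel_state, struct attr_param, parse_all() and the string-copying callbacks are hypothetical, and only destroy_attrs() borrows its name from the commit message. The point it shows is that when parsing attribute i fails, the resources taken by the attributes already parsed are released through their destroy() callbacks.

```c
#include <stdlib.h>
#include <string.h>

#define ATTR_MAX 3

/* Hypothetical tunnel state: one heap buffer per attribute. */
struct tunnel_state {
	char *bufs[ATTR_MAX];
	unsigned long parsed_attrs; /* bit i is set once attribute i is parsed */
};

struct attr_param {
	int (*parse)(struct tunnel_state *st, int idx, const char *input);
	void (*destroy)(struct tunnel_state *st, int idx); /* may be NULL */
};

/* parse(): validate the input and acquire a resource (a heap copy). */
static int parse_copy(struct tunnel_state *st, int idx, const char *input)
{
	size_t n;

	if (!input)
		return -1; /* simulated validation failure */

	n = strlen(input) + 1;
	st->bufs[idx] = malloc(n);
	if (!st->bufs[idx])
		return -1;
	memcpy(st->bufs[idx], input, n);
	return 0;
}

/* destroy(): release whatever parse() acquired for this attribute. */
static void destroy_copy(struct tunnel_state *st, int idx)
{
	free(st->bufs[idx]);
	st->bufs[idx] = NULL;
}

static const struct attr_param params[ATTR_MAX] = {
	{ parse_copy, destroy_copy },
	{ parse_copy, destroy_copy },
	{ parse_copy, destroy_copy },
};

/* Call destroy() for every attribute that was successfully parsed. */
static void destroy_attrs(struct tunnel_state *st)
{
	for (int i = 0; i < ATTR_MAX; i++) {
		if (!(st->parsed_attrs & (1UL << i)))
			continue;
		if (params[i].destroy)
			params[i].destroy(st, i);
	}
	st->parsed_attrs = 0;
}

/* Parse all attributes; on failure, roll back the ones already parsed. */
static int parse_all(struct tunnel_state *st, const char *inputs[ATTR_MAX])
{
	for (int i = 0; i < ATTR_MAX; i++) {
		int err = params[i].parse(st, i, inputs[i]);

		if (err) {
			destroy_attrs(st);
			return err;
		}
		st->parsed_attrs |= 1UL << i;
	}
	return 0;
}
```

Keeping the release logic next to the parse logic means a new attribute only needs a new table entry; the generic build and teardown paths stay unchanged.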

2022-09-13 01:16:17 +08:00
static int parse_nla_table(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			   struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	slwt->table = nla_get_u32(attrs[SEG6_LOCAL_TABLE]);

	return 0;
}

static int put_nla_table(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	if (nla_put_u32(skb, SEG6_LOCAL_TABLE, slwt->table))
		return -EMSGSIZE;

	return 0;
}

static int cmp_nla_table(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	if (a->table != b->table)
		return 1;

	return 0;
}

2020-12-02 21:05:14 +08:00
static struct
seg6_end_dt_info *seg6_possible_end_dt_info(struct seg6_local_lwt *slwt)
{
#ifdef CONFIG_NET_L3_MASTER_DEV
	return &slwt->dt_info;
#else
	return ERR_PTR(-EOPNOTSUPP);
#endif
}
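
The function above returns either a valid pointer or an error code encoded as a pointer, which callers test with IS_ERR() and decode with PTR_ERR(). The following is a simplified userspace re-creation of that idea, offered as a sketch only: the lowercase helpers err_ptr(), is_err() and ptr_err() are hypothetical stand-ins, and the sketch assumes that the top 4095 addresses are never valid pointers, mirroring the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the last MAX_ERRNO addresses, i.e. encodes an error. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

This lets a single return value carry both outcomes, which is why parse_nla_vrftable() and put_nla_vrftable() can simply forward PTR_ERR(info) when the feature is compiled out.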

static int parse_nla_vrftable(struct nlattr **attrs,
2022-09-13 01:16:17 +08:00
			      struct seg6_local_lwt *slwt,
			      struct netlink_ext_ack *extack)
2020-12-02 21:05:14 +08:00
{
	struct seg6_end_dt_info *info = seg6_possible_end_dt_info(slwt);

	if (IS_ERR(info))
		return PTR_ERR(info);

	info->vrf_table = nla_get_u32(attrs[SEG6_LOCAL_VRFTABLE]);

	return 0;
}

static int put_nla_vrftable(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct seg6_end_dt_info *info = seg6_possible_end_dt_info(slwt);

	if (IS_ERR(info))
		return PTR_ERR(info);

	if (nla_put_u32(skb, SEG6_LOCAL_VRFTABLE, info->vrf_table))
		return -EMSGSIZE;

	return 0;
}

static int cmp_nla_vrftable(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	struct seg6_end_dt_info *info_a = seg6_possible_end_dt_info(a);
	struct seg6_end_dt_info *info_b = seg6_possible_end_dt_info(b);

	if (info_a->vrf_table != info_b->vrf_table)
		return 1;

	return 0;
}

2022-09-13 01:16:17 +08:00
static int parse_nla_nh4(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	memcpy(&slwt->nh4, nla_data(attrs[SEG6_LOCAL_NH4]),
	       sizeof(struct in_addr));

	return 0;
}

static int put_nla_nh4(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct nlattr *nla;

	nla = nla_reserve(skb, SEG6_LOCAL_NH4, sizeof(struct in_addr));
	if (!nla)
		return -EMSGSIZE;

	memcpy(nla_data(nla), &slwt->nh4, sizeof(struct in_addr));

	return 0;
}

static int cmp_nla_nh4(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	return memcmp(&a->nh4, &b->nh4, sizeof(struct in_addr));
}

2022-09-13 01:16:17 +08:00
static int parse_nla_nh6(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	memcpy(&slwt->nh6, nla_data(attrs[SEG6_LOCAL_NH6]),
	       sizeof(struct in6_addr));

	return 0;
}

static int put_nla_nh6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct nlattr *nla;

	nla = nla_reserve(skb, SEG6_LOCAL_NH6, sizeof(struct in6_addr));
	if (!nla)
		return -EMSGSIZE;

	memcpy(nla_data(nla), &slwt->nh6, sizeof(struct in6_addr));

	return 0;
}

static int cmp_nla_nh6(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	return memcmp(&a->nh6, &b->nh6, sizeof(struct in6_addr));
}

2022-09-13 01:16:17 +08:00
static int parse_nla_iif(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	slwt->iif = nla_get_u32(attrs[SEG6_LOCAL_IIF]);

	return 0;
}

static int put_nla_iif(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	if (nla_put_u32(skb, SEG6_LOCAL_IIF, slwt->iif))
		return -EMSGSIZE;

	return 0;
}

static int cmp_nla_iif(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	if (a->iif != b->iif)
		return 1;

	return 0;
}

2022-09-13 01:16:17 +08:00
static int parse_nla_oif(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2017-08-05 18:38:27 +08:00
{
	slwt->oif = nla_get_u32(attrs[SEG6_LOCAL_OIF]);

	return 0;
}

static int put_nla_oif(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	if (nla_put_u32(skb, SEG6_LOCAL_OIF, slwt->oif))
		return -EMSGSIZE;

	return 0;
}

static int cmp_nla_oif(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	if (a->oif != b->oif)
		return 1;

	return 0;
}
2018-05-20 21:58:16 +08:00
#define MAX_PROG_NAME 256
static const struct nla_policy bpf_prog_policy[SEG6_LOCAL_BPF_PROG_MAX + 1] = {
	[SEG6_LOCAL_BPF_PROG]	   = { .type = NLA_U32, },
	[SEG6_LOCAL_BPF_PROG_NAME] = { .type = NLA_NUL_STRING,
				       .len = MAX_PROG_NAME },
};

2022-09-13 01:16:17 +08:00
static int parse_nla_bpf(struct nlattr **attrs, struct seg6_local_lwt *slwt,
			 struct netlink_ext_ack *extack)
2018-05-20 21:58:16 +08:00
{
	struct nlattr *tb[SEG6_LOCAL_BPF_PROG_MAX + 1];
	struct bpf_prog *p;
	int ret;
	u32 fd;

netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 20:07:28 +08:00
	ret = nla_parse_nested_deprecated(tb, SEG6_LOCAL_BPF_PROG_MAX,
					  attrs[SEG6_LOCAL_BPF],
					  bpf_prog_policy, NULL);
2018-05-20 21:58:16 +08:00
	if (ret < 0)
		return ret;

	if (!tb[SEG6_LOCAL_BPF_PROG] || !tb[SEG6_LOCAL_BPF_PROG_NAME])
		return -EINVAL;

	slwt->bpf.name = nla_memdup(tb[SEG6_LOCAL_BPF_PROG_NAME], GFP_KERNEL);
	if (!slwt->bpf.name)
		return -ENOMEM;

	fd = nla_get_u32(tb[SEG6_LOCAL_BPF_PROG]);
	p = bpf_prog_get_type(fd, BPF_PROG_TYPE_LWT_SEG6LOCAL);
	if (IS_ERR(p)) {
		kfree(slwt->bpf.name);
		return PTR_ERR(p);
	}

	slwt->bpf.prog = p;
	return 0;
}

static int put_nla_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct nlattr *nest;

	if (!slwt->bpf.prog)
		return 0;

2019-04-26 17:13:06 +08:00
	nest = nla_nest_start_noflag(skb, SEG6_LOCAL_BPF);
2018-05-20 21:58:16 +08:00
	if (!nest)
		return -EMSGSIZE;

	if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG, slwt->bpf.prog->aux->id))
		return -EMSGSIZE;

	if (slwt->bpf.name &&
	    nla_put_string(skb, SEG6_LOCAL_BPF_PROG_NAME, slwt->bpf.name))
		return -EMSGSIZE;

	return nla_nest_end(skb, nest);
}

static int cmp_nla_bpf(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
{
	if (!a->bpf.name && !b->bpf.name)
		return 0;

	if (!a->bpf.name || !b->bpf.name)
		return 1;

	return strcmp(a->bpf.name, b->bpf.name);
}

seg6: improve management of behavior attributes
Depending on the attribute (e.g. SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc.),
the parse() callback performs some validity checks on the provided input
and updates the tunnel state (slwt) with the result of the parsing
operation. However, an attribute may also need to reserve some additional
resources (e.g. allocating memory or setting up an eBPF program) in the parse()
callback to complete the parsing operation.
The parse() callbacks are invoked by the parse_nla_action() for each
attribute belonging to a specific behavior. Given a behavior with N
attributes, if the parsing of the i-th attribute fails, the
parse_nla_action() returns immediately with an error. Nonetheless, the
resources acquired during the parsing of the i-1 attributes are not freed
by the parse_nla_action().
Attributes which acquire resources must release them *in an explicit way*
in both the seg6_local_{build/destroy}_state(). However, adding a new
attribute of this type requires changes to
seg6_local_{build/destroy}_state() to release the resources correctly.
The seg6local infrastructure still lacks a simple and structured way to
release the resources acquired in the parse() operations.
We introduced a new callback in the struct seg6_action_param named
destroy(). This callback releases any resource which may have been acquired
in the parse() counterpart. Each attribute may or may not implement the
destroy() callback depending on whether it needs to free some acquired
resources.
The destroy() callback comes with several advantages:
1) we can have as many attributes as we want for a given behavior with no
need to explicitly free the taken resources;
2) As in case of the seg6_local_build_state(), the
seg6_local_destroy_state() does not need to handle the release of
resources directly. Indeed, it calls the destroy_attrs() function which
is in charge of calling the destroy() callback for every set attribute.
We do not need to patch seg6_local_{build/destroy}_state() anymore as
we add new attributes;
3) the code is more readable and better structured. Indeed, all the
information needed to handle a given attribute is contained in only
one place;
4) it facilitates the integration with new features introduced in further
patches.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
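
The parse()/destroy() pairing described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: every name ending in _sketch, the fixed attribute count, and the malloc-backed "resource" are hypothetical and are not the kernel's definitions. It shows a table of per-attribute callbacks, a bitmask of successfully parsed attributes, and an unwind step that releases only what was actually acquired when a later attribute fails to parse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NATTRS_SKETCH 3

/* Hypothetical tunnel state: one resource slot per attribute. */
struct lwt_sketch {
	void *res[NATTRS_SKETCH];
	unsigned long parsed_attrs;	/* bitmask of attributes parsed so far */
};

/* Hypothetical per-attribute callbacks; destroy may be NULL. */
struct action_param_sketch {
	int (*parse)(struct lwt_sketch *s, int idx);
	void (*destroy)(struct lwt_sketch *s, int idx);
};

/* A parse callback that acquires a resource. */
static int parse_alloc_sketch(struct lwt_sketch *s, int idx)
{
	assert(idx >= 0 && idx < NATTRS_SKETCH);
	s->res[idx] = malloc(16);
	return s->res[idx] ? 0 : -1;
}

/* A parse callback that always rejects its input. */
static int parse_fail_sketch(struct lwt_sketch *s, int idx)
{
	(void)s;
	(void)idx;
	return -1;
}

static void destroy_free_sketch(struct lwt_sketch *s, int idx)
{
	assert(idx >= 0 && idx < NATTRS_SKETCH);
	free(s->res[idx]);
	s->res[idx] = NULL;
}

/* Call destroy() for every attribute recorded in the bitmask. */
static void destroy_attrs_sketch(const struct action_param_sketch *p,
				 struct lwt_sketch *s)
{
	for (int i = 0; i < NATTRS_SKETCH; i++) {
		if (!(s->parsed_attrs & (1UL << i)))
			continue;
		if (p[i].destroy)
			p[i].destroy(s, i);
		s->parsed_attrs &= ~(1UL << i);
	}
}

/* Parse attributes in order; on failure, release what was acquired. */
static int parse_attrs_sketch(const struct action_param_sketch *p,
			      struct lwt_sketch *s)
{
	for (int i = 0; i < NATTRS_SKETCH; i++) {
		int err = p[i].parse(s, i);

		if (err) {
			destroy_attrs_sketch(p, s);
			return err;
		}
		s->parsed_attrs |= 1UL << i;
	}
	return 0;
}
```

With this layout, adding a new attribute only means adding one table entry; neither the parse loop nor the teardown loop needs to change.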
2020-12-02 21:05:11 +08:00
static void destroy_attr_bpf(struct seg6_local_lwt *slwt)
{
	kfree(slwt->bpf.name);
	if (slwt->bpf.prog)
		bpf_prog_put(slwt->bpf.prog);
}

seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (e.g.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
a non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's walk through a brief example to see how helpful counters can be for SRv6
networks. Consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segments Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could have different causes, such as wrong route
settings on the node or an invalid SID List carried by the SRv6
packet. In any case, the error counter rules out the possibility that the
packet did not arrive at the node or was not processed by the behavior at
all.
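
The troubleshooting scenario above can be modelled with a few lines of userspace C. This is a minimal sketch, not the kernel's End implementation: the struct, the function name and the return convention are assumptions made for illustration, and only the Segments Left check is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance counters; names are illustrative only. */
struct end_counters_sketch {
	uint64_t packets;
	uint64_t bytes;
	uint64_t errors;
};

/* Models only the Segments Left check of an End behavior: a packet with
 * SL == 0 is discarded and counted as an error, any other packet is
 * counted as correctly processed. Returns 0 on success, -1 on drop.
 */
static int end_input_sketch(struct end_counters_sketch *cnt,
			    uint8_t segments_left, size_t len)
{
	assert(cnt != NULL);

	if (segments_left == 0) {
		cnt->errors++;
		return -1;
	}

	cnt->packets++;
	cnt->bytes += len;
	return 0;
}
```

A non-zero errors value with packets still at zero tells the operator that traffic reaches the behavior but is rejected there, which narrows the search to the SID list or the local route configuration.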
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps
Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps
Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps
As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).
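
To put the table in relative terms, the helper below computes the throughput difference against the baseline as a percentage. It is a small sketch: the function name is hypothetical, and the decimal commas in the table are read as decimal points. With the averages above, scenarios (2) and (3) come out roughly 0.22% and 0.23% below scenario (1), i.e. differences of about 3295 pps and 3450 pps, both smaller than the 3956.82 pps standard deviation of scenario (1).

```c
#include <assert.h>

/* Relative throughput drop of a measurement against a baseline, in percent.
 * Hypothetical helper used only to restate the table above.
 */
static double rel_drop_percent_sketch(double baseline_pps, double measured_pps)
{
	assert(baseline_pps > 0.0);
	return (baseline_pps - measured_pps) / baseline_pps * 100.0;
}
```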
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
static const struct
nla_policy seg6_local_counters_policy[SEG6_LOCAL_CNT_MAX + 1] = {
	[SEG6_LOCAL_CNT_PACKETS] = { .type = NLA_U64 },
	[SEG6_LOCAL_CNT_BYTES] = { .type = NLA_U64 },
	[SEG6_LOCAL_CNT_ERRORS] = { .type = NLA_U64 },
};

static int parse_nla_counters(struct nlattr **attrs,
2022-09-13 01:16:17 +08:00
			      struct seg6_local_lwt *slwt,
			      struct netlink_ext_ack *extack)
2021-04-27 23:44:04 +08:00
{
	struct pcpu_seg6_local_counters __percpu *pcounters;
	struct nlattr *tb[SEG6_LOCAL_CNT_MAX + 1];
	int ret;

	ret = nla_parse_nested_deprecated(tb, SEG6_LOCAL_CNT_MAX,
					  attrs[SEG6_LOCAL_COUNTERS],
					  seg6_local_counters_policy, NULL);
	if (ret < 0)
		return ret;

	/* basic support for SRv6 Behavior counters requires at least:
	 * packets, bytes and errors.
	 */
	if (!tb[SEG6_LOCAL_CNT_PACKETS] || !tb[SEG6_LOCAL_CNT_BYTES] ||
	    !tb[SEG6_LOCAL_CNT_ERRORS])
		return -EINVAL;

	/* counters are always zero initialized */
	pcounters = seg6_local_alloc_pcpu_counters(GFP_KERNEL);
	if (!pcounters)
		return -ENOMEM;

	slwt->pcpu_counters = pcounters;

	return 0;
}

static int seg6_local_fill_nla_counters(struct sk_buff *skb,
					struct seg6_local_counters *counters)
{
	if (nla_put_u64_64bit(skb, SEG6_LOCAL_CNT_PACKETS, counters->packets,
			      SEG6_LOCAL_CNT_PAD))
		return -EMSGSIZE;

	if (nla_put_u64_64bit(skb, SEG6_LOCAL_CNT_BYTES, counters->bytes,
			      SEG6_LOCAL_CNT_PAD))
		return -EMSGSIZE;

	if (nla_put_u64_64bit(skb, SEG6_LOCAL_CNT_ERRORS, counters->errors,
			      SEG6_LOCAL_CNT_PAD))
		return -EMSGSIZE;

	return 0;
}

static int put_nla_counters(struct sk_buff *skb, struct seg6_local_lwt *slwt)
{
	struct seg6_local_counters counters = { 0, 0, 0 };
	struct nlattr *nest;
	int rc, i;

	nest = nla_nest_start(skb, SEG6_LOCAL_COUNTERS);
	if (!nest)
		return -EMSGSIZE;

	for_each_possible_cpu(i) {
		struct pcpu_seg6_local_counters *pcounters;
		u64 packets, bytes, errors;
		unsigned int start;

		pcounters = per_cpu_ptr(slwt->pcpu_counters, i);
		do {
2022-10-26 21:22:15 +08:00
			start = u64_stats_fetch_begin(&pcounters->syncp);
2021-04-27 23:44:04 +08:00

			packets = u64_stats_read(&pcounters->packets);
			bytes = u64_stats_read(&pcounters->bytes);
			errors = u64_stats_read(&pcounters->errors);

2022-10-26 21:22:15 +08:00
		} while (u64_stats_fetch_retry(&pcounters->syncp, start));
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
an non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could be caused by different reasons, such as wrong route
settings on the node or due to an invalid SID List carried by the SRv6
packet. Anyway, the error counter is used to exclude that the packet did
not arrive at the node or it has not been processed by the behavior at
all.
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev 2956.00 pps
As can be observed, throughputs achieved in scenarios (2) and (3) did not
suffer any observable degradation compared to scenario (1).
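The claim can be checked with a little arithmetic on the averages reported
above. The following is only a sketch: the helper names are hypothetical and
the numbers are copied from the three scenarios.

```c
#include <stdio.h>

/* Averages and standard deviation copied from the results above (pps). */
static const double avg_vanilla = 1504764.81;	/* scenario (1) */
static const double avg_off = 1501469.78;	/* scenario (2) */
static const double avg_on = 1501315.13;	/* scenario (3) */
static const double sd_vanilla = 3956.82;	/* std. dev of scenario (1) */

/* Relative throughput drop of 'avg' with respect to the vanilla kernel. */
static double rel_drop(double avg)
{
	return (avg_vanilla - avg) / avg_vanilla;
}

/* Hypothetical helper: print the drop for scenarios (2) and (3). */
static void print_drops(void)
{
	printf("(2): %.3f%%\n", 100.0 * rel_drop(avg_off));
	printf("(3): %.3f%%\n", 100.0 * rel_drop(avg_on));
}
```

Both drops come out at roughly 0.22% and 0.23%, and the absolute differences
(about 3295 and 3450 pps) are smaller than the standard deviation measured in
scenario (1), which is consistent with "no observable degradation".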
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
|
|
|
|
|
|
|
counters.packets += packets;
|
|
|
|
counters.bytes += bytes;
|
|
|
|
counters.errors += errors;
|
|
|
|
}
|
|
|
|
|
|
|
|
rc = seg6_local_fill_nla_counters(skb, &counters);
|
|
|
|
if (rc < 0) {
|
|
|
|
nla_nest_cancel(skb, nest);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
return nla_nest_end(skb, nest);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int cmp_nla_counters(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
|
|
|
|
{
|
|
|
|
/* a and b are equal if both have pcpu_counters set or not */
|
|
|
|
return (!!((unsigned long)a->pcpu_counters)) ^
|
|
|
|
(!!((unsigned long)b->pcpu_counters));
|
|
|
|
}
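The comparison above follows the usual cmp() convention: 0 means "equal",
non-zero means "different". A minimal userspace sketch of the same XOR idea,
using a hypothetical stand-in structure rather than the real
struct seg6_local_lwt:

```c
#include <stddef.h>

/* Hypothetical stand-in for the tunnel state; only the field needed here. */
struct lwt_sketch {
	void *pcpu_counters;	/* NULL when counters are disabled */
};

/* Returns 0 when a and b agree on whether counters are enabled, 1 otherwise.
 * Each comparison yields 0 or 1, so the XOR is 1 only when exactly one of
 * the two instances has counters.
 */
static int cmp_counters_sketch(const struct lwt_sketch *a,
			       const struct lwt_sketch *b)
{
	return (a->pcpu_counters != NULL) ^ (b->pcpu_counters != NULL);
}
```

Note that the counter values themselves are not compared; only the presence
of the counters matters.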
|
|
|
|
|
|
|
|
static void destroy_attr_counters(struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
free_percpu(slwt->pcpu_counters);
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
static const
|
|
|
|
struct nla_policy seg6_local_flavors_policy[SEG6_LOCAL_FLV_MAX + 1] = {
|
|
|
|
[SEG6_LOCAL_FLV_OPERATION] = { .type = NLA_U32 },
|
|
|
|
[SEG6_LOCAL_FLV_LCBLOCK_BITS] = { .type = NLA_U8 },
|
|
|
|
[SEG6_LOCAL_FLV_LCNODE_FN_BITS] = { .type = NLA_U8 },
|
|
|
|
};
|
|
|
|
|
|
|
|
/* check whether the lengths of the Locator-Block and Locator-Node Function
|
|
|
|
* are compatible with the dimension of a C-SID container.
|
|
|
|
*/
|
|
|
|
static int seg6_chk_next_csid_cfg(__u8 block_len, __u8 func_len)
|
|
|
|
{
|
|
|
|
/* Locator-Block and Locator-Node Function cannot exceed 128 bits
|
|
|
|
* (i.e. C-SID container lengths).
|
|
|
|
*/
|
|
|
|
if (next_csid_chk_cntr_bits(block_len, func_len))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Locator-Block length must be greater than zero and evenly divisible
|
|
|
|
* by 8. There must be room for a Locator-Node Function, at least.
|
|
|
|
*/
|
|
|
|
if (next_csid_chk_lcblock_bits(block_len))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Locator-Node Function length must be greater than zero and evenly
|
|
|
|
* divisible by 8. There must be room for the Locator-Block.
|
|
|
|
*/
|
|
|
|
if (next_csid_chk_lcnode_fn_bits(func_len))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
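The three next_csid_chk_*() helpers are defined elsewhere in the file and are
not shown here. The sketch below is one possible reading of the comments
above, not the kernel's definition: it assumes a 128-bit container and takes
"there must be room for the other field" to mean that at least 8 bits are
left over. All names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define CSID_CONTAINER_BITS_SK 128	/* assumed: size of an IPv6 address */

/* Non-zero if len is zero, not a multiple of 8, or leaves fewer than 8 bits
 * for the other field (assumed interpretation of the comments).
 */
static int csid_len_is_bad_sk(uint8_t len)
{
	return len == 0 || (len & 0x07) || len > CSID_CONTAINER_BITS_SK - 8;
}

static int chk_next_csid_cfg_sk(uint8_t block_len, uint8_t func_len)
{
	/* both fields together must fit into the container */
	if ((unsigned int)block_len + func_len > CSID_CONTAINER_BITS_SK)
		return -EINVAL;

	if (csid_len_is_bad_sk(block_len))
		return -EINVAL;

	if (csid_len_is_bad_sk(func_len))
		return -EINVAL;

	return 0;
}
```

Under these assumptions a 32-bit Locator-Block with a 16-bit Locator-Node
Function is accepted, while a zero length, a length that is not a multiple of
8, or a pair that adds up to more than 128 bits is rejected with -EINVAL.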
|
|
|
|
|
|
|
|
static int seg6_parse_nla_next_csid_cfg(struct nlattr **tb,
|
|
|
|
struct seg6_flavors_info *finfo,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
__u8 func_len = SEG6_LOCAL_LCNODE_FN_DBITS;
|
|
|
|
__u8 block_len = SEG6_LOCAL_LCBLOCK_DBITS;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
if (tb[SEG6_LOCAL_FLV_LCBLOCK_BITS])
|
|
|
|
block_len = nla_get_u8(tb[SEG6_LOCAL_FLV_LCBLOCK_BITS]);
|
|
|
|
|
|
|
|
if (tb[SEG6_LOCAL_FLV_LCNODE_FN_BITS])
|
|
|
|
func_len = nla_get_u8(tb[SEG6_LOCAL_FLV_LCNODE_FN_BITS]);
|
|
|
|
|
|
|
|
rc = seg6_chk_next_csid_cfg(block_len, func_len);
|
|
|
|
if (rc < 0) {
|
|
|
|
NL_SET_ERR_MSG(extack,
|
|
|
|
"Invalid Locator Block/Node Function lengths");
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
finfo->lcblock_bits = block_len;
|
|
|
|
finfo->lcnode_func_bits = func_len;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int parse_nla_flavors(struct nlattr **attrs, struct seg6_local_lwt *slwt,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
struct nlattr *tb[SEG6_LOCAL_FLV_MAX + 1];
|
2023-08-13 02:09:25 +08:00
|
|
|
int action = slwt->action;
|
|
|
|
__u32 fops, supp_fops;
|
2022-09-13 01:16:18 +08:00
|
|
|
int rc;
|
|
|
|
|
|
|
|
rc = nla_parse_nested_deprecated(tb, SEG6_LOCAL_FLV_MAX,
|
|
|
|
attrs[SEG6_LOCAL_FLAVORS],
|
|
|
|
seg6_local_flavors_policy, NULL);
|
|
|
|
if (rc < 0)
|
|
|
|
return rc;
|
|
|
|
|
|
|
|
/* this attribute MUST always be present since it represents the Flavor
|
|
|
|
* operation(s) to be carried out.
|
|
|
|
*/
|
|
|
|
if (!tb[SEG6_LOCAL_FLV_OPERATION])
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
fops = nla_get_u32(tb[SEG6_LOCAL_FLV_OPERATION]);
|
2023-08-13 02:09:25 +08:00
|
|
|
rc = seg6_flv_supp_ops_by_action(action, &supp_fops);
|
|
|
|
if (rc < 0 || (fops & ~supp_fops)) {
|
2022-09-13 01:16:18 +08:00
|
|
|
NL_SET_ERR_MSG(extack, "Unsupported Flavor operation(s)");
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
finfo->flv_ops = fops;
|
|
|
|
|
|
|
|
if (seg6_next_csid_enabled(fops)) {
|
|
|
|
/* Locator-Block and Locator-Node Function lengths can be
|
|
|
|
* provided by the user space. Otherwise, default values are
|
|
|
|
* applied.
|
|
|
|
*/
|
|
|
|
rc = seg6_parse_nla_next_csid_cfg(tb, finfo, extack);
|
|
|
|
if (rc < 0)
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
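The "fops & ~supp_fops" test above is a plain bitmask subset check: it is
non-zero as soon as one requested bit is missing from the supported set. A
small standalone sketch, with made-up flag values that are not the kernel's
UAPI constants:

```c
#include <stdint.h>

/* Hypothetical flavor bits for illustration only. */
#define FLV_A_SK (1u << 0)
#define FLV_B_SK (1u << 1)
#define FLV_C_SK (1u << 2)

/* Returns 1 if every requested bit is also set in supported, 0 otherwise. */
static int flv_ops_supported_sk(uint32_t requested, uint32_t supported)
{
	return (requested & ~supported) == 0;
}
```

With a supported set of FLV_A_SK | FLV_B_SK, requesting FLV_A_SK alone
passes, while requesting FLV_A_SK | FLV_C_SK fails because FLV_C_SK survives
the mask.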
|
|
|
|
|
|
|
|
static int seg6_fill_nla_next_csid_cfg(struct sk_buff *skb,
|
|
|
|
struct seg6_flavors_info *finfo)
|
|
|
|
{
|
|
|
|
if (nla_put_u8(skb, SEG6_LOCAL_FLV_LCBLOCK_BITS, finfo->lcblock_bits))
|
|
|
|
return -EMSGSIZE;
|
|
|
|
|
|
|
|
if (nla_put_u8(skb, SEG6_LOCAL_FLV_LCNODE_FN_BITS,
|
|
|
|
finfo->lcnode_func_bits))
|
|
|
|
return -EMSGSIZE;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int put_nla_flavors(struct sk_buff *skb, struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
__u32 fops = finfo->flv_ops;
|
|
|
|
struct nlattr *nest;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
nest = nla_nest_start(skb, SEG6_LOCAL_FLAVORS);
|
|
|
|
if (!nest)
|
|
|
|
return -EMSGSIZE;
|
|
|
|
|
|
|
|
if (nla_put_u32(skb, SEG6_LOCAL_FLV_OPERATION, fops)) {
|
|
|
|
rc = -EMSGSIZE;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (seg6_next_csid_enabled(fops)) {
|
|
|
|
rc = seg6_fill_nla_next_csid_cfg(skb, finfo);
|
|
|
|
if (rc < 0)
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return nla_nest_end(skb, nest);
|
|
|
|
|
|
|
|
err:
|
|
|
|
nla_nest_cancel(skb, nest);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int seg6_cmp_nla_next_csid_cfg(struct seg6_flavors_info *finfo_a,
|
|
|
|
struct seg6_flavors_info *finfo_b)
|
|
|
|
{
|
|
|
|
if (finfo_a->lcblock_bits != finfo_b->lcblock_bits)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
if (finfo_a->lcnode_func_bits != finfo_b->lcnode_func_bits)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int cmp_nla_flavors(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
|
|
|
|
{
|
|
|
|
struct seg6_flavors_info *finfo_a = &a->flv_info;
|
|
|
|
struct seg6_flavors_info *finfo_b = &b->flv_info;
|
|
|
|
|
|
|
|
if (finfo_a->flv_ops != finfo_b->flv_ops)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
if (seg6_next_csid_enabled(finfo_a->flv_ops)) {
|
|
|
|
if (seg6_cmp_nla_next_csid_cfg(finfo_a, finfo_b))
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int encap_size_flavors(struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct seg6_flavors_info *finfo = &slwt->flv_info;
|
|
|
|
int nlsize;
|
|
|
|
|
|
|
|
nlsize = nla_total_size(0) + /* nest SEG6_LOCAL_FLAVORS */
|
|
|
|
nla_total_size(4); /* SEG6_LOCAL_FLV_OPERATION */
|
|
|
|
|
|
|
|
if (seg6_next_csid_enabled(finfo->flv_ops))
|
|
|
|
nlsize += nla_total_size(1) + /* SEG6_LOCAL_FLV_LCBLOCK_BITS */
|
|
|
|
nla_total_size(1); /* SEG6_LOCAL_FLV_LCNODE_FN_BITS */
|
|
|
|
|
|
|
|
return nlsize;
|
|
|
|
}
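To see what the size computation above amounts to, the sketch below redoes it
in userspace. It assumes the standard netlink attribute layout (4-byte
header, payload padded to a 4-byte boundary); the *_SK names are hypothetical
stand-ins for the kernel macros and helpers.

```c
#define NLA_ALIGNTO_SK 4u
#define NLA_ALIGN_SK(len) \
	(((len) + NLA_ALIGNTO_SK - 1) & ~(NLA_ALIGNTO_SK - 1))
#define NLA_HDRLEN_SK NLA_ALIGN_SK(4u)	/* 2-byte length + 2-byte type */

/* Header plus payload, rounded up to the attribute alignment. */
static unsigned int nla_total_size_sk(unsigned int payload)
{
	return NLA_ALIGN_SK(NLA_HDRLEN_SK + payload);
}

/* Mirrors the structure of the function above: a nest header, a u32
 * operation, and two u8 lengths only when NEXT-C-SID is enabled.
 */
static unsigned int encap_size_flavors_sk(int next_csid_enabled)
{
	unsigned int nlsize;

	nlsize = nla_total_size_sk(0) +	/* nest */
		 nla_total_size_sk(4);	/* operation, u32 */

	if (next_csid_enabled)
		nlsize += nla_total_size_sk(1) +	/* block bits, u8 */
			  nla_total_size_sk(1);		/* node fn bits, u8 */

	return nlsize;
}
```

Under these assumptions each u8 attribute still occupies 8 bytes because of
padding, so the nest takes 12 bytes without the NEXT-C-SID lengths and 28
bytes with them.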
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
struct seg6_action_param {
|
2022-09-13 01:16:17 +08:00
|
|
|
int (*parse)(struct nlattr **attrs, struct seg6_local_lwt *slwt,
|
|
|
|
struct netlink_ext_ack *extack);
|
2017-08-05 18:38:26 +08:00
|
|
|
int (*put)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
|
|
|
|
int (*cmp)(struct seg6_local_lwt *a, struct seg6_local_lwt *b);
|
seg6: improve management of behavior attributes
Depending on the attribute (e.g. SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc.),
the parse() callback performs some validity checks on the provided input
and updates the tunnel state (slwt) with the result of the parsing
operation. However, an attribute may also need to reserve some additional
resources (e.g. allocating memory or setting up an eBPF program) in the parse()
callback to complete the parsing operation.
The parse() callbacks are invoked by the parse_nla_action() for each
attribute belonging to a specific behavior. Given a behavior with N
attributes, if the parsing of the i-th attribute fails, the
parse_nla_action() returns immediately with an error. Nonetheless, the
resources acquired during the parsing of the i-1 attributes are not freed
by the parse_nla_action().
Attributes which acquire resources must release them *in an explicit way*
in both the seg6_local_{build/destroy}_state(). However, adding a new
attribute of this type requires changes to
seg6_local_{build/destroy}_state() to release the resources correctly.
The seg6local infrastructure still lacks a simple and structured way to
release the resources acquired in the parse() operations.
We introduced a new callback in the struct seg6_action_param named
destroy(). This callback releases any resource which may have been acquired
in the parse() counterpart. Each attribute may or may not implement the
destroy() callback depending on whether it needs to free some acquired
resources.
The destroy() callback comes with several advantages:
1) we can have as many attributes as we want for a given behavior with no
need to explicitly free the acquired resources;
2) As in case of the seg6_local_build_state(), the
seg6_local_destroy_state() does not need to handle the release of
resources directly. Indeed, it calls the destroy_attrs() function which
is in charge of calling the destroy() callback for every set attribute.
We do not need to patch seg6_local_{build/destroy}_state() anymore as
we add new attributes;
3) the code is more readable and better structured. Indeed, all the
information needed to handle a given attribute is contained in only
one place;
4) it facilitates the integration with new features introduced in further
patches.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02 21:05:11 +08:00
|
|
|
|
|
|
|
/* optional destroy() callback useful for releasing resources which
|
|
|
|
* have been previously acquired in the corresponding parse()
|
|
|
|
* function.
|
|
|
|
*/
|
|
|
|
void (*destroy)(struct seg6_local_lwt *slwt);
|
2017-08-05 18:38:26 +08:00
|
|
|
};
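The role of the optional destroy() callback can be illustrated with a
simplified userspace model: attributes are parsed in order, and if the i-th
parse() fails, destroy() is called for the attributes already parsed. This is
only a sketch under simplifying assumptions; every type and function name
below is hypothetical and the callbacks take an extra index argument for
brevity.

```c
#include <stddef.h>

#define N_ATTRS_SK 3

struct state_sk {
	int held[N_ATTRS_SK];	/* 1 while attribute i holds a resource */
};

struct attr_ops_sk {
	int (*parse)(struct state_sk *st, int i);	/* 0 or negative error */
	void (*destroy)(struct state_sk *st, int i);	/* optional, may be NULL */
};

static int parse_ok_sk(struct state_sk *st, int i)
{
	st->held[i] = 1;	/* pretend a resource was acquired */
	return 0;
}

static int parse_fail_sk(struct state_sk *st, int i)
{
	(void)st;
	(void)i;
	return -1;
}

static void destroy_held_sk(struct state_sk *st, int i)
{
	st->held[i] = 0;	/* release the resource */
}

/* Call destroy() (if any) for each attribute set in 'parsed', from the
 * first one up to 'max_parsed' (excluded).
 */
static void destroy_attrs_sk(const struct attr_ops_sk *ops,
			     unsigned long parsed, int max_parsed,
			     struct state_sk *st)
{
	for (int i = 0; i < max_parsed; i++) {
		if (!(parsed & (1UL << i)))
			continue;
		if (ops[i].destroy)
			ops[i].destroy(st, i);
	}
}

/* Parse every attribute set in 'wanted'; roll back on the first failure. */
static int parse_all_sk(const struct attr_ops_sk *ops, unsigned long wanted,
			struct state_sk *st)
{
	unsigned long parsed = 0;

	for (int i = 0; i < N_ATTRS_SK; i++) {
		int rc;

		if (!(wanted & (1UL << i)))
			continue;

		rc = ops[i].parse(st, i);
		if (rc < 0) {
			destroy_attrs_sk(ops, parsed, i, st);
			return rc;
		}
		parsed |= 1UL << i;
	}

	return 0;
}
```

In this model an attribute without a destroy() callback is simply skipped
during rollback, which matches the "optional" wording of the comment above.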
|
|
|
|
|
|
|
|
static struct seg6_action_param seg6_action_params[SEG6_LOCAL_MAX + 1] = {
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_SRH] = { .parse = parse_nla_srh,
|
|
|
|
.put = put_nla_srh,
|
2020-12-02 21:05:11 +08:00
|
|
|
.cmp = cmp_nla_srh,
|
|
|
|
.destroy = destroy_attr_srh },
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_TABLE] = { .parse = parse_nla_table,
|
|
|
|
.put = put_nla_table,
|
|
|
|
.cmp = cmp_nla_table },
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_NH4] = { .parse = parse_nla_nh4,
|
|
|
|
.put = put_nla_nh4,
|
|
|
|
.cmp = cmp_nla_nh4 },
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_NH6] = { .parse = parse_nla_nh6,
|
|
|
|
.put = put_nla_nh6,
|
|
|
|
.cmp = cmp_nla_nh6 },
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_IIF] = { .parse = parse_nla_iif,
|
|
|
|
.put = put_nla_iif,
|
|
|
|
.cmp = cmp_nla_iif },
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2017-08-05 18:38:27 +08:00
|
|
|
[SEG6_LOCAL_OIF] = { .parse = parse_nla_oif,
|
|
|
|
.put = put_nla_oif,
|
|
|
|
.cmp = cmp_nla_oif },
|
2018-05-20 21:58:16 +08:00
|
|
|
|
|
|
|
[SEG6_LOCAL_BPF] = { .parse = parse_nla_bpf,
|
|
|
|
.put = put_nla_bpf,
|
2020-12-02 21:05:11 +08:00
|
|
|
.cmp = cmp_nla_bpf,
|
|
|
|
.destroy = destroy_attr_bpf },
|
2018-05-20 21:58:16 +08:00
|
|
|
|
2020-12-02 21:05:14 +08:00
|
|
|
[SEG6_LOCAL_VRFTABLE] = { .parse = parse_nla_vrftable,
|
|
|
|
.put = put_nla_vrftable,
|
|
|
|
.cmp = cmp_nla_vrftable },
|
|
|
|
|
2021-04-27 23:44:04 +08:00
|
|
|
[SEG6_LOCAL_COUNTERS] = { .parse = parse_nla_counters,
|
|
|
|
.put = put_nla_counters,
|
|
|
|
.cmp = cmp_nla_counters,
|
|
|
|
.destroy = destroy_attr_counters },
|
2022-09-13 01:16:18 +08:00
|
|
|
|
|
|
|
[SEG6_LOCAL_FLAVORS] = { .parse = parse_nla_flavors,
|
|
|
|
.put = put_nla_flavors,
|
|
|
|
.cmp = cmp_nla_flavors },
|
2017-08-05 18:38:26 +08:00
|
|
|
};
|
|
|
|
|
2020-12-02 21:05:11 +08:00
|
|
|
/* call the destroy() callback (if available) for each set attribute in
|
2020-12-02 21:05:12 +08:00
|
|
|
* @parsed_attrs, starting from the first attribute up to the @max_parsed
|
|
|
|
* (excluded) attribute.
|
2020-12-02 21:05:11 +08:00
|
|
|
*/
|
2020-12-02 21:05:12 +08:00
|
|
|
static void __destroy_attrs(unsigned long parsed_attrs, int max_parsed,
|
|
|
|
struct seg6_local_lwt *slwt)
|
seg6: improve management of behavior attributes
Depending on the attribute (i.e.: SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc),
the parse() callback performs some validity checks on the provided input
and updates the tunnel state (slwt) with the result of the parsing
operation. However, an attribute may also need to reserve some additional
resources (i.e.: memory or setting up an eBPF program) in the parse()
callback to complete the parsing operation.
The parse() callbacks are invoked by the parse_nla_action() for each
attribute belonging to a specific behavior. Given a behavior with N
attributes, if the parsing of the i-th attribute fails, the
parse_nla_action() returns immediately with an error. Nonetheless, the
resources acquired during the parsing of the i-1 attributes are not freed
by the parse_nla_action().
Attributes which acquire resources must release them *in an explicit way*
in both the seg6_local_{build/destroy}_state(). However, adding a new
attribute of this type requires changes to
seg6_local_{build/destroy}_state() to release the resources correctly.
The seg6local infrastructure still lacks a simple and structured way to
release the resources acquired in the parse() operations.
We introduced a new callback in the struct seg6_action_param named
destroy(). This callback releases any resource which may have been acquired
in the parse() counterpart. Each attribute may or may not implement the
destroy() callback depending on whether it needs to free some acquired
resources.
The destroy() callback comes with several advantages:
1) we can have as many attributes as we want for a given behavior with no
need to explicitly free the taken resources;
2) As in case of the seg6_local_build_state(), the
seg6_local_destroy_state() does not need to handle the release of
resources directly. Indeed, it calls the destroy_attrs() function which
is in charge of calling the destroy() callback for every set attribute.
We do not need to patch seg6_local_{build/destroy}_state() anymore as
we add new attributes;
3) the code is more readable and better structured. Indeed, all the
information needed to handle a given attribute are contained in only
one place;
4) it facilitates the integration with new features introduced in further
patches.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02 21:05:11 +08:00
|
|
|
{
|
|
|
|
struct seg6_action_param *param;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/* Every required seg6local attribute is identified by an ID which is
|
|
|
|
* encoded as a flag (i.e., 1 << ID) in the 'attrs' bitmask;
|
|
|
|
*
|
2020-12-02 21:05:12 +08:00
|
|
|
* We scan the 'parsed_attrs' bitmask, starting from the first attribute
|
2020-12-02 21:05:11 +08:00
|
|
|
* up to the @max_parsed (excluded) attribute.
|
|
|
|
* For each set attribute, we retrieve the corresponding destroy()
|
|
|
|
* callback. If the callback is not available, then we skip to the next
|
|
|
|
* attribute; otherwise, we call the destroy() callback.
|
|
|
|
*/
|
2022-08-03 00:12:03 +08:00
|
|
|
for (i = SEG6_LOCAL_SRH; i < max_parsed; ++i) {
|
2021-02-07 01:09:34 +08:00
|
|
|
if (!(parsed_attrs & SEG6_F_ATTR(i)))
|
2020-12-02 21:05:11 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
param = &seg6_action_params[i];
|
|
|
|
|
|
|
|
if (param->destroy)
|
|
|
|
param->destroy(slwt);
|
|
|
|
}
|
|
|
|
}
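The scan performed by __destroy_attrs() can be tried outside the kernel with a small stand-alone sketch. Everything prefixed with scan_ or SCAN_ below is hypothetical and exists only for illustration; it mirrors the shape of the loop above (skip unset bits, skip attributes without a destroy() callback, stop before max_parsed) and is not the kernel implementation.

```c
#include <stddef.h>

#define SCAN_ATTR_MAX 4                 /* hypothetical highest attribute ID */
#define SCAN_F_ATTR(i) (1UL << (i))     /* attribute ID encoded as a flag */

struct scan_state {
	int released[SCAN_ATTR_MAX + 1];    /* destroy() calls seen per attribute */
};

struct scan_param {
	void (*destroy)(struct scan_state *st);   /* NULL when nothing to free */
};

static void scan_destroy_1(struct scan_state *st) { st->released[1]++; }
static void scan_destroy_3(struct scan_state *st) { st->released[3]++; }

/* attribute 2 deliberately has no destroy() callback */
static const struct scan_param scan_params[SCAN_ATTR_MAX + 1] = {
	[1] = { .destroy = scan_destroy_1 },
	[2] = { .destroy = NULL },
	[3] = { .destroy = scan_destroy_3 },
};

/* Walk the bitmask from the first attribute up to max_parsed (excluded)
 * and call destroy() for every set attribute that provides one.
 */
static void scan_destroy_attrs(unsigned long parsed, int max_parsed,
			       struct scan_state *st)
{
	int i;

	for (i = 1; i < max_parsed; ++i) {
		if (!(parsed & SCAN_F_ATTR(i)))
			continue;

		if (scan_params[i].destroy)
			scan_params[i].destroy(st);
	}
}
```

Passing max_parsed = 3 with bits 1, 2 and 3 set releases only attribute 1: attribute 2 has no callback and attribute 3 lies outside the scanned range.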
|
|
|
|
|
|
|
|
/* release all the resources that may have been acquired during parsing
|
|
|
|
* operations.
|
|
|
|
*/
|
|
|
|
static void destroy_attrs(struct seg6_local_lwt *slwt)
|
|
|
|
{
|
2020-12-02 21:05:12 +08:00
|
|
|
unsigned long attrs = slwt->desc->attrs | slwt->parsed_optattrs;
|
|
|
|
|
|
|
|
__destroy_attrs(attrs, SEG6_LOCAL_MAX + 1, slwt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int parse_nla_optional_attrs(struct nlattr **attrs,
|
2022-09-13 01:16:17 +08:00
|
|
|
struct seg6_local_lwt *slwt,
|
|
|
|
struct netlink_ext_ack *extack)
|
2020-12-02 21:05:12 +08:00
|
|
|
{
|
|
|
|
struct seg6_action_desc *desc = slwt->desc;
|
|
|
|
unsigned long parsed_optattrs = 0;
|
|
|
|
struct seg6_action_param *param;
|
|
|
|
int err, i;
|
|
|
|
|
2022-08-03 00:12:03 +08:00
|
|
|
for (i = SEG6_LOCAL_SRH; i < SEG6_LOCAL_MAX + 1; ++i) {
|
2021-02-07 01:09:34 +08:00
|
|
|
if (!(desc->optattrs & SEG6_F_ATTR(i)) || !attrs[i])
|
2020-12-02 21:05:12 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
/* once here, the i-th attribute is provided by the
|
|
|
|
* userspace AND it is identified as optional as well.
|
|
|
|
*/
|
|
|
|
param = &seg6_action_params[i];
|
|
|
|
|
2022-09-13 01:16:17 +08:00
|
|
|
err = param->parse(attrs, slwt, extack);
|
2020-12-02 21:05:12 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto parse_optattrs_err;
|
|
|
|
|
|
|
|
/* current attribute has been correctly parsed */
|
2021-02-07 01:09:34 +08:00
|
|
|
parsed_optattrs |= SEG6_F_ATTR(i);
|
2020-12-02 21:05:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* store in the tunnel state all the optional attributes successfully
|
|
|
|
* parsed.
|
|
|
|
*/
|
|
|
|
slwt->parsed_optattrs = parsed_optattrs;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
parse_optattrs_err:
|
|
|
|
__destroy_attrs(parsed_optattrs, i, slwt);
|
|
|
|
|
|
|
|
return err;
|
2020-12-02 21:05:11 +08:00
|
|
|
}
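The rollback in parse_nla_optional_attrs() relies on two facts: the local bitmask only records attributes whose parse() succeeded, and the loop index at the time of failure bounds the cleanup scan. A minimal user-space sketch of that pattern follows, under the assumption that a "resource" is just a flag in an array; all opt_/OPT_ names are hypothetical and not part of the kernel API.

```c
#define OPT_ATTR_MAX 3                 /* hypothetical highest attribute ID */
#define OPT_F_ATTR(i) (1UL << (i))

struct opt_state {
	int held[OPT_ATTR_MAX + 1];        /* 1 while attribute i holds a resource */
	unsigned long parsed_optattrs;     /* committed only on full success */
};

/* hypothetical parse(): acquires a resource, or fails for ID == fail_at */
static int opt_parse(struct opt_state *st, int id, int fail_at)
{
	if (id == fail_at)
		return -1;

	st->held[id] = 1;
	return 0;
}

/* release what was acquired for set attributes in [1, max_parsed) */
static void opt_release(struct opt_state *st, unsigned long parsed,
			int max_parsed)
{
	for (int i = 1; i < max_parsed; ++i)
		if (parsed & OPT_F_ATTR(i))
			st->held[i] = 0;
}

static int opt_parse_optional(struct opt_state *st, unsigned long optattrs,
			      int fail_at)
{
	unsigned long parsed = 0;
	int err, i;

	for (i = 1; i < OPT_ATTR_MAX + 1; ++i) {
		if (!(optattrs & OPT_F_ATTR(i)))
			continue;

		err = opt_parse(st, i, fail_at);
		if (err < 0)
			goto rollback;

		/* current attribute has been correctly parsed */
		parsed |= OPT_F_ATTR(i);
	}

	st->parsed_optattrs = parsed;
	return 0;

rollback:
	/* i is the failing attribute: undo only what precedes it */
	opt_release(st, parsed, i);
	return err;
}
```

Because the mask is written to the state only after the loop completes, a failed call leaves parsed_optattrs untouched and no later teardown will try to release those attributes a second time.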
|
|
|
|
|
2020-12-02 21:05:13 +08:00
|
|
|
/* call the custom constructor of the behavior during its initialization phase
|
|
|
|
* and after all its attributes have been parsed successfully.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
seg6_local_lwtunnel_build_state(struct seg6_local_lwt *slwt, const void *cfg,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
struct seg6_action_desc *desc = slwt->desc;
|
|
|
|
struct seg6_local_lwtunnel_ops *ops;
|
|
|
|
|
|
|
|
ops = &desc->slwt_ops;
|
|
|
|
if (!ops->build_state)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return ops->build_state(slwt, cfg, extack);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* call the custom destructor of the behavior which is invoked before the
|
|
|
|
* tunnel is going to be destroyed.
|
|
|
|
*/
|
|
|
|
static void seg6_local_lwtunnel_destroy_state(struct seg6_local_lwt *slwt)
|
|
|
|
{
|
|
|
|
struct seg6_action_desc *desc = slwt->desc;
|
|
|
|
struct seg6_local_lwtunnel_ops *ops;
|
|
|
|
|
|
|
|
ops = &desc->slwt_ops;
|
|
|
|
if (!ops->destroy_state)
|
|
|
|
return;
|
|
|
|
|
|
|
|
ops->destroy_state(slwt);
|
|
|
|
}
|
|
|
|
|
2022-09-13 01:16:17 +08:00
|
|
|
static int parse_nla_action(struct nlattr **attrs, struct seg6_local_lwt *slwt,
|
|
|
|
struct netlink_ext_ack *extack)
|
2017-08-05 18:38:26 +08:00
|
|
|
{
|
|
|
|
struct seg6_action_param *param;
|
|
|
|
struct seg6_action_desc *desc;
|
2020-12-02 21:05:12 +08:00
|
|
|
unsigned long invalid_attrs;
|
2017-08-05 18:38:26 +08:00
|
|
|
int i, err;
|
|
|
|
|
|
|
|
desc = __get_action_desc(slwt->action);
|
|
|
|
if (!desc)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!desc->input)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
slwt->desc = desc;
|
|
|
|
slwt->headroom += desc->static_headroom;
|
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
/* Forcing the desc->optattrs *set* and the desc->attrs *set* to be
|
|
|
|
* disjoint; this allows us to release resources acquired by optional
|
|
|
|
* attributes and by required attributes independently from each other
|
2021-04-11 01:46:14 +08:00
|
|
|
* without any interference.
|
2020-12-02 21:05:12 +08:00
|
|
|
* In other words, we are sure that we do not release any of the acquired
|
|
|
|
* resources twice.
|
|
|
|
*
|
|
|
|
* Note that if an attribute is configured both as required and as
|
|
|
|
* optional, it means that the user has messed something up in the
|
|
|
|
* seg6_action_table. Therefore, this check is required for SRv6
|
|
|
|
* behaviors to work properly.
|
|
|
|
*/
|
|
|
|
invalid_attrs = desc->attrs & desc->optattrs;
|
|
|
|
if (invalid_attrs) {
|
|
|
|
WARN_ONCE(1,
|
|
|
|
"An attribute cannot be both required AND optional");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* parse the required attributes */
|
2022-08-03 00:12:03 +08:00
|
|
|
for (i = SEG6_LOCAL_SRH; i < SEG6_LOCAL_MAX + 1; i++) {
|
2021-02-07 01:09:34 +08:00
|
|
|
if (desc->attrs & SEG6_F_ATTR(i)) {
|
2017-08-05 18:38:26 +08:00
|
|
|
if (!attrs[i])
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
param = &seg6_action_params[i];
|
|
|
|
|
2022-09-13 01:16:17 +08:00
|
|
|
err = param->parse(attrs, slwt, extack);
|
2017-08-05 18:38:26 +08:00
|
|
|
if (err < 0)
|
2020-12-02 21:05:12 +08:00
|
|
|
goto parse_attrs_err;
|
2017-08-05 18:38:26 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
/* parse the optional attributes, if any */
|
2022-09-13 01:16:17 +08:00
|
|
|
err = parse_nla_optional_attrs(attrs, slwt, extack);
|
2020-12-02 21:05:12 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto parse_attrs_err;
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
return 0;
|
2020-12-02 21:05:11 +08:00
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
parse_attrs_err:
|
2020-12-02 21:05:11 +08:00
|
|
|
/* release any resource that may have been acquired during the i-1
|
|
|
|
* parse() operations.
|
|
|
|
*/
|
2020-12-02 21:05:12 +08:00
|
|
|
__destroy_attrs(desc->attrs, i, slwt);
|
2020-12-02 21:05:11 +08:00
|
|
|
|
|
|
|
return err;
|
2017-08-05 18:38:26 +08:00
|
|
|
}
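Two bitmask checks carry most of the validation in parse_nla_action(): the required and optional sets must not overlap, and every required attribute must be present. The sketch below isolates both checks; the mask_ names and the idea of passing the provided attributes as a bitmask (instead of an nlattr array) are assumptions made for illustration only.

```c
#include <errno.h>

#define MASK_F_ATTR(i) (1UL << (i))

struct mask_desc {
	unsigned long attrs;       /* required attributes */
	unsigned long optattrs;    /* optional attributes */
};

/* A descriptor is well formed only if no attribute is both required and
 * optional; otherwise the same resource could be released twice.
 */
static int mask_check_desc(const struct mask_desc *desc)
{
	if (desc->attrs & desc->optattrs)
		return -EINVAL;

	return 0;
}

/* Every required attribute must appear in the provided set. */
static int mask_check_required(const struct mask_desc *desc,
			       unsigned long provided)
{
	unsigned long missing = desc->attrs & ~provided;

	return missing ? -EINVAL : 0;
}
```

Computing the missing set in one expression is equivalent to the per-attribute `!attrs[i]` test in the loop above, as long as the provided mask has a bit set for each non-NULL entry.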
|
|
|
|
|
2020-03-28 06:00:21 +08:00
|
|
|
static int seg6_local_build_state(struct net *net, struct nlattr *nla,
|
|
|
|
unsigned int family, const void *cfg,
|
|
|
|
struct lwtunnel_state **ts,
|
2017-08-05 18:38:26 +08:00
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
struct nlattr *tb[SEG6_LOCAL_MAX + 1];
|
|
|
|
struct lwtunnel_state *newts;
|
|
|
|
struct seg6_local_lwt *slwt;
|
|
|
|
int err;
|
|
|
|
|
2017-08-25 15:56:46 +08:00
|
|
|
if (family != AF_INET6)
|
|
|
|
return -EINVAL;
|
|
|
|
|
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 20:07:28 +08:00
|
|
|
err = nla_parse_nested_deprecated(tb, SEG6_LOCAL_MAX, nla,
|
|
|
|
seg6_local_policy, extack);
|
2017-08-05 18:38:26 +08:00
|
|
|
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (!tb[SEG6_LOCAL_ACTION])
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
newts = lwtunnel_state_alloc(sizeof(*slwt));
|
|
|
|
if (!newts)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
slwt = seg6_local_lwtunnel(newts);
|
|
|
|
slwt->action = nla_get_u32(tb[SEG6_LOCAL_ACTION]);
|
|
|
|
|
2022-09-13 01:16:17 +08:00
|
|
|
err = parse_nla_action(tb, slwt, extack);
|
2017-08-05 18:38:26 +08:00
|
|
|
if (err < 0)
|
|
|
|
goto out_free;
|
|
|
|
|
2020-12-02 21:05:13 +08:00
|
|
|
err = seg6_local_lwtunnel_build_state(slwt, cfg, extack);
|
|
|
|
if (err < 0)
|
|
|
|
goto out_destroy_attrs;
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
newts->type = LWTUNNEL_ENCAP_SEG6_LOCAL;
|
|
|
|
newts->flags = LWTUNNEL_STATE_INPUT_REDIRECT;
|
|
|
|
newts->headroom = slwt->headroom;
|
|
|
|
|
|
|
|
*ts = newts;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2020-12-02 21:05:13 +08:00
|
|
|
out_destroy_attrs:
|
|
|
|
destroy_attrs(slwt);
|
2017-08-05 18:38:26 +08:00
|
|
|
out_free:
|
|
|
|
kfree(newts);
|
|
|
|
return err;
|
|
|
|
}
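seg6_local_build_state() unwinds through two labels: out_destroy_attrs is taken when the behavior constructor fails after parsing succeeded, while out_free is taken when parsing itself fails, since parse_nla_action() has already released its partial state by then. A user-space sketch of that ladder follows, assuming malloc-backed state and hypothetical bld_ helpers that stand in for the parse and build steps.

```c
#include <stdlib.h>

struct bld_state {
	int attrs_live;    /* set by the parse step */
	int built;         /* set by the constructor step */
};

static int bld_destroy_calls;   /* counts attribute teardown invocations */

static int bld_parse(struct bld_state *st, int fail)
{
	if (fail)
		return -1;   /* a failing parse cleans up after itself */

	st->attrs_live = 1;
	return 0;
}

static int bld_init(struct bld_state *st, int fail)
{
	if (fail)
		return -1;

	st->built = 1;
	return 0;
}

static void bld_destroy_attrs(struct bld_state *st)
{
	st->attrs_live = 0;
	bld_destroy_calls++;
}

static int bld_build(struct bld_state **out, int fail_parse, int fail_init)
{
	struct bld_state *st;
	int err;

	st = calloc(1, sizeof(*st));
	if (!st)
		return -1;

	err = bld_parse(st, fail_parse);
	if (err < 0)
		goto out_free;

	err = bld_init(st, fail_init);
	if (err < 0)
		goto out_destroy_attrs;

	*out = st;
	return 0;

out_destroy_attrs:
	bld_destroy_attrs(st);   /* falls through to free the state */
out_free:
	free(st);
	return err;
}
```

The labels are ordered so that each later failure point falls through every earlier cleanup step, which keeps the error paths in one place at the end of the function.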
|
|
|
|
|
|
|
|
static void seg6_local_destroy_state(struct lwtunnel_state *lwt)
|
|
|
|
{
|
|
|
|
struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt);
|
|
|
|
|
2020-12-02 21:05:13 +08:00
|
|
|
seg6_local_lwtunnel_destroy_state(slwt);
|
|
|
|
|
2020-12-02 21:05:11 +08:00
|
|
|
destroy_attrs(slwt);
|
ipv6: sr: Add seg6local action End.BPF
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.
Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.
Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.
The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb's destination is already set and the default
lookup should not be performed.
- BPF_DROP : the packet will be dropped.
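The three return codes above can be read as a small decision table. The following is a minimal userspace sketch of that table, not the kernel code: the `sk_` names and enum values are hypothetical stand-ins, and treating any unknown return code as a drop is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the program's return codes (not the kernel enum). */
enum sk_bpf_ret { SK_BPF_OK, SK_BPF_REDIRECT, SK_BPF_DROP };

/* What the hook does with the packet once the program has returned. */
enum sk_verdict { SK_LOOKUP_NEXTHOP, SK_KEEP_DST, SK_DROP_PACKET };

/* srh_valid: 1 if the SRH was left untouched or passed re-validation. */
enum sk_verdict sk_end_bpf_verdict(enum sk_bpf_ret ret, int srh_valid)
{
	switch (ret) {
	case SK_BPF_OK:
	case SK_BPF_REDIRECT:
		break;
	case SK_BPF_DROP:
	default:
		/* Assumption of this sketch: unknown codes are dropped too. */
		return SK_DROP_PACKET;
	}

	/* An altered SRH that fails re-validation always drops the packet. */
	if (!srh_valid)
		return SK_DROP_PACKET;

	/* REDIRECT: the destination is already set, so skip the default lookup. */
	return ret == SK_BPF_REDIRECT ? SK_KEEP_DST : SK_LOOKUP_NEXTHOP;
}

/* Usage example. */
void sk_end_bpf_example(void)
{
	assert(sk_end_bpf_verdict(SK_BPF_OK, 1) == SK_LOOKUP_NEXTHOP);
	assert(sk_end_bpf_verdict(SK_BPF_REDIRECT, 1) == SK_KEEP_DST);
	assert(sk_end_bpf_verdict(SK_BPF_OK, 0) == SK_DROP_PACKET);
}
```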
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-20 21:58:16 +08:00
|
|
|
|
|
|
|
return;
|
2017-08-05 18:38:26 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int seg6_local_fill_encap(struct sk_buff *skb,
|
|
|
|
struct lwtunnel_state *lwt)
|
|
|
|
{
|
|
|
|
struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt);
|
|
|
|
struct seg6_action_param *param;
|
2020-12-02 21:05:12 +08:00
|
|
|
unsigned long attrs;
|
2017-08-05 18:38:26 +08:00
|
|
|
int i, err;
|
|
|
|
|
|
|
|
if (nla_put_u32(skb, SEG6_LOCAL_ACTION, slwt->action))
|
|
|
|
return -EMSGSIZE;
|
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
attrs = slwt->desc->attrs | slwt->parsed_optattrs;
|
|
|
|
|
2022-08-03 00:12:03 +08:00
|
|
|
for (i = SEG6_LOCAL_SRH; i < SEG6_LOCAL_MAX + 1; i++) {
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(i)) {
|
2017-08-05 18:38:26 +08:00
|
|
|
param = &seg6_action_params[i];
|
|
|
|
err = param->put(skb, slwt);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
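The loop in seg6_local_fill_encap() follows a table-driven pattern: the mandatory and optional attribute bitmasks are merged, and a per-attribute callback runs for every bit that is set. A minimal userspace sketch of the same pattern follows; every `sk_` name is hypothetical and the callbacks only record that they ran.

```c
#include <assert.h>

/* Hypothetical attribute identifiers; slot 0 is unused, as in the loop above. */
enum { SK_ATTR_UNSPEC, SK_ATTR_SRH, SK_ATTR_TABLE, SK_ATTR_NH6, SK_ATTR_COUNT };
#define SK_ATTR_MAX  (SK_ATTR_COUNT - 1)
#define SK_F_ATTR(i) (1UL << (i))

struct sk_state {
	unsigned long required_attrs;   /* attributes the behavior always has */
	unsigned long parsed_optattrs;  /* optional attributes that were parsed */
	int emitted[SK_ATTR_COUNT];     /* set by the callbacks below */
};

struct sk_param {
	int (*put)(struct sk_state *st, int attr);
};

static int sk_put_ok(struct sk_state *st, int attr)
{
	st->emitted[attr] = 1;
	return 0;
}

static int sk_put_fail(struct sk_state *st, int attr)
{
	(void)st;
	(void)attr;
	return -1;
}

/* Call the callback of every attribute present in either bitmask. */
int sk_fill(struct sk_state *st, const struct sk_param *params)
{
	unsigned long attrs = st->required_attrs | st->parsed_optattrs;
	int i, err;

	for (i = SK_ATTR_SRH; i <= SK_ATTR_MAX; i++) {
		if (!(attrs & SK_F_ATTR(i)))
			continue;
		err = params[i].put(st, i);
		if (err < 0)
			return err;	/* stop at the first failure */
	}
	return 0;
}

/* Usage example. */
void sk_fill_example(void)
{
	const struct sk_param ok[SK_ATTR_COUNT] = {
		[SK_ATTR_SRH] = { sk_put_ok },
		[SK_ATTR_TABLE] = { sk_put_ok },
		[SK_ATTR_NH6] = { sk_put_ok },
	};
	const struct sk_param bad[SK_ATTR_COUNT] = {
		[SK_ATTR_SRH] = { sk_put_fail },
		[SK_ATTR_TABLE] = { sk_put_ok },
		[SK_ATTR_NH6] = { sk_put_ok },
	};
	struct sk_state a = { SK_F_ATTR(SK_ATTR_SRH), SK_F_ATTR(SK_ATTR_NH6), { 0 } };
	struct sk_state b = { SK_F_ATTR(SK_ATTR_SRH), SK_F_ATTR(SK_ATTR_NH6), { 0 } };

	assert(sk_fill(&a, ok) == 0);
	assert(a.emitted[SK_ATTR_SRH] && !a.emitted[SK_ATTR_TABLE] && a.emitted[SK_ATTR_NH6]);
	assert(sk_fill(&b, bad) == -1);
	assert(!b.emitted[SK_ATTR_NH6]);
}
```

Keeping one callback table indexed by attribute number means a new attribute only needs a new table entry; the loop itself does not change.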
|
|
|
|
|
|
|
|
static int seg6_local_get_encap_size(struct lwtunnel_state *lwt)
|
|
|
|
{
|
|
|
|
struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt);
|
|
|
|
unsigned long attrs;
|
|
|
|
int nlsize;
|
|
|
|
|
|
|
|
nlsize = nla_total_size(4); /* action */
|
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
attrs = slwt->desc->attrs | slwt->parsed_optattrs;
|
2017-08-05 18:38:26 +08:00
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_SRH))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size((slwt->srh->hdrlen + 1) << 3);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_TABLE))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size(4);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_NH4))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size(4);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_NH6))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size(16);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_IIF))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size(4);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_OIF))
|
2017-08-05 18:38:26 +08:00
|
|
|
nlsize += nla_total_size(4);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_BPF))
|
2018-05-20 21:58:16 +08:00
|
|
|
nlsize += nla_total_size(sizeof(struct nlattr)) +
|
|
|
|
nla_total_size(MAX_PROG_NAME) +
|
|
|
|
nla_total_size(4);
|
|
|
|
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_VRFTABLE))
|
2020-12-02 21:05:14 +08:00
|
|
|
nlsize += nla_total_size(4);
|
|
|
|
|
seg6: add counters support for SRv6 Behaviors
This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.
Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (e.g. wrong SID configuration, interfaces set with SID addresses,
etc.) without any notification/message to the user.
Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.
When counters are enabled on an SRv6 Behavior instance, it is possible to
verify whether packets are actually processed by such behavior and what the
outcome of the processing is. Therefore, the counters for SRv6 Behaviors
offer a non-invasive observability point which can be leveraged for both
traffic monitoring and troubleshooting purposes.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters
Troubleshooting using SRv6 Behavior counters
--------------------------------------------
Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:
(i) arrived at the node;
(ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.
The error (iii) could have different causes, such as wrong route settings
on the node or an invalid SID List carried by the SRv6 packet. In any case,
the error counter rules out the possibility that the packet did not arrive
at the node or was not processed by the behavior at all.
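The accounting rule described above (packets and bytes for correctly processed traffic, a separate error count otherwise) can be sketched in a few lines. This is a simplified userspace illustration with hypothetical `sk_` names; it does not model per-CPU storage or any synchronization the kernel may use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-behavior counters: the three values named in the text. */
struct sk_counters {
	uint64_t packets;  /* packets correctly processed */
	uint64_t bytes;    /* bytes of the correctly processed packets */
	uint64_t errors;   /* packets NOT properly processed */
};

/* Account for one packet of 'len' bytes; err != 0 means processing failed. */
void sk_count(struct sk_counters *c, unsigned int len, int err)
{
	if (err) {
		c->errors++;
		return;
	}
	c->packets++;
	c->bytes += len;
}

/* Usage example: two good packets and one failure (e.g. SL == 0 at an End). */
void sk_count_example(void)
{
	struct sk_counters c = { 0, 0, 0 };

	sk_count(&c, 100, 0);
	sk_count(&c, 60, 0);
	sk_count(&c, 80, -1);
	assert(c.packets == 2 && c.bytes == 160 && c.errors == 1);
}
```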
Turning on/off counters for SRv6 Behaviors
------------------------------------------
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
Per-behavior counters can be shown by adding "-s" to the iproute2 command
line, e.g.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Impact of counters for SRv6 Behaviors on performance
====================================================
To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.
We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.
Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:
1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
instance of an SRv6 End.DX2 Behavior;
2) patched kernel with SRv6 Behavior counters and a single instance of
an SRv6 End.DX2 Behavior with counters turned off;
3) patched kernel with SRv6 Behavior counters and a single instance of
SRv6 End.DX2 Behavior with counters turned on.
All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.
Results of tests are shown in the following table:
Scenario (1): average 1504764.81 pps (~1504.76 kpps); std. dev. 3956.82 pps
Scenario (2): average 1501469.78 pps (~1501.47 kpps); std. dev. 2979.85 pps
Scenario (3): average 1501315.13 pps (~1501.32 kpps); std. dev. 2956.00 pps
As can be observed, the throughput achieved in scenarios (2) and (3) did not
suffer any observable degradation compared to scenario (1): the gap is about
0.2%, below the standard deviation measured in scenario (1).
Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.
[2] https://www.cloudlab.us
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27 23:44:04 +08:00
|
|
|
if (attrs & SEG6_F_LOCAL_COUNTERS)
|
|
|
|
nlsize += nla_total_size(0) + /* nest SEG6_LOCAL_COUNTERS */
|
|
|
|
/* SEG6_LOCAL_CNT_PACKETS */
|
|
|
|
nla_total_size_64bit(sizeof(__u64)) +
|
|
|
|
/* SEG6_LOCAL_CNT_BYTES */
|
|
|
|
nla_total_size_64bit(sizeof(__u64)) +
|
|
|
|
/* SEG6_LOCAL_CNT_ERRORS */
|
|
|
|
nla_total_size_64bit(sizeof(__u64));
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
if (attrs & SEG6_F_ATTR(SEG6_LOCAL_FLAVORS))
|
|
|
|
nlsize += encap_size_flavors(slwt);
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
return nlsize;
|
|
|
|
}
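The size estimate in seg6_local_get_encap_size() is a sum of padded attribute sizes. The sketch below reproduces that arithmetic in userspace, assuming the usual 4-byte netlink attribute header and 4-byte alignment; the `sk_` helpers are hypothetical re-implementations, and the `needs_pad` flag is an assumption standing in for architectures that add an empty padding attribute before 64-bit values.

```c
#include <assert.h>

#define SK_NLA_ALIGNTO 4	/* assumed attribute alignment */
#define SK_NLA_HDRLEN  4	/* assumed header: 16-bit length + 16-bit type */
#define SK_NLA_ALIGN(len) (((len) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))

/* Header plus payload, rounded up to the alignment. */
int sk_nla_total_size(int payload)
{
	assert(payload >= 0);
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/* 64-bit attribute, optionally preceded by an empty padding attribute. */
int sk_nla_total_size_64bit(int payload, int needs_pad)
{
	return sk_nla_total_size(payload) + (needs_pad ? sk_nla_total_size(0) : 0);
}

/* SRH length in bytes: hdrlen counts 8-byte units beyond the first 8 bytes. */
int sk_srh_bytes(int hdrlen)
{
	return (hdrlen + 1) << 3;
}

/* One empty nest header plus three 64-bit counters. */
int sk_counters_size(int needs_pad)
{
	return sk_nla_total_size(0) + 3 * sk_nla_total_size_64bit(8, needs_pad);
}

/* Usage example: an End behavior carrying a one-SID SRH and counters. */
void sk_size_example(void)
{
	int nlsize = sk_nla_total_size(4);			/* action */

	nlsize += sk_nla_total_size(sk_srh_bytes(2));		/* 24-byte SRH */
	nlsize += sk_counters_size(0);
	assert(nlsize == 8 + 28 + 40);
}
```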
|
|
|
|
|
|
|
|
static int seg6_local_cmp_encap(struct lwtunnel_state *a,
|
|
|
|
struct lwtunnel_state *b)
|
|
|
|
{
|
|
|
|
struct seg6_local_lwt *slwt_a, *slwt_b;
|
|
|
|
struct seg6_action_param *param;
|
2020-12-02 21:05:12 +08:00
|
|
|
unsigned long attrs_a, attrs_b;
|
2017-08-05 18:38:26 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
slwt_a = seg6_local_lwtunnel(a);
|
|
|
|
slwt_b = seg6_local_lwtunnel(b);
|
|
|
|
|
|
|
|
if (slwt_a->action != slwt_b->action)
|
|
|
|
return 1;
|
|
|
|
|
2020-12-02 21:05:12 +08:00
|
|
|
attrs_a = slwt_a->desc->attrs | slwt_a->parsed_optattrs;
|
|
|
|
attrs_b = slwt_b->desc->attrs | slwt_b->parsed_optattrs;
|
|
|
|
|
|
|
|
if (attrs_a != attrs_b)
|
2017-08-05 18:38:26 +08:00
|
|
|
return 1;
|
|
|
|
|
2022-08-03 00:12:03 +08:00
|
|
|
for (i = SEG6_LOCAL_SRH; i < SEG6_LOCAL_MAX + 1; i++) {
|
2021-02-07 01:09:34 +08:00
|
|
|
if (attrs_a & SEG6_F_ATTR(i)) {
|
2017-08-05 18:38:26 +08:00
|
|
|
param = &seg6_action_params[i];
|
|
|
|
if (param->cmp(slwt_a, slwt_b))
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct lwtunnel_encap_ops seg6_local_ops = {
|
|
|
|
.build_state = seg6_local_build_state,
|
|
|
|
.destroy_state = seg6_local_destroy_state,
|
|
|
|
.input = seg6_local_input,
|
|
|
|
.fill_encap = seg6_local_fill_encap,
|
|
|
|
.get_encap_size = seg6_local_get_encap_size,
|
|
|
|
.cmp_encap = seg6_local_cmp_encap,
|
|
|
|
.owner = THIS_MODULE,
|
|
|
|
};
|
|
|
|
|
|
|
|
int __init seg6_local_init(void)
|
|
|
|
{
|
2021-02-07 01:09:34 +08:00
|
|
|
/* If the max total number of defined attributes is reached, then your
|
|
|
|
* kernel build stops here.
|
|
|
|
*
|
|
|
|
* This check is required to avoid arithmetic overflows when processing
|
|
|
|
* behavior attributes and the maximum number of defined attributes
|
|
|
|
* exceeds the allowed value.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(SEG6_LOCAL_MAX + 1 > BITS_PER_TYPE(unsigned long));
|
|
|
|
|
2023-08-13 02:09:25 +08:00
|
|
|
/* Check whether the number of defined flavors exceeds the maximum
|
|
|
|
* allowed value.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(SEG6_LOCAL_FLV_OP_MAX + 1 > BITS_PER_TYPE(__u32));
|
|
|
|
|
2022-09-13 01:16:18 +08:00
|
|
|
/* If the default NEXT-C-SID Locator-Block/Node Function lengths (in
|
|
|
|
* bits) have been changed with invalid values, kernel build stops
|
|
|
|
* here.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(next_csid_chk_cntr_bits(SEG6_LOCAL_LCBLOCK_DBITS,
|
|
|
|
SEG6_LOCAL_LCNODE_FN_DBITS));
|
|
|
|
BUILD_BUG_ON(next_csid_chk_lcblock_bits(SEG6_LOCAL_LCBLOCK_DBITS));
|
|
|
|
BUILD_BUG_ON(next_csid_chk_lcnode_fn_bits(SEG6_LOCAL_LCNODE_FN_DBITS));
|
|
|
|
|
seg6: add PSP flavor support for SRv6 End behavior
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.
A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.
Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
i) decreasing the Segment Left (SL) from 1 to 0;
ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
following the IPv6 header.
It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.
For example, given the SRv6 policy (SID List := < X, Y, Z >), where X, Y and
Z are the active segments at SL=2, SL=1 and SL=0 respectively:
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
operation as Segment Left (SL) is 1, corresponding to the Penultimate
Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
PSP operation as the Segment Left is 2. This behavior instance will
apply the "standard" End packet processing, ignoring the configured PSP
flavor altogether.
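The X/Y/Z example can be traced with a small model of the packet. The following userspace sketch is an illustration only: the `sk_` names are hypothetical, SIDs are plain integers instead of IPv6 addresses, and popping the SRH is reduced to clearing a flag. As in the SRH, the segment list is stored in reverse order, so index 0 holds the last segment.

```c
#include <assert.h>

#define SK_MAX_SIDS 8

/* Hypothetical packet model: integers stand in for IPv6 addresses. */
struct sk_pkt {
	int dst;                /* stand-in for the IPv6 DA */
	int has_srh;            /* 1 while the SRH is still attached */
	int segments_left;      /* SL */
	int sids[SK_MAX_SIDS];  /* sids[0] is the LAST segment of the policy */
};

/* End with the PSP flavor: the SRH is popped only when SL goes from 1 to 0. */
int sk_end_psp(struct sk_pkt *p)
{
	/* End accepts only packets that carry an SRH with SL >= 1. */
	if (!p->has_srh || p->segments_left < 1 || p->segments_left >= SK_MAX_SIDS)
		return -1;

	p->segments_left--;                  /* i) decrease SL */
	p->dst = p->sids[p->segments_left];  /* ii) next SID becomes the DA */
	if (p->segments_left == 0)
		p->has_srh = 0;              /* iii) pop the SRH */
	return 0;
}

/* Usage example: policy < X, Y, Z > with X = 1, Y = 2, Z = 3. */
void sk_end_psp_example(void)
{
	struct sk_pkt p = { 1, 1, 2, { 3, 2, 1 } };

	/* At the node owning X (SL = 2): standard End processing, SRH kept. */
	assert(sk_end_psp(&p) == 0);
	assert(p.dst == 2 && p.segments_left == 1 && p.has_srh == 1);

	/* At the node owning Y (SL = 1): penultimate segment, SRH popped. */
	assert(sk_end_psp(&p) == 0);
	assert(p.dst == 3 && p.segments_left == 0 && p.has_srh == 0);
}
```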
[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-15 21:46:58 +08:00
|
|
|
/* To be memory efficient, we use 'u8' to represent the different
|
|
|
|
* actions related to RFC8986 flavors. If the kernel build stops here,
|
|
|
|
* it means that it is not possible to correctly encode these actions
|
|
|
|
* with the data type chosen for the action table.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(SEG6_LOCAL_FLV_ACT_MAX > (typeof(flv8986_act_tbl[0]))~0U);
|
|
|
|
|
2017-08-05 18:38:26 +08:00
|
|
|
return lwtunnel_encap_add_ops(&seg6_local_ops,
|
|
|
|
LWTUNNEL_ENCAP_SEG6_LOCAL);
|
|
|
|
}
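The BUILD_BUG_ON() checks in seg6_local_init() turn capacity limits into compile-time failures. A comparable effect can be sketched in plain C11 with `_Static_assert`; the enums and `sk_` names below are hypothetical and only illustrate the two kinds of check used above (bit-flag capacity and the range of a narrow integer type).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical attribute and action identifiers. */
enum { SK_ATTR_UNSPEC, SK_ATTR_SRH, SK_ATTR_TABLE, SK_ATTR_COUNT };
enum { SK_ACT_NONE, SK_ACT_END, SK_ACT_PSP, SK_ACT_COUNT };

#define SK_ATTR_MAX (SK_ATTR_COUNT - 1)
#define SK_ACT_MAX  (SK_ACT_COUNT - 1)
#define SK_BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)

/* One bit per attribute: every attribute number must fit in the bitmask. */
_Static_assert(SK_ATTR_MAX + 1 <= SK_BITS_PER_TYPE(unsigned long),
	       "attribute flags do not fit in unsigned long");

/* Actions are stored in 8 bits: the largest action must be representable. */
_Static_assert(SK_ACT_MAX <= UINT8_MAX,
	       "action identifiers do not fit in uint8_t");

/* Safe by construction: the first assertion bounds the shift count. */
unsigned long sk_attr_flag(int attr)
{
	assert(attr >= 0 && attr <= SK_ATTR_MAX);
	return 1UL << attr;
}

/* Safe by construction: the second assertion bounds the value. */
uint8_t sk_encode_action(int act)
{
	assert(act >= 0 && act <= SK_ACT_MAX);
	return (uint8_t)act;
}

/* Usage example. */
void sk_limits_example(void)
{
	assert(sk_attr_flag(SK_ATTR_TABLE) == 4UL);
	assert(sk_encode_action(SK_ACT_PSP) == 2);
}
```

If a new identifier pushed either enum past its limit, the build would stop at the matching assertion instead of overflowing silently at run time.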
|
|
|
|
|
|
|
|
void seg6_local_exit(void)
|
|
|
|
{
|
|
|
|
lwtunnel_encap_del_ops(&seg6_local_ops, LWTUNNEL_ENCAP_SEG6_LOCAL);
|
|
|
|
}
|