ipv6: Implement limits on Hop-by-Hop and Destination options

RFC 8200 (IPv6) defines the Hop-by-Hop options and Destination options
extension headers. Both carry a list of TLVs that is limited only by
the maximum length of the extension header (2048 bytes). By the spec,
a host must process all the TLVs in these options; however, this could
be used for a fairly obvious denial-of-service attack. I think this
could in fact be a significant DoS vector on the Internet. One
mitigating factor is that many firewalls drop all packets with
extension headers (and obviously this affects only IPv6), so an
Internet-wide attack might not be so effective (yet!).
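
For reference, the 2048-byte ceiling follows from the 8-bit Hdr Ext Len
field, which counts 8-octet units beyond the first 8 octets. A minimal
sketch of that arithmetic, using the same expression as the patch; the
helper name ext_hdr_len is illustrative and not part of the patch:

```c
#include <stdint.h>

/* Illustrative helper (not in the patch): byte length of a Hop-by-Hop or
 * Destination options extension header, given its Hdr Ext Len field.
 * The field is in 8-octet units and excludes the first 8 octets.
 */
static int ext_hdr_len(uint8_t hdr_ext_len)
{
	return (hdr_ext_len + 1) << 3;
}
```

The smallest header (Hdr Ext Len 0) is 8 bytes and the largest (Hdr Ext
Len 255) is 2048 bytes.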

By my calculation, the worst-case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 individual TLVs (including pad TLVs) or 724 two-byte TLVs. I
wrote a quick test program that floods a host with these packets,
and sure enough there is substantial time spent in ip6_parse_tlv.
These packets contain nothing but unknown TLVs (that are ignored),
TLV padding, and a bogus UDP header with zero payload length.
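
As a rough illustration of the kind of header involved (this is not the
test program mentioned above, and it does not try to reproduce the TLV
counts quoted there), the sketch below fills one options extension
header with zero-length TLVs of a type the receiver is assumed not to
recognize. The function name and the choice of option type 0x1e are
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 0x1e is an experimental option type that the
 * receiver does not recognize. Its two high-order bits are 00, which means
 * "skip over this option if unrecognized".
 */
#define EXAMPLE_UNKNOWN_OPT_TYPE 0x1e

/* Hypothetical helper: write an options extension header of 'len' bytes
 * into 'buf', filled with two-byte TLVs (type, length 0). 'len' must be a
 * multiple of 8 between 8 and 2048. Returns the number of TLVs written,
 * or -1 if 'len' is not a valid extension header length.
 */
static int fill_unknown_tlvs(uint8_t *buf, size_t len, uint8_t next_hdr)
{
	size_t off;
	int count = 0;

	if (len < 8 || len > 2048 || (len & 7))
		return -1;

	buf[0] = next_hdr;			/* Next Header */
	buf[1] = (uint8_t)((len >> 3) - 1);	/* Hdr Ext Len */

	/* The remaining len - 2 bytes are even, so two-byte TLVs fit exactly */
	for (off = 2; off + 2 <= len; off += 2) {
		buf[off] = EXAMPLE_UNKNOWN_OPT_TYPE;
		buf[off + 1] = 0;		/* Opt Data Len */
		count++;
	}

	return count;
}
```

A single maximum-size header built this way holds (2048 - 2) / 2 = 1023
such TLVs, each of which the receiver has to look up before skipping it.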

  25.38%  [kernel]                    [k] __fib6_clean_all
  21.63%  [kernel]                    [k] ip6_parse_tlv
   4.21%  [kernel]                    [k] __local_bh_enable_ip
   2.18%  [kernel]                    [k] ip6_pol_route.isra.39
   1.98%  [kernel]                    [k] fib6_walk_continue
   1.88%  [kernel]                    [k] _raw_write_lock_bh
   1.65%  [kernel]                    [k] dst_release

This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
  - Limit the number of options in a Hop-by-Hop or Destination options
    extension header.
  - Limit the byte length of a Hop-by-Hop or Destination options
    extension header.
  - Disallow unrecognized options in a Hop-by-Hop or Destination
    options extension header.

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.
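
The sign convention can be sketched as follows. This mirrors the check
added to ip6_parse_tlv in this patch, but the struct and function names
are made up for the example:

```c
#include <stdbool.h>

/* Illustrative types and names; only the decoding rule comes from the patch */
struct tlv_limit {
	int max_count;		/* maximum number of non-padding TLVs */
	bool disallow_unknowns;	/* drop packets carrying unknown TLVs */
};

/* Decode a max_*_opts_cnt value: a negative value means "unknown TLVs are
 * disallowed" and its absolute value is the count limit.
 * Note: INT_MIN is not handled here, since negating it overflows.
 */
static struct tlv_limit decode_opts_cnt(int sysctl_val)
{
	struct tlv_limit lim = { sysctl_val, false };

	if (sysctl_val < 0) {
		lim.disallow_unknowns = true;
		lim.max_count = -sysctl_val;
	}

	return lim;
}
```

For example, a value of -3 allows up to three known TLVs and rejects any
unknown one, while 8 allows up to eight TLVs, known or not.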

If a limit is exceeded when processing an extension header the packet is
dropped.
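
To make the combined effect of the limits concrete, here is a simplified
userspace model of the checks. It is a sketch under stated assumptions,
not the kernel code: the function names and the 'known' array are
hypothetical, padding TLVs are not sanity-checked, and an allowed
unknown TLV is always skipped regardless of its action bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_PAD1 0	/* one-byte padding, no length field */
#define OPT_PADN 1	/* multi-byte padding */

/* Hypothetical lookup standing in for the stack's table of known options */
static bool type_is_known(uint8_t type, const uint8_t *known, size_t nknown)
{
	size_t i;

	for (i = 0; i < nknown; i++)
		if (known[i] == type)
			return true;
	return false;
}

/* Returns true if the options header at 'hdr' ('avail' bytes readable)
 * passes the length limit, the TLV count limit and, when max_count is
 * negative, the "no unknown TLVs" rule. false means "drop the packet".
 */
static bool opts_within_limits(const uint8_t *hdr, size_t avail,
			       int max_len, int max_count,
			       const uint8_t *known, size_t nknown)
{
	bool disallow_unknowns = false;
	size_t len, off = 2;
	int count = 0;

	if (avail < 8)
		return false;

	len = ((size_t)hdr[1] + 1) << 3;	/* at most 2048 */
	if (len > avail || (int)len > max_len)
		return false;

	if (max_count < 0) {
		disallow_unknowns = true;
		max_count = -max_count;
	}

	while (off < len) {
		uint8_t type = hdr[off];
		size_t optlen;

		if (type == OPT_PAD1) {
			off++;
			continue;
		}
		if (len - off < 2)
			return false;
		optlen = (size_t)hdr[off + 1] + 2;
		if (optlen > len - off)
			return false;

		/* Only non-padding TLVs count against the limit */
		if (type != OPT_PADN) {
			if (++count > max_count)
				return false;
			if (disallow_unknowns &&
			    !type_is_known(type, known, nknown))
				return false;
		}
		off += optlen;
	}

	return true;
}
```

With an 8-byte header holding three unknown two-byte TLVs, the model
accepts the header for max_count 8, rejects it for max_count 2 (too many
TLVs) and rejects it for max_count -8 unless the type is listed in
'known'.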

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

These limits have been proposed in draft-ietf-6man-rfc6434-bis.

Tested (by Martin Lau)

I tested with 1 thread (i.e. one raw_udp process).

I varied net.ipv6.max_(dst|hbh)_opts_number between 8 and 2048.
With the sysctls set to 2048, softirq% is pegged at 100%.
With 8, softirq% is almost unnoticeable in mpstat.

v2:
  - Code and documentation cleanup.
  - Change references of RFC2460 to be RFC8200.
  - Add reference to RFC6434-bis where the limits will be in standard.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert 2017-10-30 14:16:00 -07:00 committed by David S. Miller
parent 2d2faaf056
commit 47d3d7ac65
6 changed files with 159 additions and 12 deletions


@@ -1385,6 +1385,30 @@ mld_qrv - INTEGER
 	Default: 2 (as specified by RFC3810 9.1)
 	Minimum: 1 (as specified by RFC6636 4.5)
 
+max_dst_opts_cnt - INTEGER
+	Maximum number of non-padding TLVs allowed in a Destination
+	options extension header. If this value is less than zero
+	then unknown options are disallowed and the number of known
+	TLVs allowed is the absolute value of this number.
+	Default: 8
+
+max_hbh_opts_cnt - INTEGER
+	Maximum number of non-padding TLVs allowed in a Hop-by-Hop
+	options extension header. If this value is less than zero
+	then unknown options are disallowed and the number of known
+	TLVs allowed is the absolute value of this number.
+	Default: 8
+
+max_dst_opts_len - INTEGER
+	Maximum length allowed for a Destination options extension
+	header.
+	Default: INT_MAX (unlimited)
+
+max_hbh_opts_len - INTEGER
+	Maximum length allowed for a Hop-by-Hop options extension
+	header.
+	Default: INT_MAX (unlimited)
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER


@@ -51,6 +51,46 @@
 #define IPV6_DEFAULT_HOPLIMIT   64
 #define IPV6_DEFAULT_MCASTHOPS	1
 
+/* Limits on Hop-by-Hop and Destination options.
+ *
+ * Per RFC8200 there is no limit on the maximum number or lengths of options in
+ * Hop-by-Hop or Destination options other than the packet must fit in an MTU.
+ * We allow configurable limits in order to mitigate potential denial of
+ * service attacks.
+ *
+ * There are three limits that may be set:
+ *   - Limit the number of options in a Hop-by-Hop or Destination options
+ *     extension header
+ *   - Limit the byte length of a Hop-by-Hop or Destination options extension
+ *     header
+ *   - Disallow unknown options
+ *
+ * The limits are expressed in corresponding sysctls:
+ *
+ * ipv6.sysctl.max_dst_opts_cnt
+ * ipv6.sysctl.max_hbh_opts_cnt
+ * ipv6.sysctl.max_dst_opts_len
+ * ipv6.sysctl.max_hbh_opts_len
+ *
+ * max_*_opts_cnt is the number of TLVs that are allowed for Destination
+ * options or Hop-by-Hop options. If the number is less than zero then unknown
+ * TLVs are disallowed and the number of known options that are allowed is the
+ * absolute value. Setting the value to INT_MAX indicates no limit.
+ *
+ * max_*_opts_len is the length limit in bytes of a Destination or
+ * Hop-by-Hop options extension header. Setting the value to INT_MAX
+ * indicates no length limit.
+ *
+ * If a limit is exceeded when processing an extension header the packet is
+ * silently discarded.
+ */
+
+/* Default limits for Hop-by-Hop and Destination options */
+#define IP6_DEFAULT_MAX_DST_OPTS_CNT	 8
+#define IP6_DEFAULT_MAX_HBH_OPTS_CNT	 8
+#define IP6_DEFAULT_MAX_DST_OPTS_LEN	 INT_MAX /* No limit */
+#define IP6_DEFAULT_MAX_HBH_OPTS_LEN	 INT_MAX /* No limit */
+
 /*
  *	Addr type
  *


@@ -37,6 +37,10 @@ struct netns_sysctl_ipv6 {
 	int idgen_delay;
 	int flowlabel_state_ranges;
 	int flowlabel_reflect;
+	int max_dst_opts_cnt;
+	int max_hbh_opts_cnt;
+	int max_dst_opts_len;
+	int max_hbh_opts_len;
 };
 
 struct netns_ipv6 {


@@ -810,6 +810,10 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.idgen_retries = 3;
 	net->ipv6.sysctl.idgen_delay = 1 * HZ;
 	net->ipv6.sysctl.flowlabel_state_ranges = 0;
+	net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT;
+	net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
+	net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
+	net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
 	atomic_set(&net->ipv6.fib6_sernum, 1);
 
 	err = ipv6_init_mibs(net);


@@ -74,8 +74,20 @@ struct tlvtype_proc {
 
 /* An unknown option is detected, decide what to do */
 
-static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff)
+static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
+			       bool disallow_unknowns)
 {
+	if (disallow_unknowns) {
+		/* If unknown TLVs are disallowed by configuration
+		 * then always silently drop packet. Note this also
+		 * means no ICMP parameter problem is sent which
+		 * could be a good property to mitigate a reflection DOS
+		 * attack.
+		 */
+
+		goto drop;
+	}
+
 	switch ((skb_network_header(skb)[optoff] & 0xC0) >> 6) {
 	case 0: /* ignore */
 		return true;
@@ -95,20 +107,30 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff)
 		return false;
 	}
 
+drop:
 	kfree_skb(skb);
 	return false;
 }
 
 /* Parse tlv encoded option header (hop-by-hop or destination) */
 
-static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
+static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
+			  struct sk_buff *skb,
+			  int max_count)
 {
-	const struct tlvtype_proc *curr;
+	int len = (skb_transport_header(skb)[1] + 1) << 3;
 	const unsigned char *nh = skb_network_header(skb);
 	int off = skb_network_header_len(skb);
-	int len = (skb_transport_header(skb)[1] + 1) << 3;
+	const struct tlvtype_proc *curr;
+	bool disallow_unknowns = false;
+	int tlv_count = 0;
 	int padlen = 0;
 
+	if (unlikely(max_count < 0)) {
+		disallow_unknowns = true;
+		max_count = -max_count;
+	}
+
 	if (skb_transport_offset(skb) + len > skb_headlen(skb))
 		goto bad;
 
@@ -149,6 +171,11 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
 		default: /* Other TLV code so scan list */
 			if (optlen > len)
 				goto bad;
+
+			tlv_count++;
+			if (tlv_count > max_count)
+				goto bad;
+
 			for (curr = procs; curr->type >= 0; curr++) {
 				if (curr->type == nh[off]) {
 					/* type specific length/alignment
@@ -159,10 +186,10 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
 					break;
 				}
 			}
-			if (curr->type < 0) {
-				if (ip6_tlvopt_unknown(skb, off) == 0)
-					return false;
-			}
+			if (curr->type < 0 &&
+			    !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
+				return false;
+
 			padlen = 0;
 			break;
 		}
@@ -258,23 +285,31 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
 	__u16 dstbuf;
 #endif
 	struct dst_entry *dst = skb_dst(skb);
+	struct net *net = dev_net(skb->dev);
+	int extlen;
 
 	if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
 	    !pskb_may_pull(skb, (skb_transport_offset(skb) +
 				 ((skb_transport_header(skb)[1] + 1) << 3)))) {
 		__IP6_INC_STATS(dev_net(dst->dev), ip6_dst_idev(dst),
 				IPSTATS_MIB_INHDRERRORS);
+fail_and_free:
 		kfree_skb(skb);
 		return -1;
 	}
 
+	extlen = (skb_transport_header(skb)[1] + 1) << 3;
+	if (extlen > net->ipv6.sysctl.max_dst_opts_len)
+		goto fail_and_free;
+
 	opt->lastopt = opt->dst1 = skb_network_header_len(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 	dstbuf = opt->dst1;
 #endif
 
-	if (ip6_parse_tlv(tlvprocdestopt_lst, skb)) {
-		skb->transport_header += (skb_transport_header(skb)[1] + 1) << 3;
+	if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
+			  init_net.ipv6.sysctl.max_dst_opts_cnt)) {
+		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 		opt->nhoff = dstbuf;
@@ -803,6 +838,8 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
 int ipv6_parse_hopopts(struct sk_buff *skb)
 {
 	struct inet6_skb_parm *opt = IP6CB(skb);
+	struct net *net = dev_net(skb->dev);
+	int extlen;
 
 	/*
 	 * skb_network_header(skb) is equal to skb->data, and
@@ -813,13 +850,19 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + 8) ||
 	    !pskb_may_pull(skb, (sizeof(struct ipv6hdr) +
 				 ((skb_transport_header(skb)[1] + 1) << 3)))) {
+fail_and_free:
 		kfree_skb(skb);
 		return -1;
 	}
 
+	extlen = (skb_transport_header(skb)[1] + 1) << 3;
+	if (extlen > net->ipv6.sysctl.max_hbh_opts_len)
+		goto fail_and_free;
+
 	opt->flags |= IP6SKB_HOPBYHOP;
-	if (ip6_parse_tlv(tlvprochopopt_lst, skb)) {
-		skb->transport_header += (skb_transport_header(skb)[1] + 1) << 3;
+	if (ip6_parse_tlv(tlvprochopopt_lst, skb,
+			  init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
+		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 		opt->nhoff = sizeof(struct ipv6hdr);
 		return 1;


@@ -97,6 +97,34 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "max_dst_opts_number",
+		.data		= &init_net.ipv6.sysctl.max_dst_opts_cnt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_hbh_opts_number",
+		.data		= &init_net.ipv6.sysctl.max_hbh_opts_cnt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_dst_opts_length",
+		.data		= &init_net.ipv6.sysctl.max_dst_opts_len,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "max_hbh_length",
+		.data		= &init_net.ipv6.sysctl.max_hbh_opts_len,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
@@ -157,6 +185,10 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	ipv6_table[7].data = &net->ipv6.sysctl.flowlabel_state_ranges;
 	ipv6_table[8].data = &net->ipv6.sysctl.ip_nonlocal_bind;
 	ipv6_table[9].data = &net->ipv6.sysctl.flowlabel_reflect;
+	ipv6_table[10].data = &net->ipv6.sysctl.max_dst_opts_cnt;
+	ipv6_table[11].data = &net->ipv6.sysctl.max_hbh_opts_cnt;
+	ipv6_table[12].data = &net->ipv6.sysctl.max_dst_opts_len;
+	ipv6_table[13].data = &net->ipv6.sysctl.max_hbh_opts_len;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)