2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-29 23:53:55 +08:00
linux-next/net/ipv4
Stefano Brivio 4cb47a8644 tunnels: PMTU discovery support for directly bridged IP packets
It's currently possible to bridge Ethernet tunnels carrying IP
packets directly to external interfaces without assigning them
addresses and routes on the bridged network itself: this is the case
for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because
the encapsulation effectively decreases the MTU of the link, and
while we are able to account for this using PMTU discovery on the
lower layer, we don't have a way to relay ICMP or ICMPv6 messages
needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets
as a general approach: this is for instance clearly forbidden for
VXLAN by RFC 7348, section 4.3:

   VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
   fragment encapsulated VXLAN packets due to the larger frame size.
   The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network
accomodates for encapsulations, but this isn't a practical option for
complex topologies, especially for typical Open vSwitch use cases.

Further, it states that:

   Other techniques like Path MTU discovery (see [RFC1191] and
   [RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces, we get
route exceptions created by the encapsulation device as they receive
ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
we already rebuild those messages with the appropriate MTU and route
them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate
  to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
  RFC 4443 section 2.4 for ICMPv6. This function is already called by
  UDP tunnels

- a new function generating those ICMP or ICMPv6 replies. We can't
  reuse icmp_send() and icmp6_send() as we don't see the sender as a
  valid destination. This doesn't need to be generic, as we don't
  cover any other type of ICMP errors given that we only provide an
  encapsulation function to the sender

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
we might receive GSO buffers here, and the passed headroom already
includes the inner MAC length, so we don't have to account for it
a second time (that would imply three MAC headers on the wire, but
there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes
of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
bytes of encapsulation headroom, we would advertise MTU as 3950, and
we would reject fragmented IPv6 datagrams of 3958 bytes size on the
wire. We're exclusively dealing with network MTU here, though, so we
could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:01:45 -07:00
..
bpfilter net: improve the user pointer check in init_user_sockptr 2020-07-28 13:43:40 -07:00
netfilter net: remove sockptr_advance 2020-07-28 13:43:40 -07:00
af_inet.c net: remove compat_sock_common_{get,set}sockopt 2020-07-19 18:16:40 -07:00
ah4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
arp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
bpf_tcp_ca.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
cipso_ipv4.c net: ipv4: kerneldoc fixes 2020-07-13 17:20:39 -07:00
datagram.c inet: stop leaking jiffies on the wire 2019-11-01 14:57:52 -07:00
devinet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-31 17:48:46 -07:00
esp4_offload.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
esp4.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
fib_frontend.c ipv4: nexthop version of fib_info_nh_uses_dev 2020-05-26 16:06:07 -07:00
fib_lookup.h net: add net available in build_state 2020-03-29 22:30:57 -07:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c fib: use indirect call wrappers in the most common fib_rules_ops 2020-07-28 17:42:31 -07:00
fib_semantics.c net: Fix the arp error in some cases 2020-06-18 20:21:51 -07:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-08-02 01:02:12 -07:00
fou.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-08 21:25:37 -07:00
gre_offload.c net: gre: recompute gre csum for sctp over gre tunnels 2020-08-03 15:29:44 -07:00
icmp.c icmp6: support rfc 4884 2020-07-24 17:12:41 -07:00
igmp.c ip*_mc_gsfget(): lift copyout of struct group_filter into callers 2020-05-20 20:31:27 -04:00
inet_connection_sock.c net/ipv6: remove compat_ipv6_{get,set}sockopt 2020-07-19 18:16:41 -07:00
inet_diag.c inet_diag: support for wider protocol numbers 2020-07-09 12:38:41 -07:00
inet_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
inet_hashtables.c inet: Run SK_LOOKUP BPF program on socket lookup 2020-07-17 20:18:16 -07:00
inet_timewait_sock.c
inetpeer.c inetpeer: fix data-race in inet_putpeer / inet_putpeer 2019-11-07 16:15:56 -08:00
ip_forward.c ipv4: Revert removal of rt_uses_gateway 2019-09-20 18:23:33 -07:00
ip_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ip_gre.c net: add a new ndo_tunnel_ioctl method 2020-05-19 15:45:11 -07:00
ip_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip_options.c net/ipv4: merge ip_options_get and ip_options_get_from_user 2020-07-24 15:41:54 -07:00
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
ip_sockglue.c icmp: prepare rfc 4884 for ipv6 2020-07-24 17:12:41 -07:00
ip_tunnel_core.c tunnels: PMTU discovery support for directly bridged IP packets 2020-08-04 13:01:45 -07:00
ip_tunnel.c ip_tunnel: fix use-after-free in ip_tunnel_lookup() 2020-06-18 20:12:34 -07:00
ip_vti.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2020-07-30 14:39:31 -07:00
ipcomp.c ipcomp: assign if_id to child tunnel from parent tunnel 2020-07-09 12:55:37 +02:00
ipconfig.c Documentation: nfsroot.rst: Fix references to nfsroot.rst 2020-03-02 13:11:46 -07:00
ipip.c net: ipip: implement header_ops->parse_protocol for AF_PACKET 2020-06-30 12:29:39 -07:00
ipmr_base.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
ipmr.c ipmr: Copy option to correct variable 2020-07-27 11:39:55 -07:00
Kconfig Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
Makefile udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
metrics.c
netfilter.c
netlink.c
nexthop.c nexthop: Fix fdb labeling for groups 2020-06-10 13:18:40 -07:00
ping.c ipv4: fill fl4_icmp_{type,code} in ping_v4_sendmsg 2020-07-07 15:26:37 -07:00
proc.c tcp: add SNMP counter for no. of duplicate segments reported by DSACK 2020-07-17 12:54:30 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
raw.c net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
route.c ipv4: route: Ignore output interface in FIB lookup for PMTU route 2020-08-04 13:01:45 -07:00
syncookies.c tcp: fix build fong CONFIG_MPTCP=n 2020-08-01 11:46:29 -07:00
sysctl_net_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-05-01 17:02:27 -07:00
tcp_bbr.c tcp_bbr: improve arithmetic division in bbr_update_bw() 2020-01-21 10:45:49 +01:00
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c bpf: tcp: Recv() should return 0 when the peer socket is closed 2020-06-12 15:10:12 -07:00
tcp_cdg.c
tcp_cong.c tcp: make sure listeners don't initialize congestion-control state 2020-07-09 13:07:45 -07:00
tcp_cubic.c tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT 2020-06-25 16:08:47 -07:00
tcp_dctcp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_dctcp.h
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c tcp: add TCP_INFO status for failed client TFO 2019-10-25 19:25:37 -07:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: apply a floor of 1 for RTT samples from TCP timestamps 2020-08-03 17:54:03 -07:00
tcp_ipv4.c bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t 2020-07-25 20:16:32 -07:00
tcp_lp.c
tcp_metrics.c net-tcp: Disable TCP ssthresh metrics cache by default 2019-12-09 20:17:48 -08:00
tcp_minisocks.c mptcp: add new sock flag to deal with join subflows 2020-05-15 12:30:13 -07:00
tcp_nv.c
tcp_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_output.c tcp: rename request_sock cookie_ts bit to syncookie 2020-07-31 16:55:32 -07:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c tcp: fix stretch ACK bugs in Scalable 2020-03-16 18:26:54 -07:00
tcp_timer.c net: ipv4: kerneldoc fixes 2020-07-13 17:20:39 -07:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c
tcp_yeah.c tcp: fix stretch ACK bugs in Yeah 2020-03-16 18:26:55 -07:00
tcp.c tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS 2020-07-31 17:00:44 -07:00
tunnel4.c tunnel4: add cb_handler to struct xfrm_tunnel 2020-07-09 12:51:36 +02:00
udp_bpf.c bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
udp_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c udp: initialize is_flist with 0 in udp_gro_receive 2020-03-30 10:35:03 -07:00
udp_tunnel_core.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp_tunnel_nic.c udp_tunnel: add the ability to hard-code IANA VXLAN 2020-08-03 10:13:54 -07:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp.c udp, bpf: Ignore connections in reuseport group after BPF sk lookup 2020-07-31 02:00:48 +02:00
udplite.c net/ipv4: remove compat_ip_{get,set}sockopt 2020-07-19 18:16:41 -07:00
xfrm4_input.c xfrm: state: remove extract_input indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_output.c xfrm: fix unused variable warning if CONFIG_NETFILTER=n 2020-05-11 15:12:27 +02:00
xfrm4_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm4_protocol.c xfrm: add route lookup to xfrm4_rcv_encap 2019-12-09 09:59:07 +01:00
xfrm4_state.c xfrm: remove output_finish indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_tunnel.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00