linux/net
Terin Stock efea112a87 ipvs: align inner_mac_header for encapsulation
[ Upstream commit d7fce52fdf ]

When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, flow can be passed to __skb_udp_tunnel_segment() which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.

This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():

    perf probe --add '__dev_queue_xmit skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'
    perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
    perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'

These probes the headers and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.

    probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
    probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2

When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.

In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.

This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.

Fixes: 84c0d5e96f ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28 10:29:48 +02:00
..
6lowpan
9p 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition 2023-04-20 12:13:53 +02:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:14:42 +01:00
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-24 17:36:52 +01:00
appletalk
atm atm: hide unused procfs functions 2023-06-09 10:32:26 +02:00
ax25 net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg 2022-06-22 14:22:01 +02:00
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-06-14 11:13:04 +02:00
bluetooth Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk 2023-06-14 11:13:06 +02:00
bpf bpf: Move skb->len == 0 checks into __bpf_redirect 2022-12-31 13:14:11 +01:00
bpfilter
bridge bridge: always declare tunnel functions 2023-05-24 17:36:52 +01:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:48:54 +01:00
can can: j1939: avoid possible use-after-free when j1939_can_rx_register fails 2023-06-14 11:13:06 +02:00
ceph libceph: fix potential use-after-free on linger ping and resends 2022-05-25 09:57:28 +02:00
core neighbour: delete neigh_lookup_nodev as not used 2023-06-21 15:59:19 +02:00
dcb net: dcb: disable softirqs in dcbnl_flush_dev() 2022-03-08 19:12:52 +01:00
dccp dccp: Call inet6_destroy_sock() via sk->sk_destruct(). 2023-04-26 13:51:54 +02:00
dns_resolver
dsa net: dsa: tag_brcm: legacy: fix daisy-chained switches 2023-03-30 12:47:48 +02:00
ethernet
ethtool ethtool: Fix uninitialized number of lanes 2023-05-17 11:50:18 +02:00
hsr hsr: ratelimit only when errors are printed 2023-04-05 11:25:02 +02:00
ieee802154 net: ieee802154: fix error return code in dgram_bind() 2022-11-03 23:59:14 +09:00
ife
ipv4 xfrm: Linearize the skb after offloading if needed. 2023-06-28 10:29:46 +02:00
ipv6 xfrm: Linearize the skb after offloading if needed. 2023-06-28 10:29:46 +02:00
iucv net/iucv: Fix size of interrupt data 2023-03-22 13:31:28 +01:00
kcm kcm: close race conditions on sk_receive_queue 2022-11-26 09:24:50 +01:00
key af_key: Reject optional tunnel/BEET mode templates in outbound policies 2023-05-24 17:36:49 +01:00
l2tp inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). 2023-04-26 13:51:54 +02:00
l3mdev l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu 2022-04-27 14:38:53 +02:00
lapb
llc net: deal with most data-races in sk_wait_event() 2023-05-24 17:36:42 +01:00
mac80211 wifi: mac80211: simplify chanctx allocation 2023-06-09 10:32:25 +02:00
mac802154 mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() 2022-12-14 11:37:25 +01:00
mctp net: mctp: purge receive queues on sk destruction 2023-02-06 07:59:02 +01:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:57:09 +01:00
mptcp inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). 2023-04-26 13:51:54 +02:00
ncsi net/ncsi: clear Tx enable mode when handling a Config required AEN 2023-05-17 11:50:16 +02:00
netfilter ipvs: align inner_mac_header for encapsulation 2023-06-28 10:29:48 +02:00
netlabel netlabel: fix out-of-bounds memory accesses 2022-04-13 20:59:10 +02:00
netlink net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report 2023-06-09 10:32:18 +02:00
netrom netrom: fix info-leak in nr_write_internal() 2023-06-09 10:32:16 +02:00
nfc nfc: change order inside nfc_se_io error path 2023-03-17 08:48:49 +01:00
nsh net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() 2023-05-24 17:36:51 +01:00
openvswitch net: openvswitch: fix possible memory leak in ovs_meter_cmd_set() 2023-02-22 12:57:09 +01:00
packet af_packet: do not use READ_ONCE() in packet_bind() 2023-06-09 10:32:17 +02:00
phonet
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-20 12:13:53 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-03-10 09:39:16 +01:00
rfkill rfkill: make new event layout opt-in 2022-04-08 14:23:00 +02:00
rose net/rose: Fix to not accept on connected socket 2023-02-22 12:57:02 +01:00
rxrpc rxrpc: Fix hard call timeout units 2023-05-17 11:50:17 +02:00
sched net/sched: cls_api: Fix lockup on flushing explicitly created chain 2023-06-21 15:59:18 +02:00
sctp sctp: fix an error code in sctp_sf_eat_auth() 2023-06-21 15:59:17 +02:00
smc net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT 2023-06-14 11:13:01 +02:00
strparser
sunrpc SUNRPC: Fix trace_svc_register() call site 2023-05-24 17:36:51 +01:00
switchdev
tipc net: tipc: resize nlattr array to correct size 2023-06-21 15:59:18 +02:00
tls net: deal with most data-races in sk_wait_event() 2023-05-24 17:36:42 +01:00
unix af_unix: Fix data races around sk->sk_shutdown. 2023-05-24 17:36:42 +01:00
vmw_vsock vsock: avoid to close connected socket after the timeout 2023-05-24 17:36:49 +01:00
wireless wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid() 2023-06-21 15:59:14 +02:00
x25 net/x25: Fix to not accept on connected socket 2023-02-09 11:26:40 +01:00
xdp xsk: Fix unaligned descriptor validation 2023-05-11 23:00:27 +09:00
xfrm xfrm: Ensure policies always checked on XFRM-I input path 2023-06-28 10:29:45 +02:00
compat.c
devres.c
Kconfig Remove DECnet support from kernel 2023-06-21 15:59:15 +02:00
Makefile Remove DECnet support from kernel 2023-06-21 15:59:15 +02:00
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-24 17:36:42 +01:00
sysctl_net.c