linux/net/ipv4
Martin KaFai Lau 2dbb9b9e6d bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT
This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.

BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
to store the bpf context instead of using the skb->cb[48].

At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp).  At this
point,  it is not always clear where the bpf context can be appended
in the skb->cb[48] to avoid saving-and-restoring cb[].  Even putting
aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is not
clear if the lower layer is only ipv4 and ipv6 in the future and
will it not touch the cb[] again before transiting to the upper
layer.

For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
instead of IP[6]CB and it may still modify the cb[] after calling
the udp[46]_lib_lookup_skb().  Because of the above reason, if
sk->cb is used for the bpf ctx, saving-and-restoring is needed
and likely the whole 48 bytes cb[] has to be saved and restored.

Instead of saving, setting and restoring the cb[], this patch opts
to create a new "struct sk_reuseport_kern" and setting the needed
values in there.

The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
specific usage at this point and it is also inline with the current
sock_reuseport.c implementation (i.e. no protocol specific requirement).

In "struct sk_reuseport_md", this patch exposes data/data_end/len
with semantic similar to other existing usages.  Together
with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
the bpf prog that the reuseport group is bind-ed to a local
INANY address which cannot be learned from skb.

The new "bind_inany" is added to "struct sock_reuseport" which will be
used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
to avoid repeating the "bind INANY" test on
"sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
only be properly initialized when a "sk->sk_reuseport" enabled sk is
adding to a hashtable (i.e. during "reuseport_alloc()" and
"reuseport_add_sock()").

The new "sk_select_reuseport()" is the main helper that the
bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
the earlier patch, the validity of a selected sk is checked in
run time in "sk_select_reuseport()".  Doing the check in
verification time is difficult and inflexible (consider the map-in-map
use case).  The runtime check is to compare the selected sk's reuseport_id
with the reuseport_id that we want.  This helper will return -EXXX if the
selected sk cannot serve the incoming request (e.g. reuseport_id
not match).  The bpf prog can decide if it wants to do SK_DROP as its
discretion.

When the bpf prog returns SK_PASS, the kernel will check if a
valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
If it does , it will use the selected sk.  If not, the kernel
will select one from "reuse->socks[]" (as before this patch).

The SK_DROP and SK_PASS handling logic will be in the next patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-11 01:58:46 +02:00
..
bpfilter bpfilter: remove trailing newline 2018-07-24 14:10:42 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2018-07-20 22:28:28 -07:00
af_inet.c net: ipv4: Control SKB reprioritization after forwarding 2018-08-01 09:52:30 -07:00
ah4.c net: use -ENOSPC for transient busy indication 2017-11-03 22:11:17 +08:00
arp.c proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
cipso_ipv4.c tcp/dccp: fix ireq->opt races 2017-10-21 01:33:19 +01:00
datagram.c
devinet.c route: add support for directed broadcast forwarding 2018-07-29 12:37:06 -07:00
esp4_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
esp4.c esp4: remove redundant initialization of pointer esph 2018-02-13 13:59:03 +01:00
fib_frontend.c ipv4: remove BUG_ON() from fib_compute_spec_dst 2018-07-28 19:06:12 -07:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
fib_semantics.c net: metrics: add proper netlink validation 2018-06-05 12:29:43 -04:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
fou.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
gre_demux.c
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c ipv4: ipcm_cookie initializers 2018-07-07 10:58:49 +09:00
igmp.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
inet_connection_sock.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c ip: use rb trees for IP frag queue. 2018-08-05 17:16:46 -07:00
inet_hashtables.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c inetpeer: fix uninit-value in inet_getpeer 2018-04-09 10:57:35 -04:00
ip_forward.c net: ipv4: Control SKB reprioritization after forwarding 2018-08-01 09:52:30 -07:00
ip_fragment.c ipv4: frags: precedence bug in ip_expire() 2018-08-06 13:15:12 -07:00
ip_gre.c ip_gre: remove redundant variables t_hlen 2018-08-01 09:58:15 -07:00
ip_input.c net: ipv4: fix listify ip_rcv_finish in case of forwarding 2018-07-12 16:40:19 -07:00
ip_options.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_output.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
ip_sockglue.c ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull 2018-07-24 16:35:58 -07:00
ip_tunnel_core.c net/ipv4: Update ip_tunnel_metadata_cnt static key to modern api 2018-05-10 15:13:33 -04:00
ip_tunnel.c ip_tunnel: Fix name string concatenate in __ip_tunnel_create() 2018-06-07 16:27:16 -04:00
ip_vti.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipcomp.c
ipconfig.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
ipip.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipmr_base.c rhashtable: split rhashtable.h 2018-06-22 13:43:27 +09:00
ipmr.c net: ipmr: add support for passing full packet on wrong vif 2018-07-13 14:21:16 -07:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
metrics.c net: metrics: add proper netlink validation 2018-06-05 12:29:43 -04:00
netfilter.c netfilter: utils: move nf_ip_checksum* from ipv4 to utils 2018-07-16 17:51:48 +02:00
netlink.c ipv4: support sport, dport and ip_proto in RTM_GETROUTE 2018-05-23 15:14:12 -04:00
ping.c net: add helpers checking if socket can be bound to nonlocal address 2018-08-01 09:50:04 -07:00
proc.c ip: discard IPv4 datagrams with overlapping segments. 2018-08-05 17:16:46 -07:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw_diag.c net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
raw.c ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 2018-07-07 10:58:49 +09:00
route.c route: add support for directed broadcast forwarding 2018-07-29 12:37:06 -07:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv4.c net: ipv4: Notify about changes to ip_forward_update_priority 2018-08-01 09:52:30 -07:00
tcp_bbr.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
tcp_bic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cdg.c tcp: cdg: make struct tcp_cdg static 2017-10-16 21:24:25 +01:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_dctcp.c tcp: do not delay ACK in DCTCP upon CE status change 2018-07-20 14:32:23 -07:00
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_htcp.c tcp: fix cwnd undo in Reno and HTCP congestion controls 2017-08-06 21:25:10 -07:00
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
tcp_ipv4.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux 2018-07-20 21:17:12 -07:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_minisocks.c tcp: use monotonic timestamps for PAWS 2018-07-12 14:50:40 -07:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c tcp: Don't coalesce decrypted and encrypted SKBs 2018-07-16 00:12:09 -07:00
tcp_output.c tcp: remove set but not used variable 'skb_size' 2018-08-01 09:57:09 -07:00
tcp_rate.c tcp: expose both send and receive intervals for rate sample 2018-07-11 23:01:56 -07:00
tcp_recovery.c tcp: add stat of data packet reordering events 2018-08-01 09:56:10 -07:00
tcp_scalable.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_timer.c tcp: make function tcp_retransmit_stamp() static 2018-07-25 16:35:45 -07:00
tcp_ulp.c net: add a UID to use for ULP socket assignment 2018-02-06 11:39:31 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-09-29 06:07:00 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_westwood.c tcp: Revert "tcp: remove CA_ACK_SLOWPATH" 2017-08-30 11:20:08 -07:00
tcp_yeah.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp.c tcp: remove unneeded variable 'err' 2018-08-03 16:52:07 -07:00
tunnel4.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
udp_diag.c udp: fix rx queue len reported by diag and proc interface 2018-06-08 19:55:15 -04:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
udp_tunnel.c net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udp.c bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT 2018-08-11 01:58:46 +02:00
udplite.c proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
xfrm4_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c