linux/net/ipv4
Andrey Ignatov d74bad4e74 bpf: Hooks for sys_connect
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:54 +02:00
..
netfilter net: Convert ipv4_net_ops 2018-03-08 12:36:45 -05:00
af_inet.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
ah4.c net: use -ENOSPC for transient busy indication 2017-11-03 22:11:17 +08:00
arp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
cipso_ipv4.c tcp/dccp: fix ireq->opt races 2017-10-21 01:33:19 +01:00
datagram.c
devinet.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-23 13:51:56 -05:00
esp4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
fib_frontend.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c ipv6: route: dissect flow in input path if fib rules need it 2018-02-28 22:44:44 -05:00
fib_semantics.c net/ipv4: Pass net to fib_multipath_hash instead of fib_info 2018-03-04 13:04:21 -05:00
fib_trie.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
fou.c net: Convert fou_net_ops 2018-03-05 10:48:28 -05:00
gre_demux.c
gre_offload.c gso: fix payload length when gso_size is zero 2017-10-08 10:12:15 -07:00
icmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
igmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
inet_connection_sock.c Revert "defer call to mem_cgroup_sk_alloc()" 2018-02-02 19:49:31 -05:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c net: Fix hlist corruptions in inet_evict_bucket() 2018-03-07 13:29:28 -05:00
inet_hashtables.c inet: Avoid unitialized variable warning in inet_unhash() 2018-02-01 09:48:42 -05:00
inet_timewait_sock.c net: Convert atomic_t net::count to refcount_t 2018-01-15 14:23:42 -05:00
inetpeer.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
ip_forward.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_fragment.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
ip_gre.c gre: fix TUNNEL_SEQ bit check on sequence numbering 2018-03-22 14:52:43 -04:00
ip_input.c net: Make ip_ra_chain per struct net 2018-03-22 15:12:56 -04:00
ip_options.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_output.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_sockglue.c net: Replace ip_ra_lock with per-net mutex 2018-03-22 15:12:56 -04:00
ip_tunnel_core.c net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
ip_tunnel.c net: do not create fallback tunnels for non-default namespaces 2018-03-09 11:23:11 -05:00
ip_vti.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipcomp.c
ipconfig.c ipconfig: use dev_set_mtu() 2018-01-24 19:13:45 -05:00
ipip.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipmr_base.c ipmr, ip6mr: Unite dumproute flows 2018-03-01 13:13:23 -05:00
ipmr.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
netfilter.c netfilter: remove struct nf_afinfo and its helper functions 2018-01-08 18:11:02 +01:00
ping.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
proc.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw_diag.c net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
raw.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
syncookies.c tcp: Namespace-ify sysctl_tcp_workaround_signed_windows 2017-10-28 19:24:38 +09:00
sysctl_net_ipv4.c udp: Move the udp sysctl to namespace. 2018-03-16 12:03:30 -04:00
tcp_bbr.c net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit 2018-03-16 15:07:48 -04:00
tcp_bic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cdg.c tcp: cdg: make struct tcp_cdg static 2017-10-16 21:24:25 +01:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_htcp.c tcp: fix cwnd undo in Reno and HTCP congestion controls 2017-08-06 21:25:10 -07:00
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-06 01:20:46 -05:00
tcp_ipv4.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
tcp_minisocks.c tcp: try to keep packet if SYN_RCV race is lost 2018-02-14 14:21:45 -05:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tcp_output.c tcp_bbr: better deal with suboptimal GSO (II) 2018-03-01 21:44:28 -05:00
tcp_rate.c tcp: invalidate rate samples during SACK reneging 2017-12-08 10:07:02 -05:00
tcp_recovery.c tcp: evaluate packet losses upon RTT change 2017-12-08 14:14:11 -05:00
tcp_scalable.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_timer.c tcp: purge write queue upon aborting the connection 2018-03-07 15:01:03 -05:00
tcp_ulp.c net: add a UID to use for ULP socket assignment 2018-02-06 11:39:31 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-09-29 06:07:00 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_westwood.c tcp: Revert "tcp: remove CA_ACK_SLOWPATH" 2017-08-30 11:20:08 -07:00
tcp_yeah.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp.c bpf: sockmap redirect ingress support 2018-03-30 00:09:43 +02:00
tunnel4.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
udp_diag.c net: ipv6: add second dif to udp socket lookups 2017-08-07 11:39:22 -07:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
udp_tunnel.c net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udp.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
udplite.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
xfrm4_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c