linux/net/core
Andrey Ignatov d74bad4e74 bpf: Hooks for sys_connect
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:54 +02:00
..
datagram.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
dev_addr_lists.c net: fix spelling for synchronized 2014-11-18 15:26:32 -05:00
dev_ioctl.c net: don't unnecessarily load kernel modules in dev_ioctl() 2018-03-07 15:12:58 -05:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
devlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
drop_monitor.c treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
dst_cache.c net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency 2018-03-05 12:52:45 -05:00
dst.c net: Remove dst->next 2017-11-30 09:54:27 -05:00
ethtool.c net: ethtool: extend RXNFC API to support RSS spreading of filter matches 2018-03-08 21:54:52 -05:00
fib_notifier.c net: Convert fib_* pernet_operations, registered via subsys_initcall 2018-02-13 10:36:07 -05:00
fib_rules.c net: fib_rules: support for match on ip_proto, sport and dport 2018-02-28 22:44:43 -05:00
filter.c bpf: Hooks for sys_connect 2018-03-31 02:15:54 +02:00
flow_dissector.c net: Remove unused get_hash_from_flow functions 2018-03-04 13:04:23 -05:00
gen_estimator.c net_sched: gen_estimator: fix broken estimators based on percpu stats 2018-02-23 12:35:46 -05:00
gen_stats.c net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq 2017-12-08 13:32:26 -05:00
gro_cells.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c net: link_watch: mark bonding link events urgent 2018-01-23 19:43:30 -05:00
lwt_bpf.c bpf: rename bpf_compute_data_end into bpf_compute_data_pointers 2017-09-26 13:36:44 -07:00
lwtunnel.c ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
Makefile xdp: base API for new XDP rx-queue info concept 2018-01-05 15:21:20 -08:00
neighbour.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
net_namespace.c net: Replace ip_ra_lock with per-net mutex 2018-03-22 15:12:56 -04:00
net-procfs.c net: Convert pernet_subsys ops, registered via net_dev_init() 2018-02-13 10:36:07 -05:00
net-sysfs.c net: introduce helper dev_change_tx_queue_len() 2018-01-29 12:42:15 -05:00
net-sysfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
net-traces.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-04 09:26:51 +09:00
netclassid_cgroup.c cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS 2017-07-21 11:14:51 -04:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c netpoll: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:54 +01:00
netprio_cgroup.c net: remove duplicate includes 2017-12-13 13:18:46 -05:00
pktgen.c pktgen: Fix memory leak in pktgen_if_write 2018-03-14 10:02:15 -04:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c ipv4: Namespaceify tcp_max_syn_backlog knob 2016-12-29 11:38:31 -05:00
rtnetlink.c net: Add rtnl_lock_killable() 2018-03-16 12:31:19 -04:00
scm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
secure_seq.c tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
skbuff.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
sock_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
sock_reuseport.c soreuseport: fix mem leak in reuseport_add_sock() 2018-02-02 19:47:03 -05:00
sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
stream.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
sysctl_net_core.c net: do not create fallback tunnels for non-default namespaces 2018-03-09 11:23:11 -05:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
utils.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
xdp.c xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg 2018-01-05 15:21:21 -08:00