2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 12:14:01 +08:00
linux-next/net
Daniel Borkmann f318903c0b bpf: Add netns cookie and enable it for bpf cgroup hooks
In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.

In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between initial network namespaces and non-initial one. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.

On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as well as we would see need.

  (*) Both externalTrafficPolicy={Local|Cluster} types
  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
2020-03-27 19:40:38 -07:00
..
6lowpan
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 net: 802: psnap.c: Use built-in RCU list checking 2020-02-24 13:02:53 -08:00
8021q net: vlan: suppress "failed to kill vid" warnings 2020-02-17 14:30:54 -08:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ax25 net: Make sock protocol value checks more specific 2020-01-09 18:41:40 -08:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
bluetooth Bluetooth: Fix race condition in hci_release_sock() 2020-01-26 10:34:17 +02:00
bpf bpf: Add selftests for BPF_MODIFY_RETURN 2020-03-04 13:41:06 -08:00
bpfilter kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
bridge net: bridge: fix stale eth hdr pointer in br_dev_xmit 2020-02-24 11:11:19 -08:00
caif net: caif: Add lockdep expression to RCU traversal primitive 2020-03-11 22:55:25 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-08 11:52:02 +01:00
ceph Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
core bpf: Add netns cookie and enable it for bpf cgroup hooks 2020-03-27 19:40:38 -07:00
dcb
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-29 15:53:35 -08:00
decnet net: Make sock protocol value checks more specific 2020-01-09 18:41:40 -08:00
dns_resolver
dsa net: dsa: warn if phylink_mac_link_state returns error 2020-03-15 17:11:12 -07:00
ethernet net: remove eth_change_mtu 2020-01-27 11:09:31 +01:00
ethtool ethtool: fix spelling mistake "exceeeds" -> "exceeds" 2020-03-13 11:23:33 -07:00
hsr hsr: fix refcnt leak of hsr slave interface 2020-03-05 11:59:47 -08:00
ieee802154 nl802154: add missing attribute validation for dev_type 2020-03-03 13:28:48 -08:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 bpf: Add bpf_sk_storage support to bpf_tcp_ca 2020-03-23 20:51:55 +01:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
iucv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kcm net: kcm: kcmproc.c: Fix RCU list suspicious usage warning 2020-03-16 17:14:02 -07:00
key
l2tp l2tp: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
l3mdev
lapb
llc af_llc: fix if-statement empty body warning 2020-02-26 20:38:13 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
mac802154
mpls net: mpls: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
mptcp mptcp: drop unneeded checks 2020-03-15 00:19:03 -07:00
ncsi net/ncsi: Support for multi host mellanox card 2020-01-09 18:36:22 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
netlabel netlabel_domainhash.c: Use built-in RCU list checking 2020-02-18 12:44:23 -08:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
netrom net: netrom: Add missing annotation for nr_neigh_stop() 2020-02-24 13:26:49 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
packet net/packet: tpacket_rcv: do not increment ring index on drop 2020-03-11 23:12:16 -07:00
phonet net: Remove redundant BUG_ON() check in phonet_pernet 2020-01-03 12:25:50 -08:00
psample net: psample: fix skb_over_panic 2019-11-26 14:40:13 -08:00
qrtr net: qrtr: Fix FIXME related to qrtr_ns_init() 2020-03-03 17:52:21 -08:00
rds net/rds: Track user mapped pages through special API 2020-02-16 18:37:09 -08:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2019-12-16 10:15:49 +01:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-01-26 10:40:21 +01:00
rxrpc rxrpc: Fix call RCU cleanup using non-bh-safe locks 2020-02-07 11:20:57 +01:00
sched net: sched: set the hw_stats_type in pedit loop 2020-03-16 02:13:43 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
strparser
sunrpc xprtrdma: Fix DMA scatter-gather list mapping imbalance 2020-02-13 15:35:33 -05:00
switchdev net: switchdev: do not propagate bridge updates across bridges 2020-02-26 20:58:33 -08:00
tipc tipc: add NULL pointer check to prevent kernel oops 2020-03-15 00:07:00 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-21 15:22:45 -08:00
unix net: datagram: drop 'destructor' argument from several helpers 2020-02-28 12:12:53 -08:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-02-27 18:31:39 -08:00
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
x25 net: x25: convert to list_for_each_entry_safe() 2020-02-16 18:59:42 -08:00
xdp xdp: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
xfrm net: datagram: drop 'destructor' argument from several helpers 2020-02-28 12:12:53 -08:00
compat.c y2038: socket: use __kernel_old_timespec instead of timespec 2019-11-15 14:38:29 +01:00
Kconfig net: disable BRIDGE_NETFILTER by default 2020-02-20 15:02:02 -08:00
Makefile mptcp: Add MPTCP socket stubs 2020-01-24 13:44:07 +01:00
socket.c socket: fix unused-function warning 2020-01-08 15:02:21 -08:00
sysctl_net.c