linux/include/net
Daniel Borkmann f318903c0b bpf: Add netns cookie and enable it for bpf cgroup hooks
In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.

In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between initial network namespaces and non-initial one. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.

On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as well as we would see need.

  (*) Both externalTrafficPolicy={Local|Cluster} types
  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
2020-03-27 19:40:38 -07:00
..
9p treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 188 2019-05-30 11:29:21 -07:00
bluetooth Bluetooth: monitor: Add support for ISO packets 2020-01-15 22:28:51 +01:00
caif treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 194 2019-05-30 11:29:22 -07:00
iucv net/af_iucv: locate IUCV header via skb_network_header() 2018-09-26 09:56:07 -07:00
netfilter net/sched: act_ct: Support refreshing the flow table entries 2020-03-12 15:00:39 -07:00
netns tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted. 2020-03-12 12:08:09 -07:00
nfc NFC: Replace zero-length array with flexible-array member 2020-02-27 12:06:20 -08:00
phonet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
sctp net: sctp: Replace zero-length array with flexible-array member 2020-02-29 21:52:19 -08:00
tc_act net/sched: act_ct: Enable hardware offload of flow table entires 2020-03-12 15:00:39 -07:00
6lowpan.h
act_api.h sched: act: allow user to specify type of HW stats for a filter 2020-03-08 21:07:48 -07:00
addrconf.h ipv6: Annotate ipv6_addr_is_* bitwise pointer casts 2019-12-16 16:09:44 -08:00
af_ieee802154.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
af_rxrpc.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
af_unix.h unix: uses an atomic type for scm files accounting 2020-02-28 12:12:53 -08:00
af_vsock.h vsock: add local transport support in the vsock core 2019-12-11 15:01:23 -08:00
ah.h
arp.h net: avoid potential false sharing in neighbor related code 2019-11-06 16:14:48 -08:00
atmclip.h
ax25.h ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
ax88796.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
bareudp.h net: Special handling for IP & MPLS. 2020-02-24 13:31:42 -08:00
bond_3ad.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 90 2019-05-24 17:37:53 +02:00
bond_alb.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 2019-05-21 11:28:40 +02:00
bond_options.h bonding: add an option to specify a delay between peer notifications 2019-07-04 12:30:48 -07:00
bonding.h bonding: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
bpf_sk_storage.h bpf: INET_DIAG support in bpf_sk_storage 2020-02-27 18:50:19 -08:00
busy_poll.h net: annotate lockless accesses to sk->sk_napi_id 2019-10-30 17:34:35 -07:00
calipso.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
cfg80211-wext.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
cfg80211.h nl80211: Add support to configure TID specific RTSCTS configuration 2020-02-24 13:56:57 +01:00
cfg802154.h cfg802154: Replace zero-length array with flexible-array member 2020-02-29 14:39:08 +01:00
checksum.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cipso_ipv4.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
cls_cgroup.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
codel_impl.h
codel_qdisc.h
codel.h
compat.h net: rework SIOCGSTAMP ioctl handling 2019-04-19 14:07:40 -07:00
datalink.h
dcbevent.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dcbnl.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
devlink.h devlink: extend devlink_trap_report() to accept cookie and pass 2020-02-25 11:05:54 -08:00
dn_dev.h
dn_fib.h net: dn_fib: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
dn_neigh.h
dn_nsp.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 2019-05-21 11:52:39 +02:00
dn_route.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 2019-05-21 11:52:39 +02:00
dn.h
drop_monitor.h drop_monitor: extend by passing cookie from driver 2020-02-25 11:05:54 -08:00
dsa.h net: dsa: Add bypass operations for the flower classifier-action filter 2020-03-03 18:57:49 -08:00
dsfield.h ipv6: Annotate bitwise IPv6 dsfield pointer cast 2019-12-16 16:09:44 -08:00
dst_cache.h
dst_metadata.h
dst_ops.h net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
dst.h net/dst: do not confirm neighbor for vxlan and geneve pmtu update 2019-12-24 22:28:55 -08:00
erspan.h
esp.h
espintcp.h xfrm: add espintcp (RFC 8229) 2019-12-09 09:59:07 +01:00
ethoc.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
failover.h net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_notifier.h ipv6: Remove old route notifications and convert listeners 2019-12-24 22:37:30 -08:00
fib_rules.h fib: add missing attribute validation for tun_id 2020-03-03 13:28:48 -08:00
firewire.h
flow_dissector.h net: sched: correct flower port blocking 2020-02-17 21:33:28 -08:00
flow_offload.h flow_offload: Add flow_match_ct to get rule ct match 2020-03-12 15:00:39 -07:00
flow.h route: Add multipath_hash in flowi_common to make user-define hash 2019-02-27 12:50:17 -08:00
fou.h
fq_impl.h net/fq_impl: Switch to kvmalloc() for memory allocation 2019-11-08 09:11:49 +01:00
fq.h net/flow_dissector: switch to siphash 2019-10-23 20:13:22 -07:00
garp.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
gen_stats.h net_sched: extend packet counter to 64bit 2019-11-05 18:20:55 -08:00
genetlink.h net: genetlink: remove unused genl_family_attrbuf() 2019-10-06 15:44:47 +02:00
geneve.h net: Move the definition of the default Geneve udp port to public header file 2019-03-22 12:09:31 -07:00
gre.h net: Add netif_is_gretap()/netif_is_ip6gretap() 2018-12-10 15:53:04 -08:00
gro_cells.h
gtp.h
gue.h net:gue.h:Fix shifting signed 32-bit value by 31 bits problem 2019-07-01 10:58:23 -07:00
hwbm.h net: hwbm: if CONFIG_NET_HWBM unset, make stub functions static 2019-10-25 16:24:32 -07:00
icmp.h icmp: introduce helper for nat'd source address in network device context 2020-02-13 14:19:00 -08:00
ieee80211_radiotap.h wireless-drivers-next patches for 5.1 2019-02-22 12:56:24 -08:00
ieee802154_netdev.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
if_inet6.h ipv6: shrink struct ipv6_mc_socklist 2019-08-28 14:43:03 -07:00
ife.h net: ife: drop include of module.h from net/ife.h 2019-04-22 21:50:53 -07:00
ila.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
inet6_connection_sock.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
inet6_hashtables.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
inet_common.h inet: factor out inet_send_prepare() 2019-07-03 13:51:54 -07:00
inet_connection_sock.h bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
inet_ecn.h inet: Refactor INET_ECN_decapsulate() 2018-10-17 17:45:07 -07:00
inet_frag.h inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
inet_hashtables.h tcp/dccp: fix possible race __inet_lookup_established() 2019-12-13 21:40:49 -08:00
inet_sock.h net: inet_sock: Replace zero-length array with flexible-array member 2020-03-02 11:16:28 -08:00
inet_timewait_sock.h tcp: honor SO_PRIORITY in TIME_WAIT state 2019-09-27 12:05:02 +02:00
inetpeer.h net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-02-08 21:50:15 -08:00
ip6_checksum.h net: core: add helper tcp_v6_gso_csum_prep 2020-02-19 11:20:59 -08:00
ip6_fib.h net: ip6_fib: Replace zero-length array with flexible-array member 2020-03-02 11:16:28 -08:00
ip6_route.h net: ip6_route: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
ip6_tunnel.h ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL 2019-06-18 20:48:45 -04:00
ip_fib.h net: ip_fib: Replace zero-length array with flexible-array member 2020-03-02 11:16:28 -08:00
ip_tunnels.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
ip_vs.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-02 13:54:56 -07:00
ip.h inet: protect against too small mtu values. 2019-12-07 11:55:11 -08:00
ipcomp.h
ipconfig.h
ipv6_frag.h inet: fix various use-after-free in defrags units 2019-06-19 11:37:47 -04:00
ipv6_stubs.h net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
ipv6.h net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc. 2020-02-24 13:31:42 -08:00
ipx.h bonding/alb: properly access headers in bond_alb_xmit() 2020-02-05 14:28:09 +01:00
iw_handler.h
kcm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
l3mdev.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
lag.h net: Add lag.h, net_lag_port_dev_txable() 2018-07-11 23:10:19 -07:00
lapb.h
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: fix sk_buff leak in llc_conn_service() 2019-10-08 13:23:05 -07:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h llc: avoid blocking in llc_sap_close() 2018-09-13 09:04:58 -07:00
lwtunnel.h net: lwtunnel: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
mac80211.h mac80211: Add api to support configuring TID specific configuration 2020-02-24 14:07:01 +01:00
mac802154.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
macsec.h macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw) 2020-03-16 01:42:31 -07:00
mip6.h net: mip6: Replace zero-length array with flexible-array member 2020-03-02 11:16:27 -08:00
mld.h net: ipv6: mld: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
mpls_iptunnel.h net: mpls: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
mpls.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
mptcp.h mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6 2020-01-30 10:55:54 +01:00
mrp.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
ncsi.h
ndisc.h ndisc: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
neighbour.h neighbour: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
net_failover.h net: Introduce net_failover driver 2018-05-28 22:59:54 -04:00
net_namespace.h bpf: Add netns cookie and enable it for bpf cgroup hooks 2020-03-27 19:40:38 -07:00
net_ratelimit.h
netevent.h net: ipv4: Notify about changes to ip_forward_update_priority 2018-08-01 09:52:30 -07:00
netlabel.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
netlink.h netlink: rename nl80211_validate_nested() to nla_validate_nested() 2019-12-12 17:07:05 -08:00
netprio_cgroup.h netprio: use css ID instead of cgroup ID 2019-11-12 08:18:03 -08:00
netrom.h net: netrom: Fix error cleanup path of nr_proto_init 2019-04-11 13:59:49 -07:00
nexthop.h net: nexthop: Replace zero-length array with flexible-array member 2020-02-29 21:52:20 -08:00
nl802154.h
nsh.h
p8022.h
page_pool.h net: page_pool: API cleanup and comments 2020-02-20 10:09:25 -08:00
pie.h pie: realign comment 2020-03-04 13:25:55 -08:00
ping.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
pkt_cls.h net: sched: RED: Introduce an ECN nodrop mode 2020-03-14 21:03:46 -07:00
pkt_sched.h net: sched: Replace zero-length array with flexible-array member 2020-02-29 21:27:02 -08:00
pptp.h
protocol.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
psample.h net: sched: take reference to psample group in flow_action infra 2019-09-16 09:18:03 +02:00
psnap.h
raw.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rawv6.h
red.h net: sched: RED: Introduce an ECN nodrop mode 2020-03-14 21:03:46 -07:00
regulatory.h cfg80211: make wmm_rule part of the reg_rule structure 2018-08-28 11:11:47 +02:00
request_sock.h net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head 2019-10-09 21:34:31 -07:00
rose.h
route.h net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc. 2020-02-24 13:31:42 -08:00
rsi_91x.h
rtnetlink.h net: Add extack argument to rtnl_create_link 2018-11-06 15:00:45 -08:00
rtnh.h net: Rename net/nexthop.h net/rtnh.h 2019-04-22 21:47:25 -07:00
sch_generic.h Revert "net: sched: make newly activated qdiscs visible" 2020-03-12 11:19:24 -07:00
scm.h pids: Compute task_tgid using signal->leader_pid 2018-07-21 10:43:12 -05:00
secure_seq.h
seg6_hmac.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
seg6_local.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
seg6.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
slhc_vj.h
smc.h net/smc: introduce bookkeeping of SMCD link groups 2019-11-15 12:28:28 -08:00
snmp.h net/tls: add skeleton of MIB statistics 2019-10-05 16:29:00 -07:00
sock_reuseport.h net: sock_reuseport: Replace zero-length array with flexible-array member 2020-02-29 21:52:19 -08:00
sock.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-21 15:22:45 -08:00
Space.h
stp.h
strparser.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
switchdev.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_states.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp.h bpf, tcp: Make tcp_bpf_recvmsg static 2020-03-20 15:56:55 +01:00
timewait_sock.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc.h
tls_toe.h net/tls: rename tls_hw_* functions tls_toe_* 2019-10-04 14:07:07 -07:00
tls.h net/tls: add helper for testing if socket is RX offloaded 2019-12-19 17:46:51 -08:00
transp_v6.h ipv6: fold sockcm_cookie into ipcm6_cookie 2018-07-07 10:58:49 +09:00
tso.h
tun_proto.h
udp_tunnel.h ipv6: Move ipv6 stubs to a separate header file 2019-03-29 10:53:45 -07:00
udp.h bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
udplite.h
vsock_addr.h vsock: remove include/linux/vm_sockets.h file 2019-11-14 18:12:17 -08:00
vxlan.h vxlan: add adjacent link to limit depth level 2019-10-24 14:53:49 -07:00
wext.h
wimax.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 268 2019-06-05 17:30:29 +02:00
x25.h net/x25: add new state X25_STATE_5 2019-12-09 10:28:43 -08:00
x25device.h
xdp_priv.h page_pool: do not release pool until inflight == 0. 2019-11-16 12:39:10 -08:00
xdp_sock.h xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to xsk_umem_release_addr 2019-12-20 16:00:09 -08:00
xdp.h xdp: page_pool related fix to cpumap 2019-06-19 11:23:13 -04:00
xfrm.h xfrm: add espintcp (RFC 8229) 2019-12-09 09:59:07 +01:00