linux/net/ipv4
Ido Schimmel 1fa3314c14 ipv4: Centralize TOS matching
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:

 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.

Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.

Include <linux/ip.h> in <linux/in_route.h> since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
<linux/in_route.h>.

No regressions in FIB tests:

 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed:   0

And FIB rule tests:

 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-20 14:57:08 +02:00
..
netfilter netfilter: nft_fib: Mask upper DSCP bits before FIB lookup 2024-08-20 14:57:07 +02:00
af_inet.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ah4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
arp.c arp: Convert ioctl(SIOCGARP) to RCU. 2024-05-01 18:37:07 -07:00
bpf_tcp_ca.c bpf: pass bpf_struct_ops_link to callbacks in bpf_struct_ops. 2024-05-30 15:34:13 -07:00
cipso_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
datagram.c ipv4: Set the routing scope properly in ip_route_output_ports(). 2024-02-12 17:33:05 -08:00
devinet.c ip: Move INFINITY_LIFE_TIME to addrconf.h. 2024-08-15 18:56:14 -07:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
esp4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-07-15 13:19:17 -07:00
fib_frontend.c ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family 2024-08-20 14:57:07 +02:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv4: Centralize TOS matching 2024-08-20 14:57:08 +02:00
fib_semantics.c ipv4: Centralize TOS matching 2024-08-20 14:57:08 +02:00
fib_trie.c ipv4: Centralize TOS matching 2024-08-20 14:57:08 +02:00
fou_bpf.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
fou_core.c fou: remove warn in gue_gro_receive on unsupported protocol 2024-06-17 17:49:50 -07:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
gre_offload.c net: gro: rename skb_gro_header_hard() 2024-03-05 13:30:11 +01:00
icmp.c net/ipv4: add tracepoint for icmp_send 2024-05-08 10:39:26 +01:00
igmp.c ipv4: Set scope explicitly in ip_route_output(). 2024-04-08 13:20:51 +01:00
inet_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-27 12:14:11 -07:00
inet_diag.c inet_diag: Initialize pad field in struct inet_diag_req_v2 2024-07-04 15:25:09 +02:00
inet_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
inet_hashtables.c inet: constify 'struct net' parameter of various lookup helpers 2024-08-05 16:22:45 -07:00
inet_timewait_sock.c tcp: move inet_twsk_schedule helper out of header 2024-06-10 11:54:18 +01:00
inetpeer.c net: ipv4: Simplify the allocation of slab caches in inet_initpeers 2024-01-31 16:39:42 -08:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
ip_gre.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip_input.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
ip_options.c
ip_output.c ipv4: export ip_flush_pending_frames 2024-07-31 09:25:12 +01:00
ip_sockglue.c inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT 2024-03-06 12:37:06 +00:00
ip_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ip_tunnel.c ip_tunnel: Move stats allocation to core 2024-06-11 19:24:37 -07:00
ip_vti.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
ipmr_base.c
ipmr.c ip_tunnel: use a separate struct to store tunnel params in the kernel 2024-04-01 10:49:28 +01:00
Kconfig net/tcp: Expand goo.gl link 2024-07-30 18:35:12 -07:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c net: remove NULL-pointer net parameter in ip_metrics_convert 2024-06-05 10:06:00 +01:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c net: nexthop: Increase weight to u16 2024-08-12 17:50:34 -07:00
ping.c ping: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
proc.c minmax: add a few more MIN_T/MAX_T users 2024-07-28 13:41:14 -07:00
protocol.c
raw_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
raw.c net: raw: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
route.c A lot of networking people were at a conference last week, busy 2024-07-25 13:32:25 -07:00
syncookies.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
sysctl_net_ipv4.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
tcp_ao.c net/tcp: Disable TCP-AO static key after RCU grace period 2024-08-04 13:21:50 +01:00
tcp_bbr.c tcp: Add new args for cong_control in tcp_congestion_ops 2024-05-02 16:26:56 -07:00
tcp_bic.c
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c tcp: Replace strncpy() with strscpy() 2024-07-16 07:52:15 -07:00
tcp_cubic.c bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs 2024-03-28 18:31:40 -07:00
tcp_dctcp.c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 2024-05-21 13:34:50 +02:00
tcp_dctcp.h
tcp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
tcp_fastopen.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
tcp_highspeed.c
tcp_htcp.c tcp: Use clamp() in htcp_alpha_update() 2024-08-06 12:16:25 -07:00
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: Update window clamping condition 2024-08-14 10:50:49 +01:00
tcp_ipv4.c net/ipv4: Use nested-BH locking for ipv4_tcp_sk. 2024-06-24 16:41:22 -07:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: use netlink policy for IPv6 addr len validation 2024-08-19 17:42:57 -07:00
tcp_minisocks.c tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). 2024-07-16 11:56:14 +02:00
tcp_nv.c
tcp_offload.c net: drop bad gso csum_start and offset in virtio_net_hdr 2024-07-30 18:34:13 -07:00
tcp_output.c tcp: rstreason: let it work finally in tcp_send_active_reset() 2024-08-07 10:24:46 +01:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c
tcp_sigpool.c net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. 2024-06-24 16:41:22 -07:00
tcp_timer.c tcp: rstreason: introduce SK_RST_REASON_TCP_KEEPALIVE_TIMEOUT for active reset 2024-08-07 10:24:45 +01:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: rstreason: introduce SK_RST_REASON_TCP_DISCONNECT_WITH_DATA for active reset 2024-08-07 10:24:46 +01:00
tunnel4.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c inet_diag: add module pointer to "struct inet_diag_handler" 2024-01-23 15:13:54 +01:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c udp: Fall back to software USO if IPv6 extension headers are present 2024-08-09 21:58:08 -07:00
udp_tunnel_core.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udp.c udp: constify 'struct net' parameter of socket lookups 2024-08-05 16:23:59 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm4_output.c
xfrm4_policy.c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array 2024-05-03 13:29:42 +01:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv4 modules 2024-02-09 14:12:02 -08:00