linux/net/ipv4
David Ahern a6db4494d2 net: ipv4: Consider failed nexthops in multipath routes
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:16:13 -04:00
..
netfilter netfilter: ipv4: fix NULL dereference 2016-03-28 17:59:29 +02:00
af_inet.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c arp: correct return value of arp_rcv 2016-03-07 14:55:04 -05:00
cipso_ipv4.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: Don't do expensive useless work during inetdev destroy. 2016-03-13 23:28:35 -04:00
esp4.c esp4: Switch to new AEAD interface 2015-05-28 11:23:20 +08:00
fib_frontend.c ipv4: initialize flowi4_flags before calling fib_lookup() 2016-03-22 15:59:23 -04:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
fib_semantics.c net: ipv4: Consider failed nexthops in multipath routes 2016-04-11 15:16:13 -04:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-01 15:56:08 -08:00
fou.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
gre_demux.c gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gre_offload.c GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU 2016-04-07 16:56:33 -04:00
icmp.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
igmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
inet_connection_sock.c tcp/dccp: fix inet_reuseport_add_sock() 2016-04-07 12:02:33 -04:00
inet_diag.c tcp/dccp: do not touch listener sk_refcnt under synflood 2016-04-04 22:11:20 -04:00
inet_fragment.c inet: kill unused skb_free op 2016-01-05 22:25:57 -05:00
inet_hashtables.c tcp/dccp: fix inet_reuseport_add_sock() 2016-04-07 12:02:33 -04:00
inet_timewait_sock.c tcp/dccp: fix timewait races in timer handling 2015-09-21 16:32:29 -07:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net: remove skb_sender_cpu_clear() 2016-03-01 17:36:47 -05:00
ip_fragment.c net: Export ip fragment sysctl to unprivileged users 2016-02-16 20:42:54 -05:00
ip_gre.c GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU 2016-04-07 16:56:33 -04:00
ip_input.c ipv4: namespacify ip_early_demux sysctl knob 2016-02-16 20:42:54 -05:00
ip_options.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
ip_sockglue.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
ip_tunnel_core.c ip_tunnel: implement __iptunnel_pull_header 2016-04-06 16:50:32 -04:00
ip_tunnel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
ip_vti.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c ipv4: ipconfig: avoid unused ic_proto_used symbol 2016-01-29 19:39:09 -08:00
ipip.c iptunnel: scrub packet in iptunnel_pull_header 2016-02-18 14:34:54 -05:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
Kconfig ipv4: Remove inet_lro library 2016-02-17 16:15:46 -05:00
Makefile ipv4: Remove inet_lro library 2016-02-17 16:15:46 -05:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
proc.c ipv4: Namespaceify ip_default_ttl sysctl knob 2016-02-16 20:42:54 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
route.c route: check and remove route cache when we get route 2016-02-18 11:31:36 -05:00
syncookies.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
sysctl_net_ipv4.c net: ipv4: Consider failed nexthops in multipath routes 2016-04-11 15:16:13 -04:00
tcp_bic.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_cdg.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_cong.c tcp: remove tcp_ecn_make_synack() socket argument 2015-09-25 13:00:38 -07:00
tcp_cubic.c tcp_cubic: do not set epoch_start in the future 2015-09-17 22:35:07 -07:00
tcp_dctcp.c tcp: allow dctcp alpha to drop to zero 2015-10-23 02:46:52 -07:00
tcp_diag.c net: diag: Support destroying TCP sockets. 2015-12-15 23:26:52 -05:00
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_input.c tcp: increment sk_drops for listeners 2016-04-04 22:11:20 -04:00
tcp_ipv4.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
tcp_lp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_metrics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
tcp_minisocks.c tcp: rate limit ACK sent by SYN_RECV request sockets 2016-04-04 22:11:20 -04:00
tcp_offload.c net: Store checksum result for offloaded GSO checksums 2016-02-11 08:55:33 -05:00
tcp_output.c tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In 2016-03-14 14:55:26 -04:00
tcp_probe.c net: ipv4: tcp_probe: Replace timespec with timespec64 2016-03-01 17:18:44 -05:00
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c ipv4: Namespaceify tcp_orphan_retries sysctl knob 2016-02-07 14:35:11 -05:00
tcp_vegas.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_vegas.h tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_veno.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_westwood.c tcp_westwood: fix tcp_westwood_info() 2015-05-05 19:50:09 -04:00
tcp_yeah.c tcp_yeah: don't set ssthresh below 2 2016-01-11 17:25:16 -05:00
tcp.c sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
tunnel4.c
udp_diag.c udp: no longer use SLAB_DESTROY_BY_RCU 2016-04-04 22:11:19 -04:00
udp_impl.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udp_offload.c udp: Remove udp_offloads 2016-04-07 16:53:30 -04:00
udp_tunnel.c udp: Add socket based GRO and config 2016-04-07 16:53:29 -04:00
udp.c udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb 2016-04-07 16:53:14 -04:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c
xfrm4_tunnel.c