Loop-Free Alternative (LFA)

Published: 2022-10-31

OSPF Loop-Free Alternative (also called Fast-Reroute) is a technology for achieving fast failover times when a link goes down. Having the network drop or loop packets for 500-1000+ milliseconds while the routers compute a new path is just not acceptable in modern service provider networks. The main idea is to pre-install a backup path into the forwarding table (FIB) so that when the primary path goes down, traffic can switch to the backup almost instantly, with little or no packet loss. LFA is topology-dependent, so the number of prefixes it can protect depends on the network topology. We will explore all of this below.

Network Convergence Factors

Following the topology below, we have an OSPF network where the preferred path from R6 to R1 is via the R4-R2 path.

Global network convergence

Let's say the R1-R2 link goes down. The global network convergence time is determined by the amount of time required for every router in the network to be made aware of the link failure and to have computed a new network topology. The network has converged when all routers have computed and installed the new topology.

In this case R1 and R2 send out OSPF LSAs to R3 and R4 respectively once the link failure is detected. They both run SPF to build a new topology tree and install the new routes to continue forwarding traffic. R3 and R4 receive the LSAs and flood them to R1, R5 and R5, R6 respectively; they then run SPF and install the new routes. Finally R5, R6 and R7 repeat the same process, and at this point the network has converged.

Because R6 is quite far away from R1, it keeps sending traffic to R1 via R4 and R2 while the network is converging. During the time it takes R2 to detect the R1-R2 link failure and compute a new path to R1, traffic from R6 to R1 is dropped. Once the network has fully converged, R6 sends traffic along the R4-R5-R3 or R4-R2-R3 path.

Local Network repair

The default OSPF timers on our routers (running IOS-XE) are quite aggressive: 50 milliseconds after the OSPF LSA is generated and flooded, SPF starts and then completes 1-2 milliseconds later. The last step is updating the FIB with the new routes, which takes a couple of milliseconds in this small lab topology. R6->R1 traffic received by R2 is now forwarded via R3. The whole process takes less than 100 ms, which is good enough for most networks, but networks that carry VoIP traffic may not accept such a long failover time.

Optimizing Local Network repair

By utilizing Fast Reroute (LFA), a backup path can be preinstalled into the FIB on R2. For example, R2 computes a primary path to 1.1.1.1/32 using R1 and the R1-R2 link as its nexthop. A secondary SPF run would then find and install an alternative repair path via R3 and the R2-R3 link. If the primary path goes down, the backup path via R3 is instantly activated and traffic is forwarded normally. LSAs are then flooded, SPF is run and new routes are installed, all while traffic from R6 to R1 is forwarded. Here the downtime is probably less than 10 ms.
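The idea of a pre-installed repair path can be sketched as a small data structure. This is only an illustration of the concept, not how IOS-XE implements its FIB; the FibEntry class, the next_link function and the link names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    """One forwarding entry with an optional pre-installed repair path."""
    primary: str                  # outgoing link of the primary path
    backup: Optional[str] = None  # outgoing link of the repair path, if any


def next_link(entry: FibEntry, links_up: set) -> Optional[str]:
    """Pick the outgoing link without waiting for SPF to run again."""
    if entry.primary in links_up:
        return entry.primary
    if entry.backup is not None and entry.backup in links_up:
        return entry.backup  # local repair: switch to the backup path
    return None              # no usable path until the IGP reconverges


# R2's entry for 1.1.1.1/32, following the example in the text.
fib_r2 = {"1.1.1.1/32": FibEntry(primary="R2-R1", backup="R2-R3")}

before_failure = next_link(fib_r2["1.1.1.1/32"], {"R2-R1", "R2-R3", "R2-R4"})
after_failure = next_link(fib_r2["1.1.1.1/32"], {"R2-R3", "R2-R4"})
print(before_failure, after_failure)
```

The point of the sketch is that the switch to the backup is a lookup in state that already exists, so it does not depend on LSA flooding or SPF timers.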

How LFA works

While EIGRP computes a Feasible Successor automatically, OSPF has to be told to generate a backup path. OSPF does it by running SPF from the point of view of its directly connected IGP neighbors. This is possible because the LSDB looks the same on all OSPF/IS-IS routers in the same area. In the case of the topology above, R2 runs a primary SPF with itself as the SPF root. With LFA enabled, it then runs SPF three more times: once with R1 as root, once with R3 as root and finally with R4 as root.

A backup path is considered loop-free if the neighbor's own shortest path to the destination does not lead back through the local router. There are multiple conditions that can be checked, but the most relevant one is the following:

Distance(N, D) < Distance(N, S) + Distance(S, D)

Distance(R3, R1) < Distance(R3, R2) + Distance(R2, R1)

N is the neighbor we are evaluating as a potential backup path. D is the destination we want to find a backup path to. S is the local router we are running SPF on.

In this case the R3-R1 distance (10) is less than the R3-R2-R1 distance (10+10), so it is impossible for R3 to send traffic destined to R1 via R2. The backup path via R3 is loop-free, so we can safely install it.
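The check can be reproduced with a few lines of Python. This is a minimal sketch under assumptions: the link list below is a five-router subset of the lab inferred from the text, and every link is assumed to have cost 10. The function names are made up for illustration.

```python
import heapq

# Assumed subset of the lab topology; every link is assumed to cost 10.
LINKS = [("R1", "R2"), ("R1", "R3"), ("R2", "R3"),
         ("R2", "R4"), ("R3", "R5"), ("R4", "R5")]
COST = 10


def build_graph(links, cost=COST):
    """Turn a list of undirected links into an adjacency dict."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def spf(graph, root):
    """Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return dist


def is_loop_free(graph, s, n, d):
    """Distance(N, D) < Distance(N, S) + Distance(S, D)."""
    from_n = spf(graph, n)  # SPF with the neighbor as root
    from_s = spf(graph, s)  # SPF with the local router as root
    return from_n[d] < from_n[s] + from_s[d]


graph = build_graph(LINKS)
r3_protects_r1 = is_loop_free(graph, s="R2", n="R3", d="R1")
r3_protects_r4 = is_loop_free(graph, s="R2", n="R3", d="R4")
print(r3_protects_r1, r3_protects_r4)
```

With these assumed costs, R3 qualifies as a backup for R1 (10 < 10 + 10) but not for R4 (20 is not less than 10 + 10), because one of R3's equal-cost paths to R4 goes back through R2.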

Per-link vs Per-prefix

LFA can be configured in two modes. Per-link is not recommended, but let's quickly explain why. In this mode, all routes that are currently reachable via the link we want to protect must use the same backup nexthop. If some routes want to use one backup path and other routes want to use another backup path, per-link LFA considers protection not to be possible and no backup paths are installed.

So instead we use per-prefix mode, which sets up an individual backup path for each active route over the protected link. This yields a much higher success rate for finding backup paths.
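The difference between the two modes, as described above, can be sketched as follows. The prefixes and candidate sets are hypothetical example data, and picking the first candidate alphabetically is just a placeholder for whatever tie-breaking a real implementation applies.

```python
def per_prefix_backups(candidates):
    """Pick a backup neighbor per prefix; None if a prefix has no candidate."""
    return {prefix: (sorted(nbrs)[0] if nbrs else None)
            for prefix, nbrs in candidates.items()}


def per_link_backup(candidates):
    """Return one neighbor that is loop-free for every prefix, or None."""
    if not candidates:
        return None
    common = set.intersection(*(set(n) for n in candidates.values()))
    return sorted(common)[0] if common else None


# Hypothetical: loop-free neighbors for three prefixes behind one link.
candidates = {
    "10.0.0.1/32": {"R3"},
    "10.0.0.2/32": {"R5"},
    "10.0.0.3/32": {"R3", "R5"},
}

link_result = per_link_backup(candidates)
prefix_result = per_prefix_backups(candidates)
print(link_result)
print(prefix_result)
```

In this example no single neighbor works for all three prefixes, so the per-link result is None, while per-prefix mode still protects every prefix.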

Configuration and verification

Let's configure Per-prefix LFA on all nodes and verify the functionality.

R1#show run | s router ospf
router ospf 1
 fast-reroute per-prefix enable prefix-priority high
R1#show ip ospf fast-reroute prefix-summary
Interface       Primary Protected Percent
Gi3             3       3         100%
Gi2             3       3         100%
Process total:  6       6         100%

R2#show run | s router ospf
router ospf 1
 fast-reroute per-prefix enable prefix-priority high
R2#show ip ospf fast-reroute prefix-summary
Interface       Primary Protected Percent
Gi1             1       1         100%
Gi3             3       3         100%
Gi4             4       2         50%
Process total:  8       6         75%

R4#show run | s router ospf
router ospf 1
 fast-reroute per-prefix enable prefix-priority high
R4#show ip ospf fast-reroute prefix-summary
Interface       Primary Protected Percent
Gi2             3       2         66%
Gi5             3       2         66%
Gi6             1       0         0%
Process total:  7       4         57%

The configuration is pretty simple. A single command under the OSPF process enables fast-reroute in per-prefix mode for all high-priority prefixes. High-priority prefixes are host-routes (loopback routes). All other prefixes like linknet routes are low priority.

Looking at the verification, we can see that only R1 was able to fully protect all of its received high-priority prefixes. R2 and R4 were able to protect 75% and 57% respectively. R2 was not able to generate backup paths to either 4.4.4.4/32 or 6.6.6.6/32.

This hints at the shortcomings of LFA. Depending on the topology, finding a loop-free alternative might not always be possible. For example, R2 is unable to find an alternative path to the R4 (4.4.4.4/32) loopback prefix because the possible backup node R3 has two active paths to R4, one via R5 and one via R2. The same is true for R1: any traffic that R2 sends to R4 via R1 will just be sent back.

The general rule of thumb for LFA seems to be that it works best with triangle topologies. Let's say there was a link between R2 and R5. This link would allow R2 to get a backup path to 4.4.4.4/32 and 6.6.6.6/32 via R5.

In fact, adding an R2-R5 link would get R2 up to a 100% backup path success rate and R4 up to 85%. The only prefix not covered from R4 would then be the 6.6.6.6/32 prefix, and the reason is that R5 would still have its best path to R6 back via R4.
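The effect of the extra link can be checked with a small coverage calculation. This is a rough sketch under assumptions: it uses only a five-router subset of the lab inferred from the text, assumes every link costs 10, and counts protection per destination router rather than per primary path as the router output does, so the numbers are not expected to match the CLI exactly.

```python
from itertools import product

# Assumed subset of the lab topology; every link is assumed to cost 10.
BASE_LINKS = [("R1", "R2"), ("R1", "R3"), ("R2", "R3"),
              ("R2", "R4"), ("R3", "R5"), ("R4", "R5")]
INF = float("inf")


def all_pairs(links, cost=10):
    """Floyd-Warshall over an undirected graph with equal link costs."""
    nodes = sorted({n for link in links for n in link})
    dist = {a: {b: (0 if a == b else INF) for b in nodes} for a in nodes}
    for a, b in links:
        dist[a][b] = dist[b][a] = cost
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def unprotected(links, source, cost=10):
    """Destinations for which `source` finds no loop-free alternate."""
    dist = all_pairs(links, cost)
    nbrs = sorted({b for a, b in links if a == source} |
                  {a for a, b in links if b == source})
    missing = []
    for dest in dist:
        if dest == source:
            continue
        # One primary next hop: a neighbor on a shortest path to dest.
        primary = next(n for n in nbrs
                       if cost + dist[n][dest] == dist[source][dest])
        has_alt = any(dist[n][dest] < dist[n][source] + dist[source][dest]
                      for n in nbrs if n != primary)
        if not has_alt:
            missing.append(dest)
    return missing


without_link = unprotected(BASE_LINKS, "R2")
with_link = unprotected(BASE_LINKS + [("R2", "R5")], "R2")
print(without_link, with_link)
```

In this reduced topology R2 cannot protect R4 until the R2-R5 link is added, which closes the R2-R4-R5 triangle and makes R5 a loop-free neighbor for that destination.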

LFA enhancements

Where IP-based LFA falls short, MPLS-based solutions have been created to improve the success rate of backup path creation. The benefit of an MPLS-based approach is that the IP packet is label-switched instead of routed, so the source router is able to dictate where the packet should go. I will cover each briefly below.

  • Remote-LFA (LDP)

    Since LFA is unable to find backup paths in some topologies, an LDP-based technique called Remote-LFA (rLFA) was developed to improve the success rate of LFA. It does so by creating LDP tunnels. For example, R2 can create a backup path to R4 via the R3-R5 path by adding an MPLS label to the packet when sending it to R3, telling R3 that the packet should go to R5. Once the packet reaches R5, only the IP packet remains and R5 routes it normally to R4.

    While Remote-LFA is also topology-dependent, it is less so than normal LFA. So if you already have LDP deployed in your network, you may want to look into this.

  • TI-FRR (RSVP-TE)

    Topology-Independent Fast-Reroute. If you are running RSVP in your network then you are probably aware of Fast-Reroute techniques deployed using RSVP. This is truly topology-independent (hence the TI-FRR) and allows you to build tunnels that disregard the network topology completely, easily achieving a 100% success rate for all prefixes. The downside to RSVP tunnels is the increase in LSP signaling, so you have to weigh the pros and cons for your network.

  • TI-LFA (SR-TE)

    Topology-Independent Loop-Free Alternative. This is an enhancement to Segment Routing, using Traffic Engineering capabilities to create backup paths that are topology independent. This is a very powerful tool as it is native to both OSPF and IS-IS, so there is no LDP or RSVP required. It also does not have the same scaling issues that RSVP does, because SR-TE tunnels are not signaled across the network.

Conclusion

We have explored the LFA enhancements to OSPF (and IS-IS) and also touched on its drawbacks and limitations. I hope to cover Remote-LFA, TI-FRR and TI-LFA in other posts in the future. Thanks for reading!


Copyright 2021-2023, Emil Eliasson.
All Rights Reserved.