RTGWG                                                     C. Villamizar
Internet-Draft                                                    OCCNC
Intended status: Informational                           March 14, 2014
Expires: September 15, 2014

      Stability of Interior Routing and Load Balancing Techniques
               draft-villamizar-rtgwg-route-stability-00

Abstract

This is an informational document intended to outline what is known about the routing stability of interior gateway protocols (IGPs), traffic engineering (TE) techniques used in MPLS, and load balancing techniques. Known stable and unstable conditions provide bounds on known behavior. Network operators and equipment suppliers have relied heavily on operational experience as an existence proof of instability or as evidence of apparent stability. Between the bounds of known stable and unstable behavior there may be conditionally stable protocols and deployment scenarios. There is some discussion of the difficulty of mathematical stability proofs for some techniques and a challenge to the research community.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 15, 2014.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1. Terminology
   1.2. Document Scope
2. Background
   2.1. Interior Routing
      2.1.1. Distance Vector (DV) Protocols
      2.1.2. Link State (LS) Protocols
   2.2. MPLS Traffic Engineering (TE)
   2.3. Label Distribution Protocol (LDP)
   2.4. IP Fast Reroute (IPFRR) and LDP Fast Reroute (LDPFRR)
   2.5. Protection and Restoration
   2.6. Multipath Load Balancing
   2.7. Common Deployments
3. Problems to Avoid
   3.1. Transient Loops and Black Holes
   3.2. Long Convergence Times
   3.3. Impact of Multiple Failures
   3.4. Oscillations and Slosh
   3.5. Positive Feedback Loops
   3.6. Scaling Limitations
4. Link State (LS) Protocol Behavior
   4.1. Transient Black Holes and Route Loops
   4.2. Excessive Link Up and Link Down Transitions
   4.3. LSPDU Fragment Boundary Change
   4.4. Convergence Time
   4.5. Scaling Issues
      4.5.1. Flooding and Dense Mesh Topologies
      4.5.2. SPF Computation
      4.5.3. External Routes
      4.5.4. Forwarding Entry Installation
   4.6. LDP Impact
      4.6.1. IPFRR and LDPFRR Impact
   4.7. Delay, Jitter, and Loss Metrics
5. MPLS-TE Behavior
   5.1. Transient Behavior
   5.2. Time Complexity of TE Restoration
   5.3. Scaling Issues
   5.4. Delay, Jitter, and Loss Metrics
6. Multipath Load Balancing
   6.1. Existing Multipath Techniques
   6.2. Dynamic Load Balancing Feedback Loop
7. Summary of Factors in Load Balancing Stability
   7.1. Load Balance Granularity
   7.2. Load Balance Adjustment Certainty
   7.3. Load Balance Measurement Compatibility
   7.4. Load Balancing Decision Locality
      7.4.1. Local Decision Load Balance
      7.4.2. Distributed Load Balance
   7.5. Load Balance Timing and Traffic Adjustment
   7.6. Preventing Feedback Problems
8. Open Area for Research
9. Acknowledgements
10. IANA Considerations
11. Security Considerations
12. Informative References
Author's Address

1. Introduction

The consequences of deploying a marginally stable solution can be a sudden and catastrophic transition to an unstable state. Events related to instability vary in severity. Network meltdowns are often documented only in standards forum or operational forum discussions and, being otherwise undocumented, become the topic of Internet folklore.

Route oscillations can have a very detrimental effect on the network performance perceived by end users. In the mid-1990s, severe route oscillations could cause routers to fail, resulting in more route oscillation and eventually a provider meltdown. While router software today is more robust and therefore meltdown is very unlikely, performance degradation due to route oscillation as perceived by user traffic can be quite severe. Slow route oscillation, also known as route slosh, results in performance degradation, though not as severe.
This document outlines what the routing community knows and doesn't know about the topic of route stability in the types of networks deployed today and in potential deployments based on existing IETF work. A brief section also mentions areas for research (Section 8). This document tries to avoid straying from well understood routing protocol behaviors.

1.1. Terminology

Related terms are grouped together.

The terms listed below describe undesirable routing behaviors. The terms are listed in order of decreasing severity.

routing meltdown
   A routing meltdown is a sustained period when a very substantial number of routers either repeatedly crash and reboot or repeatedly lose routing adjacencies.

route instability
   Route instability is a sustained period when route change has a cascading effect and causes widespread transient problems such as route black holes or routing loops.

route loop
   A route loop occurs when a cycle exists in forwarding entries. Some protocols are prone to creation of transient route loops.

black hole
   A black hole exists when forwarding entries do not get packets to their destination. Transient black holes may occur during route change in some routing protocols.

route oscillation
   Route oscillation is a condition where routes keep changing in the absence of change in any conditions which should legitimately cause route change.

route slosh
   Route slosh is a proper subset of route oscillation. When route slosh occurs, routes keep switching among two or more states, with limited brief stability between state changes due to protocol timers or other factors. Route slosh is essentially just slow route oscillation.

The set of terms below describe normal routing behaviors and other routing terms. Related terms are grouped together within this list.

route change
   Route change is, as the name implies, when routes and the corresponding forwarding entries change. Route change may occur for many reasons.

shared risk link group (SRLG)
   An SRLG is a group of links which, due to a shared physical resource, are likely to fail simultaneously or nearly simultaneously.

network fault
   A network fault occurs when a node or link fails. Node protection is modeled using SRLGs covering all of the links adjacent to a given node.

multiple fault
   A multiple fault refers to the failure of multiple resources, causing links that are not covered by a common SRLG to fail.

link metric
   Interior gateway protocols (IGPs) commonly use link metrics, which are synonymous with link costs, to determine a best route. See Section 2.1 and Section 4 for discussion of IGPs.

traffic engineering
   Traffic engineering (TE) involves capacity information as well as the simple link metrics used in link state protocols. The primary focus here is the type of traffic engineering commonly practiced in RSVP-TE networks. See Section 2.2 and Section 5 for discussion of MPLS TE.

route recovery
   Route recovery refers to getting traffic flowing again after some event such as a network fault disrupts forwarding. Route recovery may involve protection and/or restoration.

route protection
   Route protection refers to a form of recovery using an alternate path, generally involving no signaling after the fault to begin using the alternate path, and involving either local detection of the fault or a simple proactive connectivity verification.
zero bandwidth protection
   In MPLS TE, the use of protection paths with zero bandwidth allocation is common. This is discussed in Section 2.5.

route restoration
   Route restoration refers to recovery that involves exchange of routing information and/or some form of signaling to set up an alternate path.

multipath
   A definition of the term "multipath" can be found in [I-D.ietf-rtgwg-cl-requirement]. Common forms of multipath include Equal Cost Multipath (ECMP), Ethernet Link Aggregation, and MPLS Link Bundling. See [I-D.ietf-rtgwg-cl-use-cases] for discussion of existing forms of multipath.

local decision
   A local decision is one in which a single network element is involved in the decision.

distributed decision
   A distributed decision is one where the independent decisions of multiple network elements have an effect on the outcome.

offline route computation
   Offline route computation is where a centralized global routing decision is made, with the results of the decision passed to all affected network elements. Global optimization is often cited as its advantage, though in IP networks it is seldom used due to very slow route restoration.

1.2. Document Scope

This document focuses primarily on routing stability of the interior protocols used in service provider (SP) networks and on load balancing techniques used in SP networks.

Service providers almost exclusively use link state (LS) routing protocols as interior gateway protocols (IGPs) rather than distance vector (DV) protocols. The two link state routing protocols commonly used are OSPF [RFC2328] [RFC5340] and IS-IS [ISO.10589.1992] [RFC5308]. The MPLS LDP protocol [RFC5036] creates MPLS forwarding entries which either follow IGP routes or behave in a manner similar to a DV protocol. Techniques such as IPFRR and LDPFRR [RFC5714] [RFC6981] alter the transient behavior of the IGPs or LDP and are discussed briefly.

MPLS traffic engineering (TE) as practiced using RSVP-TE [RFC3209] makes use of IGP extensions based on [RFC3630] [RFC5305] to carry TE information. MPLS-TE has inherently different behavior than IGP routes alone or MPLS using LDP.

There has been activity in the IETF related to using secondary link metrics related to delay, jitter, and loss [I-D.ietf-ospf-te-metric-extensions] [I-D.ietf-isis-te-metric-extensions]. This includes use of these secondary metrics for MPLS TE path selection [I-D.ietf-mpls-te-express-path].

Load balancing using hashing techniques dates back to at least the mid-1980s, but was not formally documented until [RFC2991] and [RFC2992], and those documents focus solely on then already outdated uses of hashing in load balancing. It was not until documents such as [I-D.ietf-rtgwg-cl-use-cases] and [I-D.ietf-mpls-forwarding] that these load balancing techniques were well documented. There is also interest in dynamic load balancing which will also make use of delay, jitter, and loss information, either locally available or IGP advertised [I-D.ietf-rtgwg-cl-requirement]. The topic of stability with regard to this type of work is also discussed.

Much has been written about the stability of the BGP routing protocol. However, since the focus of this document is on interior routing stability, discussion of BGP stability is out of scope.
There is some brief discussion of the impact of interior routing stability on BGP routes in Section 4.4 and Section 4.5.

Load balancers typically used by enterprises, and to some extent in data centers, are not within scope. This type of load balancer is not used in SP networks, and the balancing algorithms are largely proprietary.

2. Background

The greatest concern regarding stability and oscillations has been in service provider (SP) networks, simply because any instability which occurs more readily at large scale has in the past affected SP networks first. It may be that today's very large data centers now pose unique scalability and stability problems as well, but the focus remains on SP networks. Some forms of instability can occur in enterprise networks and even very small networks; however, common practice has evolved to make such occurrences achievable only through severe misconfiguration.

This section provides a brief introduction to a set of protocols in use in SP networks today. The focus is mainly on link-state routing protocols and LDP, on MPLS TE, and on multipath load balancing. These are the common underpinnings of SP networks and a recent topic of stability concerns due to proposed extensions to today's common practice.

2.1. Interior Routing

Interior routing protocols are commonly referred to as interior gateway protocols (IGPs). IGP is somewhat of a historic term but still widely used. There are two types of IGPs in widespread use in IP networks. There are numerous alternate techniques described in research, but with no deployment. The two common types are Distance Vector (DV) and Link State (LS). In later sections only the behavior of LS protocols is discussed, since DV protocols are seldom if ever used in large SP networks.

2.1.1. Distance Vector (DV) Protocols

Distance vector (DV) routing protocols are rarely if ever used by service providers. Distance vector routing protocols exchange distances to remote destinations.

One of the earliest DV protocols in widespread use was the Routing Information Protocol (RIP). The first version is now known as RIP-1 [RFC1058]. RIP-1 was used in the 1980s and 1990s but was replaced by RIP-2 [RFC2453] due to RIP-1 shortcomings, the most severe at the time being lack of CIDR [RFC4632] support. RIP-ng [RFC2080] provides an extension to RIP-2 which supports IPv6. RIP is generally only used in small networks such as homes, small offices, and small enterprises. RIP's advantage is simplicity. RIP's disadvantages are many. Among them is slow convergence as networks become large.

Another DV routing protocol is the Enhanced Interior Gateway Routing Protocol (EIGRP). EIGRP is a much more advanced DV routing protocol than RIP, eliminating most if not all of RIP's shortcomings. EIGRP was a Cisco proprietary protocol roughly until 2013, when Cisco gave permission for others to use EIGRP, recognizing that no SP would use EIGRP due to the vendor lock-in.

Yet another DV routing protocol is Babel [RFC6126], which is well suited for wired and wireless networks and is particularly well suited for ad-hoc wireless networks. Babel is not expected to see any deployment on SP networks, but could perhaps displace RIP on small networks such as home, small office, and enterprise.
2.1.2. Link State (LS) Protocols

Link state (LS) routing protocols build a topology database based on each router advertising a set of neighbor adjacencies and providing information about the router's capabilities and about each adjacency. This topology database is known as a link state database (LSDB). Among the advantages of LS protocols is the ability to carry additional topology information useful for other purposes, such as MPLS traffic engineering (TE) (see Section 2.2).

The two major LS protocols are open shortest path first (OSPF) [RFC2328] [RFC5340], defined by the IETF, and intermediate system to intermediate system (IS-IS) [ISO.10589.1992], defined by ISO but extended by the IETF for IP use and for MPLS TE use. Differences between OSPF and IS-IS have sparked heated discussion, though most experts concede that use of either is more a matter of preference than strong technical merit.

OSPF sends a single router link state advertisement (LSA) per router with multiple opaque LSAs [RFC5250] carrying additional information. IS-IS sends a single link state protocol data unit (LSPDU) per router with TLVs packed into multiple LSPDU fragments. Multiple LSPDU fragments are needed because the size of the complete LSPDU often far exceeds the link MTU.

Arguments regarding the technical merit of one of these two LS protocols over the other are usually focused on information packing and flooding efficiency. In both protocols the link state database (LSDB) can become very large in large networks, particularly those using MPLS TE. Nearly identical MPLS TE extensions exist in OSPF and IS-IS, with information exchanged in relatively small type-length-value tuple (TLV) extensions carried in OSPF opaque LSAs or in the IS-IS LSPDU fragments. In OSPF, related TLVs are packed into an LSA. In IS-IS, many related and unrelated TLVs are packed into a given LSPDU fragment.

The behavior of LS protocols is discussed in detail in Section 4.

2.2. MPLS Traffic Engineering (TE)

MPLS traffic engineering (TE) makes use of additional information carried in the link state (LS) routing protocols OSPF and IS-IS (see Section 2.1.2). MPLS label switched paths (LSPs) are set up using TE extensions to RSVP, known collectively as RSVP-TE. The base RSVP-TE protocol specification is [RFC3209].

The behavior of MPLS TE is discussed in detail in Section 5.

2.3. Label Distribution Protocol (LDP)

The MPLS Label Distribution Protocol (LDP) [RFC5036] can either make use of the best path as computed by the IGP or act as an independent DV protocol. The IETF has decided not to add TE extensions to LDP [RFC3468]. MPLS provides a lightweight tunneling encapsulation and LDP provides a relatively lightweight protocol to support MPLS forwarding and therefore to efficiently carry pseudowire (PW) and virtual private network (VPN) services. LDP creates multipoint-to-point (MP2P) LSPs directed to a specific forwarding equivalence class (FEC), rather than the point-to-point (P2P) LSPs commonly used in RSVP-TE.

2.4. IP Fast Reroute (IPFRR) and LDP Fast Reroute (LDPFRR)

IP/LDP Fast Reroute (IP/LDP FRR) [RFC5714] is also applicable in MPLS networks. ECMP and Loop-Free Alternates (LFA) [RFC5286] are well established IP/LDP FRR techniques and were the first methods to be widely deployed. Work on IP/LDP FRR is ongoing within the IETF RTGWG.
Two topics actively discussed in RTGWG are microloops and partial coverage of the established techniques in some network topologies. [RFC5715] covers the topic of IP/LDP Fast Reroute microloops and microloop prevention. RTGWG has developed additional IP/LDP FRR techniques to handle coverage concerns. RTGWG is extending LFA through the use of remote LFA [I-D.ietf-rtgwg-remote-lfa]. Other techniques that require new forwarding paths to be established are also under consideration, including the IPFRR "not-via" technique defined in [RFC6981] and maximally redundant trees (MRT) [I-D.ietf-rtgwg-mrt-frr-architecture]. ECMP, LFA (but not remote LFA), and MRT swap the top label to an alternate MPLS label. The other methods operate in a similar manner to RFC 4090 facility backup and push an additional label. IP/LDP FRR methods which push more than one label have been suggested but are in early discussion.

The IPFRR techniques apply to LDP fast reroute (LDPFRR). Some techniques make use of tunneling to improve coverage. The tunneling could in principle be provided by GRE, but in practice MPLS with LDP is most often used to avoid a potential failure of a specific next-hop or specific SRLG. The techniques which use tunneling require that the router know the path that other routers will be using in order to know that the potential failure will be avoided, and therefore require the use of an LS IGP rather than either a DV IGP or no IGP at all with LDP used as a DV protocol.

2.5. Protection and Restoration

Network protection and restoration are collectively known as recovery. Getting traffic to flow again after a fault is network recovery, regardless of how it is accomplished.

Route protection is often handled largely in hardware or forwarding card firmware rather than control plane software and can achieve recovery within a few tens of milliseconds. While protection is fast, protection against multiple faults is highly impractical.

Restoration may involve flooding updated routing information and may involve signaling. Therefore restoration is inherently slower than protection, but covers all variations of multiple faults except those multiple faults which cause network partitioning and are inherently irreparable.

The practice of zero bandwidth protection in MPLS FRR [RFC4090] relies on traffic class (TC) markings [RFC5462] and diffserv [RFC3270] [RFC4124] to provide very fast recovery for high priority traffic with no resource reservation cost for the protection paths. This protection of high priority traffic may occur at the expense of transient congestion for lower priority traffic. When combined with restoration, particularly if restoration is relatively fast, any congestion of lower priority traffic is eliminated or at least minimized shortly after protection goes into effect.

A very cost effective deployment, and therefore quite common, is to use MPLS FRR [RFC4090] for very fast (tens of msec) single fault protection and to optimize router code for fast (well under a second) restoration. Zero bandwidth reservation on the protection paths is commonly and perhaps exclusively used today. The fast restoration covers arbitrary multiple faults and relieves congestion after initial zero bandwidth FRR protection. This type of deployment is aimed at providing low cost and high availability for preferred services and also providing low cost and reasonably short disruptions for lower priority services when faults occur.
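The interaction just described can be illustrated with a toy timeline model. The sketch below is purely illustrative; the timing constants are assumptions roughly in line with the figures above, not measurements of any implementation.

   # Toy timeline of zero bandwidth FRR protection followed by
   # restoration after a single fault.  All constants are
   # illustrative assumptions.

   DETECT_MS = 20        # local fault detection at the PLR
   PROTECT_MS = 30       # traffic switched onto the bypass LSP
   RESTORE_MS = 800      # reoptimized paths signaled end to end

   def service_state(t_ms):
       """Per traffic class state t_ms after the fault."""
       if t_ms < DETECT_MS + PROTECT_MS:
           return {"high": "black hole", "low": "black hole"}
       if t_ms < RESTORE_MS:
           # Zero bandwidth bypass: diffserv keeps high priority
           # whole; low priority may see transient congestion.
           return {"high": "protected", "low": "may be congested"}
       return {"high": "restored", "low": "restored"}

   for t in (10, 100, 1000):
       print(t, service_state(t))

The point of the sketch is the middle interval: high priority traffic is already protected while lower priority traffic may see congestion until restoration completes.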
2.6. Multipath Load Balancing

Multipath load balancing in use today includes Equal Cost Multipath (ECMP), Ethernet Link Aggregation [IEEE-802.1AX], and MPLS Link Bundling [RFC4201]. In each case the load balancing involves a local decision and generally uses static load balancing. In all of these cases the load balancing is based on IP header and MPLS label stack information as described in [I-D.ietf-mpls-forwarding]. Ethernet Link Aggregation using Ethernet MAC addresses is entirely useless in SP networks or any routed network. Therefore Ethernet Link Aggregation as implemented by routers also uses IP header and MPLS label stack information rather than Ethernet MAC addresses.

Static load balancing, being static, is inherently stable. Dynamic load balancing is still a local decision, but one where measurements are made and the load balance is adjusted based on those measurements. Measurements can be based on output link utilization, queue occupancy or queuing delay, or queuing loss including active queue management (AQM) loss, or some combination of these. Dynamic load balancing has the potential to be unstable but has been successfully deployed in SP networks, providing existence proof that stable implementations are possible.

The behavior of multipath load balancing is discussed in Section 6.

2.7. Common Deployments

Rarely if ever are service provider (SP) network deployments exactly alike, but most have a great deal in common. Nearly all (possibly all) major SPs use either OSPF or IS-IS. Some use no MPLS at all, though this subset is shrinking. Many use LDP but not RSVP-TE. Some use both LDP and RSVP-TE. Some use RSVP-TE but not LDP, though this may be less common than a decade ago. The use of MPLS FRR [RFC4090] is thought to be more common than IPFRR or LDPFRR, at least among larger providers. Protection is not always provided. Fast restoration is considered essential even where protection is used, due to the high occurrence of multiple faults despite efforts to account for shared resources using SRLGs.

All large SPs use multipath techniques, typically ECMP and Link Aggregation. Static load balancing has generally been used. Though dynamic load balancing has been successfully deployed, today static load balancing may be exclusively used due to the disappearance of the one core router supplier supporting dynamic load balancing.

Given only this set of tools and given fixed link metrics, it is difficult to build a network which is unstable. It is possible to build an unstable network, even with fixed link metrics, if the scaling limitations described in Section 3 are ignored and exceeded, but as mentioned in Section 2, with today's common practices this situation is achievable only through severe misconfiguration.

3. Problems to Avoid

Various existing routing protocols exhibit different transient behaviors and different tendencies toward specific problems. Any problem which is transient is generally considered less severe than a problem that is persistent.

This section describes some common problems that new protocols, implementations, and deployments need to avoid where possible. Section 4, Section 5, and Section 6 discuss the tendency of some routing protocols or techniques to exhibit these various problems.

3.1. Transient Loops and Black Holes

After a fault, a transient black hole at the failed resource may be unavoidable.
Some routing protocols have a tendency to form transient black holes elsewhere when a topology change occurs and can also form transient loops in some circumstances. Transient loops can cause severe congestion within the loop, affecting traffic which should not have been impacted by the initial fault.

3.2. Long Convergence Times

The most severe long convergence time problems are almost always due to poor implementations or badly designed networks. However, even among competent routing protocol implementations and competent network designs, some routing protocols are more prone to long convergence times than others. While IGP protocols are not entirely immune to long convergence times, MPLS TE is particularly prone to long convergence times unless care is taken to avoid scaling issues.

The shortest path first (SPF) computation, sometimes called the shortest path tree (SPT) computation, in an LS protocol has time complexity of order N log L, where N is the number of nodes and L is the number of links. For MPLS TE, one CSPF computation is needed for each LSP, and each CSPF computation is order N log L. A fault may affect the majority of the LSPs for which a given LSR is ingress and therefore may involve order N^2 log L (N squared log L) computation for that LSR.

3.3. Impact of Multiple Failures

There are many situations in which the most common mode of failure, or a common mode of failure, causes simultaneous failure of multiple nodes or more often multiple links. For example, many links may traverse a common physical fiber or fiber segment using wave division multiplexing (WDM). These shared resources can be taken into account using shared risk link groups (SRLGs) when computing protection paths.

A multiple failure is one where network elements (nodes and/or links) which do not share any common SRLG fail simultaneously. Multiple failures occur on a regular basis. These occur as a result of a potential shared resource failure which was either unknown, overlooked, or considered so improbable as to make it more practical to ignore. Examples of completely unavoidable multiple failures are fibers on the same earthquake fault line, fibers crossing the same bridge (where only bridge collapse would cause simultaneous failure), fibers on both sides of railroad tracks (where only a train derailment would cause simultaneous failure), and widespread flooding. There are many other less dramatic failure scenarios which are more common.

Protection does not cover multiple faults. Covering even double faults using protection adds significant network complexity. Where protection requires resource reservation, such as in transport networks, the need to reserve a multiple of the resource being protected makes covering multiple faults prohibitively expensive in most cases.

Using protection covering only single faults and having no restoration capability is unacceptable for IP networks and unacceptable for many of today's services. Protection covering only single faults with no restoration is common practice for transport protected services, largely accounting for the steep decline in the use of protected transport services. Using restoration with excessively long convergence times is also unacceptable for IP and for many of today's services.
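Taking SRLGs into account when computing protection paths reduces to a set disjointness test. The following minimal sketch, using hypothetical link names and SRLG numbers, illustrates the check; it is not taken from any particular implementation.

   # Minimal SRLG disjointness check for a candidate protection
   # path.  Link names and SRLG numbers are hypothetical.

   srlg = {                      # link -> set of SRLGs it belongs to
       "A-B": {17},              # e.g. A-B and A-C share conduit 17
       "A-C": {17},
       "A-D": {42},
   }

   def srlg_disjoint(primary, backup):
       """True if no SRLG is shared between the two paths."""
       shared = set()
       for link in primary:
           shared |= srlg.get(link, set())
       return all(shared.isdisjoint(srlg.get(link, set()))
                  for link in backup)

   # Node protection is modeled as an SRLG covering all of the
   # links adjacent to the protected node.
   print(srlg_disjoint(["A-B"], ["A-C"]))   # False: share SRLG 17
   print(srlg_disjoint(["A-B"], ["A-D"]))   # True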
3.4. Oscillations and Slosh

An extremely severe problem is persistent route oscillation. Other than buggy implementations or exceeding implementation scaling limits, producing oscillations in today's SP networks would require misuse of link metrics which vary with traffic load, or some other form of positive feedback (see Section 3.5).

When protocol timers or other protocol features limit oscillation to a fairly long period, the resulting low frequency route oscillation is known as route slosh. Route slosh is also a severe problem if it is persistent. If damped such that the slosh impact fairly quickly reduces to zero, then some slosh may be acceptable.

3.5. Positive Feedback Loops

Any system with positive feedback is unstable. For example, consider a simple electronic amplifier. With DC positive feedback, any small signal would cause a saturated output. Amplifiers employ negative feedback to improve linearity. However, if the feedback is delayed a little, then positive feedback can occur at high frequencies, which can result in a fried amplifier.

The classic case of positive feedback in routing is the delay based link metrics used briefly in the 1980s [DBSPRA]. Various forms of damping are discussed in the paper describing this use of delay based link metrics, some with limited success, though in the end the ARPANET discontinued using delay based link metrics.

Use of delay, jitter, or loss has the potential for creating a situation where a positive feedback loop exists and therefore where oscillation occurs. This problem is discussed in Section 4.7, Section 7.4.2, and Section 7.

A similar feedback may occur in MPLS TE if an "auto-bandwidth" feature is used, though such an occurrence is unlikely. This feature varies the requested LSP resource reservation based on observed traffic measured at the LSP ingress. Increasing the LSP resource reservation could put an LSP on a longer delay path. The additional delay may reduce the offered load slightly and allow the LSP to be moved back to its original path. In practice this problem is avoided by using only very long term filtered measurements, though the result may be very low frequency route slosh.
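The feedback loop described in [DBSPRA] is easy to reproduce in a toy model. The sketch below is deliberately simplified (two parallel paths, a metric that grows with load, and all traffic following the lower metric) and is not a model of any deployed protocol; it merely demonstrates that the routes never settle.

   # Toy illustration of positive feedback with delay based
   # metrics.  Two parallel paths; metric rises with load; all
   # traffic follows the lower metric.  Deliberately simplified.

   def metric(load):
       return 1.0 + 10.0 * load       # delay rises with utilization

   load = {"path1": 1.0, "path2": 0.0}  # all traffic on path1
   for step in range(6):
       # Everyone re-routes to the path with the lower metric.
       best = min(load, key=lambda p: metric(load[p]))
       load = {p: (1.0 if p == best else 0.0) for p in load}
       print(step, best, {p: metric(l) for p, l in load.items()})
   # The chosen path alternates forever: persistent oscillation.

Damping (filtering the metric, or limiting how much traffic moves per step) slows this loop, but, as the ARPANET experience showed, does not necessarily eliminate it.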
3.6. Scaling Limitations

Routers have finite resources. Scaling problems can occur in the following situations.

1. When protocols are poorly designed and inherently consume excessive resources.

2. When specific types of deployments are used with protocols which consume excessive resources in those configurations.

3. When implementations provide inadequate resources to support what today are considered normal deployments.

Resources which may become exhausted include specialized buffering, storage, and computational resources. Specific examples of scaling problems are given in Section 4.5 and Section 5.3.

4. Link State (LS) Protocol Behavior

Link state (LS) protocols are simple and effective. Transient behavior during route change is somewhat prone to route loops, but with reasonably good implementations convergence is fast and therefore transient problems are short lived.

The two prominent link state (LS) protocols, OSPF and IS-IS, exhibit very similar behavior. Only where there is a significant difference will either of the two protocols be mentioned in this section.

4.1. Transient Black Holes and Route Loops

Transient problems arise from the hop by hop nature of forwarding with direct application of LS protocols. Each router makes a decision to change routes independently, and the order in which routers learn of an LSDB change and make a decision based on the changes can vary. Transient situations can arise where routers in a subset of the topology have installed routes based on a slightly different LSDB and either route traffic into a two-router loop which is suppressed, resulting in a black hole, or into a multi-router loop, causing a forwarding loop.

As long as convergence is fast, the impact of transient problems is small. A forwarding loop can create very severe but temporary congestion. The temporary congestion created by a forwarding loop will often affect traffic to destinations that should not have been affected by the route change. LS convergence, including flooding, routing decision, and installation of forwarding entries, today can be well under a second, but does not rival the few tens of milliseconds achieved by protection mechanisms.

4.2. Excessive Link Up and Link Down Transitions

Links are considered in an "up" or "down" state depending on whether some criterion is met. The simplest criterion, successful delivery of routing protocol hello packets, is among the most problematic. Under some conditions, a link may undergo excessive state changes unless the implementation takes some action to prevent this situation. For example, hysteresis in the criterion may be used, or link state may be held in a "down" state as a result of frequent change.

Excessive transitions may occur when link state "down" transitions are quickly triggered by some number of errors and are quickly reverted to the "up" state if the error rate falls below the target rate. This can be solved by one or more of a number of means, the third of which is sketched at the end of this subsection:

1. Hold the link in the down state for a minimum time period which is long compared to the measurement interval.

2. Make use of hysteresis, such as a long period of completely error free operation before changing link state back to "up".

3. Make use of a technique which progressively keeps the link down for longer periods depending on the rate at which the error thresholds are crossed. For example, [RFC2439] can be applied to IGP link state at the originating router as well as applied to BGP.

If link state changes are frequent, even just for a single link, then a network running an LS protocol may spend a significant amount of time in a transient state where route loops and black holes are possible.
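The third approach above can be sketched as a simple penalty scheme in the spirit of [RFC2439] damping applied to IGP link state. The constants and structure below are illustrative assumptions, not recommended values.

   # Sketch of progressive link hold down, in the spirit of
   # [RFC2439] damping applied to IGP link state at the
   # originating router.  Constants are illustrative.

   PENALTY_PER_FLAP = 1000.0
   SUPPRESS_AT = 2000.0      # advertise "down" above this penalty
   REUSE_AT = 800.0          # may advertise "up" again below this
   HALF_LIFE_S = 60.0

   class LinkDamping:
       """Each flap adds penalty; penalty decays exponentially;
       the link is advertised down while suppressed."""
       def __init__(self):
           self.penalty = 0.0
           self.suppressed = False
           self.last = 0.0

       def _decay(self, now):
           self.penalty *= 0.5 ** ((now - self.last) / HALF_LIFE_S)
           self.last = now

       def flap(self, now):              # error threshold crossed
           self._decay(now)
           self.penalty += PENALTY_PER_FLAP
           if self.penalty >= SUPPRESS_AT:
               self.suppressed = True

       def advertise_up(self, now, link_ok):
           self._decay(now)
           if self.suppressed and self.penalty < REUSE_AT:
               self.suppressed = False   # hysteresis band crossed
           return link_ok and not self.suppressed

The gap between SUPPRESS_AT and REUSE_AT provides the hysteresis: a link that has flapped repeatedly must stay quiet long enough for its penalty to decay before it is advertised up again.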
4.3. LSPDU Fragment Boundary Change

A problem unique to IS-IS is related to LSPDU fragment packing and shifting boundaries. If an implementation maintains a sorted order among TLVs, maintains tight packing of TLVs into LSPDU fragments, and moves TLVs from one LSPDU fragment to another to maintain the tight packing, then a transient condition can occur where the LSPDU fragment from which a set of TLVs is withdrawn arrives first, and an SPF is triggered before the change to the LSPDU fragment to which the set of TLVs has been added is learned.

A worst case implementation strategy tries to pack the set of LSPDU fragments as tightly as possible and, in keeping them tightly packed, frequently moves TLVs across LSPDU fragment boundaries when changes occur. The boundary shift due to a link changing may result in erroneous withdrawal of a set of unrelated TLVs. In addition, a larger set of LSPDU fragments has to be advertised if boundaries are shifted than if less tight packing is targeted.

There are numerous strategies to avoid moving fragment boundaries and to eliminate the erroneous withdrawal of TLVs. One strategy is to set a target in which LSPDU fragments are packed within a fractional range of full. For example, a target of 1/2 to 3/4 full might be used. A second strategy is to separate the action of updating TLVs from the action of moving boundaries to keep within the target packing range, and, when moving boundaries, to delay sending the withdrawal of the TLVs relative to their addition in the adjacent fragment. Other strategies to avoid this problem are possible.

4.4. Convergence Time

Link state (LS) convergence time includes flooding, routing decision (SPF plus changes to inter-area or inter-level routes plus changes to external routes), and forwarding information base (FIB) update. The forwarding entries used by external routes carried within the IGP and by other routing protocols such as BGP are often affected by the underlying IGP intra-area or intra-level routes. This second set of routes may have to be changed, though with the multistage forwarding lookups used in many routers today most secondary changes are completed automatically.

Operational experience has shown that flooding should be prioritized over route recomputation. The SPF computation on fast hardware with a small topology may occur in under 10 msec, and in some cases under 1 msec; however, on slower hardware and with larger topologies, an SPF may exceed 100 msec. If reflooding at each hop is delayed by SPF time, the overall convergence is slowed. In addition, a single resource fault may result in multiple topology changes (see Section 3.3 for background). Prioritizing reflooding and slightly delaying SPF helps ensure that the next SPF will be based on the full set of changes.

With a decision to prioritize reflooding and to delay SPF, the question arises as to how long to delay SPF. For most networks an approximate worst case fault detection time is known. This detection time often has a lower bound dictated by speed of light propagation delay (about 1 msec per 200 km or 125 miles of fiber) and a practical upper bound that is close to the lower bound. For long haul terrestrial networks a lower bound of 10-20 msec is common. For trans-oceanic fiber this can be 50-100 msec. A common practice is to set a timer when the first IGP change is received, reflood as quickly as possible, and perform an SPF when the timer elapses. With a timer that is a small multiple of the worst case fault detection time, the SPF will tend to occur with a full set of LSDB changes rather than a partial set of changes.

In a multithreaded control processor architecture, it is possible to continue reflooding while an SPF is in progress, as long as newly arriving changes do not affect the SPF in progress. In such a case a delay in starting an SPF is often useful, but the delay can be made shorter, particularly with a very fast SPF relative to the worst case fault detection time. With a very fast SPF, the limiting factor may be forwarding entry install time.

Convergence time is also impacted by scaling issues discussed in Section 4.5.
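The common practice just described (arm a timer on the first change, reflood immediately at high priority, and run SPF when the timer fires) can be sketched as follows. The timer value is a hypothetical small multiple of a worst case fault detection time, not a recommended setting.

   # Sketch of prioritized reflooding with a delayed SPF.  The
   # delay is a hypothetical small multiple of worst case fault
   # detection time (e.g. 3 * 20 msec for a long haul network).

   import threading

   SPF_DELAY_S = 0.060

   class SpfScheduler:
       def __init__(self, run_spf):
           self.run_spf = run_spf
           self.timer = None
           self.lock = threading.Lock()

       def lsdb_change(self, pdu, reflood):
           reflood(pdu)             # reflood first, at high priority
           with self.lock:
               if self.timer is None:    # first change arms the timer
                   self.timer = threading.Timer(SPF_DELAY_S,
                                                self._fire)
                   self.timer.start()

       def _fire(self):
           with self.lock:
               self.timer = None
           self.run_spf()    # SPF now sees the full set of changes

Because the timer is armed only by the first change, changes arriving during the delay are flooded immediately but accumulate into a single SPF run.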
4.5. Scaling Issues

Scaling issues result from exceeding the practical capacity of a given resource. Some of the affected resources include the following:

socket buffer
   Severe scaling problems can occur in full mesh or very dense mesh networks due to limited short term buffering of flooded data. See Section 4.5.1.

computation
   Computation issues seldom occur except with poor implementations or with networks containing a very large single OSPF area or IS-IS level. See Section 4.5.2.

LSDB size
   An extremely large number of external routes carried by an LS protocol can cause scaling problems. See Section 4.5.3.

FIB install speed
   The amount of time required to install an extremely large number of forwarding entries can cause problems. See Section 4.5.4.

Any of these problems can be made slightly better by an exceptionally good implementation or much worse by an unusually bad implementation. Each of the subsections that follow illustrates a type of scaling that is capable of overwhelming even the best implementations through careless network design. Particularly bad implementations can have their own unique scaling limitations. The set of scaling issues described here is almost certain to be incomplete.

4.5.1. Flooding and Dense Mesh Topologies

A physical fault may affect a number of neighbor adjacencies advertised by a number of routers. This is generally a relatively small number, perhaps tens of adjacencies and a smaller number of routers. In OSPF each router advertises a new router LSA, plus it may advertise opaque LSAs with additional information about the affected links. In IS-IS the equivalent of the router LSA is generally packed into one LSPDU fragment. One or more additional LSPDU fragments may carry the TLVs with additional information about the affected links. IS-IS passes each LSPDU fragment as a full MTU packet, which for SP core networks is often in excess of 8 KB.

Each router originating a set of changes passes this updated information, perhaps a few tens of KB, to each remaining neighbor. If the information is not already known from some other source, then any given neighbor will then reflood it to each of its neighbors. Most SP networks have an average of about 2.5-4 core neighbors per core router. For this type of topology, known as a sparse mesh, LS protocol flooding (absent any bad implementation) is fast, efficient, and free of scaling problems.

A very dense mesh network or a full mesh network represents a pathological case. Attempts to use an overlay model often result in a full mesh or a very dense mesh. In a full mesh of N routers, each router originating a set of changes is adjacent to every other router, or N-1 routers. Each of the N-1 routers will reflood to its N-2 neighbors. The original tens of KB originated as a result of a single fault are multiplied by a factor that is order N^2 (N squared). Any given LSR may receive N feeds of the same information. For example, if N is 1000 and 40 KB of change was originated, a given router may receive 40 MB of route updates when a single link changes state. This can overflow the memory allocated for socket buffers and cause a lot of this flooded information to be dropped and retransmitted, lengthening convergence. Any part of the feed that is dropped at one router from one neighbor but successfully received from another neighbor will be sent the other way, toward the router from which the information was dropped.

If a router fails in a full mesh network of N nodes, the situation is much worse than for a link failure.
That one router is adjacent to N-1 other routers, and each must originate at least one LSA or LSPDU fragment and likely a few. We'll call the number of LSAs or LSPDU fragments originated per router K. Each of these N-1 routers originates K LSAs or LSPDU fragments and sends these to its N-2 neighbors, which will reflood to N-3 of their neighbors. The total number of LSAs or LSPDU fragments sent is order K*N^3. Each router could receive a feed of K*(N-1) LSAs or LSPDU fragments from each of N-2 neighbors in a short time period. For example, for K=5, an MTU of 8 KB (IS-IS), and N=1000, a router can receive a set of 40 MB feeds totaling about 40 GB. No router today or in the foreseeable future has that amount of socket buffer space.

Flooding provides a fast way to distribute route changes and works extremely well on a typical SP network with a sparse mesh. Flooding simply does not work well in large full mesh topologies or even large very dense mesh topologies. For example, some attempts to overlay IP networks with an LS protocol over an ATM full mesh in the mid 1990s yielded severely unstable (and therefore short lived) networks. Full mesh or dense mesh topologies should be avoided.
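The arithmetic above is easy to reproduce. The following sketch recomputes the flooding volume estimates for the same assumed numbers (K=5 fragments per router, an 8 KB MTU, and N=1000 routers); the point is the growth rate, not the exact byte counts.

   # Reproduce the flooding volume estimates for a full mesh of
   # N routers, K fragments per router, full MTU sized fragments.

   K, MTU, N = 5, 8 * 1024, 1000

   # Link failure: ~K full MTU fragments, reflooded order N^2 times.
   link_change = K * MTU                     # ~40 KB originated
   network_copies = link_change * N * N      # order N^2 copies total

   # Router failure: N-1 neighbors each originate K fragments; one
   # router may receive all K*(N-1) fragments from each of N-2
   # neighbors.
   per_feed = K * (N - 1) * MTU              # one neighbor's feed
   worst_rx = per_feed * (N - 2)             # at a single router

   print(f"link failure, network wide: {network_copies/2**30:.0f} GB")
   print(f"router failure, one feed:   {per_feed/2**20:.0f} MB")
   print(f"router failure, one router: {worst_rx/2**30:.0f} GB")
   # Prints roughly 38 GB, 39 MB, and 38 GB: the "40 MB" and
   # "40 GB" figures quoted above.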
4.5.2. SPF Computation

The shortest path first (SPF) computation used in LS protocols requires time of order N log L, where N is the number of nodes and L is the number of links. In network designs with a node degree that is relatively independent of N, for example a node degree of 3-5 on average, computation time can be expressed as K*N*logN, where K is a constant.

A rule of thumb which provided a reasonable estimator around the year 2000 used K*N*logN as an estimator with K=1 usec. This was actually much faster than most SPFs of that day, but today K might be smaller. For K=1 usec and N=1000, K*N*logN would be about 10,000 usec or 10 msec.

From the prior example, it is clear that a network with close to 1,000 nodes in a single OSPF area or IS-IS level was supportable at that time (circa 2000). Some five to ten years earlier, with slower processors and some poorly written and therefore slower SPF code, a rule of thumb of 400 nodes was commonly cited as the upper limit for the size of an OSPF area. The figure 400 was said to be empirically determined.

Though processors are getting faster and SPF code gradually improves as well, there will always remain a practical limit to the number of nodes that can be supported in a given OSPF area or IS-IS level. At some point convergence times become excessive. In extreme cases protocol timers of other protocols such as BGP (configured to use faster timers than the BGP defaults) can expire due to excessive convergence times and cause further network disruption.

4.5.3. External Routes

Due to OSPF and IS-IS periodic route refresh, and due to the way in which both protocols implement flooding, a very large number of external routes should not be carried by the IGP. Implementations have been known to tolerate up to tens of thousands of external routes, but generally will become unstable with a BGP global routing feed or, worse yet, multiple BGP feeds into the IGP.

4.5.4. Forwarding Entry Installation

Operational experience supports implementations installing forwarding entries for OSPF intra-area or IS-IS intra-level routes first, then installing forwarding entries for any inter-area routes or routes from another level, and then external routes. External routes may be carried in the IGP or carried by another protocol, most commonly BGP. BGP may carry a very large number of routes. By installing intra-area or intra-level forwarding entries first, accurate forwarding within an area or level is better ensured. By giving the next priority to inter-area or inter-level routes, accurate forwarding within the infrastructure of the autonomous system (AS) is better ensured. This ensures that protocols that depend on this infrastructure will continue to operate in times of high route change.

Many forwarding engines allow either two stage, multistage, or fully recursive forwarding lookups. Supporting multiple stages allows the forwarding for external routes which depend on internal routes to automatically change if only the internal route has changed. This can dramatically decrease the number of forwarding entries that need to be changed and reduce effective convergence time.

In a particularly bad implementation, if forwarding entry installation is slow, and if only single stage forwarding entries are supported, and if IGP routes are not prioritized, forwarding entries can remain incorrect long enough for IBGP peerings to go down (particularly if BGP timers are reduced from the BGP default values). In such cases it is possible for the route change resulting from IBGP peerings going down to cause further churn. Not since the late 1990s have there been implementations with severe forwarding entry install time problems deployed in SP networks. The problem is only worth mentioning in hopes that new vendors don't repeat these mistakes.
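A minimal sketch of the multistage idea follows, assuming a simple two stage table: external (BGP) entries point at a shared IGP next hop object, so one second stage update corrects the forwarding of every dependent external route. The prefixes and interface names are hypothetical.

   # Two stage FIB sketch: external (BGP) entries point at a
   # shared IGP next hop object rather than at an interface.

   class IgpNextHop:
       def __init__(self, interface):
           self.interface = interface     # resolved by the IGP

   fib_stage2 = {"10.0.0.1": IgpNextHop("eth0")}  # route to BGP peer

   # Many external routes share the one second stage object.
   fib_stage1 = {prefix: fib_stage2["10.0.0.1"]
                 for prefix in ("192.0.2.0/24", "198.51.100.0/24")}

   # When the IGP path to 10.0.0.1 changes, one update fixes the
   # forwarding of every dependent external route.
   fib_stage2["10.0.0.1"].interface = "eth1"
   print({p: nh.interface for p, nh in fib_stage1.items()})

With single stage entries, the same IGP change would instead require rewriting every external entry individually.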
4.6. LDP Impact

The behavior of MPLS using LDP is very similar to the behavior of LS protocols. Some DV protocols support loop suppression through maintaining a sequence of hops, and this too is supported by LDP. The LDP loop suppression avoids creating transient loops in LDP when transient loops occur in the LS routing protocol. LDP therefore has better transient behavior than an LS routing protocol alone. Services which are carried only by LDP, such as PW and VPN services, cannot loop. When IP is carried by LDP and where there is a fallback to IP forwarding, transient loops will remain a problem for that IP traffic.

4.6.1. IPFRR and LDPFRR Impact

IPFRR and LDPFRR are described in Section 2.4. IPFRR provides protection against single faults. IPFRR can include SRLG consideration. Timer mechanisms help avoid microloops by delaying SPF based on distance from a fault, in that way sequencing the transition from protection to restoration [RFC6976].

If multiple faults occur, IPFRR has the potential to be slightly counterproductive. IPFRR may have a greater potential to create transient loops in the event of a multiple fault. For example, LDP without LDPFRR can effectively suppress loops, but with tunnels and label stacking can create transient loops. In addition, any loops that form when a multiple fault occurs may be made to persist longer due to the timers introduced to suppress microloops. The protection against single faults offered by IPFRR and LDPFRR greatly outweighs any transient misbehavior that may occur when multiple faults occur.

4.7. Delay, Jitter, and Loss Metrics

Past use of delay based link metrics in LS protocols has resulted in oscillations [DBSPRA] (see Section 3.5 for background). A path change due to a link metric change has a major effect on traffic levels and therefore an effect on the link metric itself. The interdependency of path, traffic levels, and link metric creates the potential for strong positive feedback and severe oscillations.

Oscillations due to the use of link metrics based on delay, jitter, or loss in LS protocols may be unavoidable. This is discussed further in Section 7.

5. MPLS-TE Behavior

MPLS traffic engineering (TE) forwarding is not decided hop by hop. MPLS TE Label Switched Path (LSP) path selection is determined entirely by the ingress LSR and nearly always includes only strict hops in the explicit route object (ERO) which dictates the LSP path in RSVP-TE signaling [RFC3209]. The ingress LSR makes its path decision based on the estimated bandwidth requirements of the LSP and the link bandwidth availability advertised in the IGP. MPLS TE traffic cannot flow on an LSP until the LSP is set up using RSVP-TE signaling. At setup time, the estimated bandwidth requirements serve as a bandwidth reservation.

Due to the bandwidth accounting and the ability of the ingress LSR to direct traffic, the behavior of MPLS TE is significantly different from that of LS protocols. MPLS TE makes use of resource accounting, avoiding attempts to utilize more capacity than any part of the topology can support. In this way networks using RSVP-TE differ from those using LDP. In addition, some MPLS TE implementations allow multiple LSPs between a given ingress and egress, each potentially taking different paths, with load split across these LSPs in proportion to their reserved bandwidth. This is a form of static load balancing (see Section 6 and Section 7).

5.1. Transient Behavior

MPLS TE LSPs are created using an explicit path from ingress LSR to egress LSR. There is no opportunity for differences in the LSDB to cause forwarding loops. MPLS TE therefore has better transient response than LS protocols in that it is always loop free.

MPLS FRR [RFC4090] provides protection against single faults. Since it is initiated at the point where the fault is detected, known as the point of local repair (PLR), MPLS FRR is very fast, typically a few tens of milliseconds. This provides an excellent response to single faults, with restoration time often only slightly longer than the fault detection time.

When multiple faults occur, MPLS FRR may provide only partial protection, unable to protect LSPs where the fault affects both the primary path and the protection path. In this case recovery relies entirely on restoration. In either case, single fault or multiple fault, restoration is used to return to a close to optimal traffic layout and to restore protection to LSPs for which protection is needed. Restoration times can be long if care is not taken in network design or when using implementations which have not been optimized for very fast restoration.

5.2. Time Complexity of TE Restoration

MPLS TE relies heavily on fast restoration, though fast restoration is not easily achieved. Only the best implementations achieve restoration times in the low number of seconds for regions of MPLS TE deployment employing a full mesh of LSPs with a hundred to a few hundred nodes. The reason that achieving fast MPLS TE restoration is difficult is the computational complexity of TE restoration. This is discussed in Section 3.2.
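A back of the envelope estimate, reusing the K*N*logN rule of thumb from Section 4.5.2 with the same assumed constant, illustrates both the problem and the benefit of the subdivision discussed next. This is an estimate under stated assumptions, not a measurement.

   # Back of the envelope CSPF restoration workload for an
   # ingress in a full mesh of LSPs over N nodes: order N CSPF
   # runs, each taken as K*N*logN (see Section 4.5.2; K assumed).

   import math

   K = 1e-6   # seconds; the Section 4.5.2 rule of thumb constant

   def worst_case_cspf_seconds(n):
       return n * (K * n * math.log2(n))

   for n in (100, 300, 1000):
       print(n, f"{worst_case_cspf_seconds(n):.2f} s")
   # ~0.07 s at 100 nodes versus ~10 s at 1000 nodes, consistent
   # with regions a tenth the size converging more than a hundred
   # times faster.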
Providers can subdivide their topology into multiple interconnected regions with a full mesh of LSPs within each region, greatly improving convergence time, even if the topology is served by a single OSPF area or IS-IS level. For example, an OSPF area of 1,000 nodes can be roughly subdivided into 10 regions of about 100 nodes, plus a core which interconnects the 10 regions using two or more LSRs from each region, where the core may have well under 100 nodes. Being ten times smaller, the convergence in each region can be expected to be more than a hundred times faster. If edge-to-edge LSPs are needed, as they often are, either LDP can be run over the RSVP-TE LSPs or RSVP-TE hierarchy [RFC4206] can be used.

Further compounding MPLS TE convergence time problems is the possibility of crankback. Generally, proposals in the IETF which give an ingress LSR too little information to compute a path and rely too heavily on crankback are rejected for that reason. However, in a situation where a fault has occurred and multiple ingress LSRs may try to make use of a next best resource, crankback is likely to occur if the resource is too limited to accommodate all of the LSP setups.

The impact of crankback can be reduced by implementations which sequence LSP setups, setting up the larger LSPs first and slightly delaying those LSP setups for which a viable path still exists. The amount of crankback that occurs can be greatly reduced by these small delays between non-urgent LSP setup requests, by prompt origination of LSDB changes, particularly for links which are approaching capacity depletion, and by quick reflooding. This short list only touches on the types of optimizations that have been implemented. A full treatment of the topic of MPLS-TE optimizations is beyond the scope of this document.

5.3. Scaling Issues

The most common scaling problem in RSVP-TE is due to the CSPF computational complexity described in Section 3.2 and Section 5.2. A less common problem is overload of resources related to flooding, similar to the scaling problems described in Section 4.5.1 but potentially made more severe by overly aggressive origination of LSDB changes, in the worst case sending an LSDB change for each LSP setup.

The scaling limitations of RSVP-TE are dramatically impacted by network topology. Providers can cause scaling problems through poor network design. The result of poor network design will be highly excessive convergence times. MPLS regions supporting a full mesh of LSPs should generally not exceed 100-200 nodes, and keeping well under 100 where possible is advisable. Building an RSVP-TE full mesh over a lower layer full mesh overlay (for example, a layer-2 overlay) will cause scaling problems related to flooding. If the overlay is large, CSPF related flooding problems will also occur.

5.4. Delay, Jitter, and Loss Metrics

The problems with link metrics based on delay, jitter, and loss are not as severe with MPLS TE as they are with link state (LS) protocols (see Section 3.5 and Section 4.7) if individual LSPs are slow to move relative to readvertisement. One or perhaps a few of many LSPs may change paths to avoid a congested link, unlike LS protocols where a change in metric moves all traffic for which another path has become lower cost. Where there is only a single LSP from any ingress to egress, the problem of coarse granularity of traffic adjustment remains. This is discussed further in Section 7.
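The coarse but known granularity of per LSP adjustment can be illustrated with a sketch. The link capacity, LSP names, and reservations below are hypothetical; the point is that the ingress knows exactly how much traffic a move will shift before making it.

   # Simplified illustration of per LSP adjustment in MPLS TE.
   # The amount of traffic moved (the LSP's reservation) is known
   # before the move.  Values are hypothetical.

   link_capacity = 10.0                     # Gb/s
   lsps_on_link = {"lsp-a": 4.0, "lsp-b": 3.5, "lsp-c": 3.0}

   def relieve_congestion(target_util=0.9):
       load = sum(lsps_on_link.values())
       if load <= target_util * link_capacity:
           return None
       # Move the single smallest LSP that relieves the congestion.
       excess = load - target_util * link_capacity
       for name, bw in sorted(lsps_on_link.items(),
                              key=lambda kv: kv[1]):
           if bw >= excess:
               return name, bw          # known magnitude of change

   print(relieve_congestion())          # ('lsp-c', 3.0)

An LS metric change, by contrast, moves an amount of traffic that is only known after the fact, and typically moves far more of it at once.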
6. Multipath Load Balancing

Multipath load balancing is briefly introduced in Section 2.6. The minimal requirement for multipath load balancing is that packet order is maintained within any given microflow, as called for in the diffserv architecture [RFC2475]. MPLS-TP imposes additional requirements that make it necessary to treat an MPLS-TP LSP as a microflow. Carrying MPLS-TP LSPs within an MPLS LSP is discussed in [I-D.ietf-mpls-multipath-use].

6.1. Existing Multipath Techniques

Common practice today involves static multipath load balancing based on the assumption that the hash performed on incoming traffic will evenly distribute the traffic over the hash space. Where a very large number of small microflows exists, this is a valid assumption and under 5% imbalance is common. In static load balancing a hash is performed over the input traffic and the hash space is divided proportionally to the capacity of component links (a minimal sketch appears after the list in Section 6.2).

[I-D.ietf-rtgwg-cl-requirement] uses the term flow rather than microflow to acknowledge that the hash used in load balancing quite often is not based on the defined characteristics of an IP microflow in [RFC2475]. Often only label stack information is used, which only works well if entropy labels [RFC6790] or PW flow labels [RFC6391] are used. Using a hash does not generally identify single flows but rather identifies sets of flows which all hash to the same value.

Static load balancing has the advantage of being simple and inherently stable. When a small number of very large flows exists, usually mixed with a very large number of very small flows, static load balancing will perform poorly. These large flows may be due to large PW flows, high capacity IPsec gateways, or other applications. Dynamic load balancing can accommodate these large flows but has the potential to be unstable. Dynamic load balance stability is discussed further in Section 6.2 and Section 7.

6.2. Dynamic Load Balancing Feedback Loop

Dynamic load balancing faces the same feedback problem that exists for failed prior techniques [DBSPRA], but with significant differences. In both cases a reaction to excessive load moves traffic to another part of the network, and in both cases moving traffic away from a congested part of the network changes the parameters being measured.

There are differences between dynamic load balancing used in multipath and the load shifts caused by delay based link metrics in a LS protocol. These differences make it possible to ensure stability in dynamic load balancing where stability may not be possible for delay based link metrics in a LS protocol. Briefly, some differences between delay based link metrics in a LS protocol and dynamic load balancing are the following.

1. In a LS routing protocol a change to a link metric moves a lot of traffic. In dynamic load balancing based on hashing, the amount of traffic that can be moved at a time can be as little as the traffic in one hash bucket.

2. In dynamic load balancing it is possible to determine the amount of traffic that will be moved before making an adjustment if counters are available in the load balancing hardware.

3. When dynamic load balancing is used solely in local decisions, the measurement and adjustment is made within the same router. No protocol exchange outside the router is required.
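The following non-normative Python sketch illustrates hash based load balancing as described in Section 6.1 and in point 1 above: the hash space is divided into buckets assigned to component links in proportion to capacity, and each flow key hashes to exactly one bucket, preserving packet order within a flow. CRC32 stands in for whatever hash a given chip implements; the names and table size are illustrative assumptions.

    import zlib

    def build_bucket_table(link_capacities, num_buckets=256):
        # Divide the hash space among component links in proportion
        # to capacity; this table is the entire static load
        # balancing state.
        total = float(sum(link_capacities.values()))
        table = []
        for link, cap in link_capacities.items():
            table += [link] * int(round(num_buckets * cap / total))
        while len(table) < num_buckets:   # fix any rounding shortfall
            table.append(max(link_capacities, key=link_capacities.get))
        return table[:num_buckets]

    def select_link(flow_key, table):
        # Hash whatever identifies a flow (label stack, entropy
        # label, or IP fields); all packets of a flow land in one
        # bucket and therefore on one link, preserving packet order
        # within the flow.
        return table[zlib.crc32(flow_key) % len(table)]

    table = build_bucket_table({"ge-0/0/0": 10, "ge-0/0/1": 10,
                                "xe-1/0/0": 40})
    link = select_link(b"198.51.100.1|203.0.113.9|6|49152|443", table)

With component links of 10, 10, and 40 Gb/s, roughly 1/6, 1/6, and 2/3 of the buckets are assigned to each link respectively; moving one bucket moves roughly 1/256 of the hash space, which is the fine granularity referred to in point 1 above.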
Stability in dynamic load balancing when used in local decisions has been successfully deployed and is more easily proven stable than dynamic load balancing used in a distributed decision. See Section 7.4.

The topic of preventing feedback problems in dynamic load balancing is discussed further in Section 7.

7. Summary of Factors in Load Balancing Stability

Delay, jitter, loss, and link utilization based link metrics used in LS protocols (Section 4) or in MPLS TE (Section 5), as well as multipath load balancing (Section 6), are all forms of load balancing. All of these forms of load balancing can be categorized in terms of the following characteristics.

static or dynamic
   Static load balancing is inherently stable. Dynamic load balancing is not inherently stable, but neither is it inherently unstable.

balance granularity
   Dynamic load balancing granularity may be uncontrolled, as in LS routing protocols, or controlled. A controlled load balancing granularity may be very coarse, as in MPLS TE where entire LSP are moved, or fine, as in hash based multipath techniques.

adjustment certainty
   The amount of traffic moved in an adjustment to dynamic load balancing may be unknown or known. For example, a link metric change in a LS protocol forces a load adjustment, but the magnitude of the adjustment is unknown until after the adjustment is made. When moving an LSP, the amount of traffic moved is known. When an adjustment is made in a hash based technique, the amount of traffic moved is unknown if there is no measurement available to determine the amount of traffic falling within a group of hash values, but can be known if counters have been provided in the forwarding hardware.

measurement compatibility
   The units of measurement and the units of load balance traffic adjustment may not be entirely compatible. For example, if delay, jitter, and loss are measured at the point of congestion and the measurement at the point of traffic adjustment is offered traffic load, it is hard to predict how movement of traffic measured in Mb/s or Gb/s will impact delay, jitter, or loss, particularly given the behavior of TCP congestion avoidance.

decision locality
   Measurements may be made locally and traffic adjustments may be applied locally, in which case no protocol extensions are needed. Prior implementations of dynamic multipath operated in this way and were deployed successfully. Measurements may also be made and advertised by a LS protocol such that decisions made by numerous other nodes affect the traffic at the links being measured. Stability is more difficult to ensure with this type of distributed load balance decision.

timing and traffic adjustment
   Any dynamic load balancing technique has timing factors that can impact stability. For example, the frequency of measurements and any filtering or hysteresis applied to measurements can be dictated by specifications or made advisory and modified in specific deployments. The frequency of adjustments to load balance and changes to the adjustment amounts can also be dictated by specifications or made advisory and modified in specific deployments.

Each of the above characteristics affects whether a specific load balancing technique is stable. Since static load balancing is inherently stable, the remainder of this section focuses on dynamic load balancing. The following subsections focus on the characteristics listed above. Section 7.6 discusses the interaction of these characteristics and discusses factors in protocol design which affect the tendency toward feedback problems and potential instability.
7.1. Load Balance Granularity

It is not possible to ensure stability in a technique which uses uncontrolled load balancing, such as LS routing protocols. It may be possible to introduce timing changes or other changes (such as long timers) which tend toward stability or reduce the impact somewhat. This seemed to be a goal of [DBSPRA]. With no way to ensure stability and very limited success in mitigating instability, prior efforts to load balance in LS routing protocols were abandoned and future attempts strongly discouraged.

Load balancing can generally achieve better results with a fine granularity than with a coarse or very coarse granularity. However, techniques which use a coarse granularity can be made stable. MPLS TE is inherently a load balancing technique which is somewhat static by virtue of the normally configured bandwidth reservations and which has the coarse granularity of any given LSP bandwidth reservation. Stability is ensured if LSP remain on their current path even if better paths become available, subject to relatively long timers, preferably with timer jitter (timer jitter is a random component included in the timer value).

Hash based load balancing is a fine granularity load balancing with the granularity determined by the number of usable bits in the hash. For example, entropy labels [RFC6790] or PW flow labels [RFC6391] are limited to 20 bits of usable hash. This allows dividing traffic into potentially a million groups of flows. Most implementations today make use of tables which are much smaller than 20 bits and therefore provide a somewhat more coarse granularity.

7.2. Load Balance Adjustment Certainty

The amount of traffic that will be moved when an adjustment to dynamic load balancing is made may be either unknown or known prior to making the adjustment.

Dynamic load balancing with an unknown traffic adjustment amount has been deployed successfully. This success relies on there being a reasonably good correlation between the fraction of the hash space moved and the fraction of total traffic which falls into that hash space. When the traffic adjustment amount is unknown, very large flows occasionally cause a large change in traffic when only a small change was intended. In such cases a single hash value may have a disproportionately large amount of traffic. A pathological behavior would be to continuously move a small range of hash values back and forth as a result. This problem is avoidable.

A far better situation exists when the traffic adjustment amount is known. For MPLS TE LSP, the reservable bandwidth is known and most LSR can measure traffic on a per LSP basis. For hash based multipath, few existing implementations have measurements over the hash space for a particular multipath. Making such measurements is not difficult but requires that the designer of the forwarding hardware had the foresight to provide counters in the appropriate places.
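The following non-normative sketch, assuming per-bucket byte counters are available from the forwarding hardware, shows how the amount of traffic a candidate adjustment would move can be computed before the move is made. The names and data structures are illustrative assumptions of this document.

    def plan_moves(bucket_bytes, bucket_to_link, from_link, to_link,
                   target_bytes):
        # Pick hash buckets to move off an overloaded link such that
        # the planned shift is known in advance from per-bucket
        # counters.  bucket_bytes: {bucket: bytes_in_interval}.
        # Returns (buckets_to_move, total_bytes_moved).
        candidates = sorted(
            (b for b, l in bucket_to_link.items() if l == from_link),
            key=lambda b: bucket_bytes.get(b, 0))
        chosen, moved = [], 0
        for b in candidates:          # smallest buckets first: finest control
            if moved >= target_bytes:
                break
            chosen.append(b)
            moved += bucket_bytes.get(b, 0)
        return chosen, moved

Selecting the smallest buckets first gives the finest control, and a single disproportionately large bucket is visible in the counters in advance rather than being discovered only after an unintentionally large move.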
7.3. Load Balance Measurement Compatibility

Measurements of delay, jitter, and loss at a point of congestion cannot be readily translated into measurements of utilization at a point where an adjustment to traffic load balance is made. For example, one or more large TCP microflows may be in the TCP slow start phase, where traffic flow increases exponentially until a loss or ECN notification is detected. Prior to congestion, rising utilization can be measured directly. After the onset of congestion, utilization hits 100% and can no longer be used as a measure of loading at the point of congestion. The amount of rise in delay and jitter or the rate of rise in delay cannot easily be used to determine the amount of traffic to move. If loss is occurring, the amount of loss cannot easily be used to determine the amount of traffic to move, but increasing loss may be correlated to the degree of overutilization.

The measurement of delay and jitter is a measure of time, generally measured in milliseconds (msec). Loss is measured as a fraction of traffic. There is no reliable means to determine how much traffic should be moved from one link with a given queuing delay to another link with a different amount of queuing delay. The same applies to two links with different amounts of loss, or to one link with queuing delay but no loss and another link with loss.

Jitter is the most problematic of these measures. Jitter is a function of both the link utilization and the burstiness of traffic sources. A rise in jitter can occur at very low link utilizations. Jitter has never been used as the basis of load balancing, and questions remain about the feasibility of doing so.

7.4. Load Balancing Decision Locality

Delay, jitter, and loss based link metrics used in LS protocols (Section 4) or in MPLS TE (Section 5), as well as multipath load balancing (Section 6), are all forms of load balancing. Load balancing can be applied in two ways.

1. Some types of load balancing can be applied locally, using only measurements made at local interfaces to impact the proportion of traffic going out those local interfaces.

2. Load balancing can be applied in a distributed fashion, where measurements are advertised, usually using additional information carried by a LS protocol, and action is taken by many routers which make use of the resources for which measurements have been made.

Some techniques are inherently local or inherently distributed. Delay, jitter, and loss based link metrics used in LS protocols (Section 4) or in MPLS TE (Section 5) are inherently distributed. Ethernet Link Aggregation [IEEE-802.1AX] is a multipath technique which is inherently local. Other forms of multipath, such as MPLS Link Bundling [RFC4201] or any multipath technique which can make use of MPLS LSP as component links, are neither inherently local nor inherently distributed.

7.4.1. Local Decision Load Balance

It is easier to ensure stability when load balancing is entirely a local decision. There is no protocol exchange required to do this sort of load balancing and there are no interoperability issues with neighbor routers. Therefore there has been no need for the IETF to dictate how this is done. Common practice has evolved which is well known within the industry, widely deployed, and very similar across implementations. It has remained poorly documented in the IETF, being a local decision.

Load balancing in Ethernet Link Aggregation [IEEE-802.1AX] is inherently a local decision. As commonly practiced today, ECMP and Link Bundling also make use of local decisions.
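A non-normative sketch of a purely local decision follows, reusing plan_moves() from the sketch in Section 7.2: utilization is measured on the local component links and hash buckets are moved from the most to the least loaded member, with deliberate under-correction. No information leaves the router. The threshold and the half-gap correction are illustrative assumptions, not from any implementation.

    def local_rebalance(link_bytes, bucket_bytes, bucket_to_link,
                        threshold=0.05):
        # One local adjustment step: if the spread between the most
        # and least loaded component links exceeds 'threshold' of
        # total traffic, move enough buckets to close roughly half of
        # the gap.  Under-correction reduces the chance of overshoot
        # and oscillation.
        total = sum(link_bytes.values()) or 1
        hot = max(link_bytes, key=link_bytes.get)
        cold = min(link_bytes, key=link_bytes.get)
        gap = link_bytes[hot] - link_bytes[cold]
        if gap / total < threshold:
            return []                 # close enough; do nothing
        buckets, _ = plan_moves(bucket_bytes, bucket_to_link,
                                hot, cold, gap // 2)
        for b in buckets:
            bucket_to_link[b] = cold
        return buckets

Because the links being measured and the buckets being moved are on the same router, the feedback loop is short and entirely observable locally, which is a large part of why this style of dynamic load balancing has been deployed successfully.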
7.4.2. Distributed Load Balance

Few attempts to make use of distributed load balancing have reached the IETF. A prerequisite for distributed load balancing is a means to advertise the measurement results. This prerequisite is met by [I-D.ietf-ospf-te-metric-extensions] and [I-D.ietf-isis-te-metric-extensions], but no means of applying these metrics has been defined.

A very early attempt to make use of distributed load balancing was OSPF-OMP, ISIS-OMP, and MPLS-OMP [I-D.ietf-ospf-omp] [I-D.ietf-isis-omp] [I-D.villamizar-mpls-omp]. These drafts were abandoned due to lack of provider interest at the time. This was partially due to the difficulty in providing mathematical proof of stability.

There has been a renewed though cautious interest in distributed load balancing, as evidenced by [I-D.ietf-rtgwg-cl-requirement] and related work. Any use of [I-D.ietf-ospf-te-metric-extensions] and [I-D.ietf-isis-te-metric-extensions] is a distributed load balancing technique. In these extensions, measurements based on delay may optionally include queuing delay. Measurements of jitter are inherently queuing based. Measurements of loss are also queuing based, except on lossy links such as wireless links. The authors of these documents have intentionally put stability concerns out of scope due to the difficulty in ensuring stability and the even greater difficulty of mathematically proving stability.

7.5. Load Balance Timing and Traffic Adjustment

In any system with feedback, careful attention to timing is needed to ensure that the system remains stable. For load balancing this may be easier to accomplish where measurements and traffic adjustments are entirely local.

The following timing considerations are related to making measurements and propagating measurement results (a rough sketch of items 1 through 3 follows this list):

1. Utilization, delay, jitter, and loss measurement interval for a direct measurement.

2. Type of filtering of direct measurements, if any, applied to produce a reportable measurement.

3. Amount of positive or negative change to a reportable measurement needed before updating the reportable measurement. This amount should be a function of the time since the last reported measurement and a function of the importance of making an adjustment at that measurement level.

4. The time necessary to propagate changes must be considered when trying to determine if a technique is stable. This may be a small but non-zero time.
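The following non-normative sketch illustrates items 1 through 3 above under assumed choices: periodic raw samples, an exponentially weighted moving average as one possible filter, and a report threshold that shrinks as time since the last report grows. Nothing here is mandated by any specification; the class name, constants, and filter choice are assumptions for illustration.

    class FilteredMeasurement:
        # Low-pass filter raw utilization samples (EWMA) and
        # originate a new reportable value only when the filtered
        # value has moved enough; the required change shrinks as time
        # since the last report grows.
        def __init__(self, alpha=0.2, base_threshold=0.10):
            self.alpha = alpha
            self.base_threshold = base_threshold
            self.smoothed = None
            self.last_reported = None
            self.intervals_since_report = 0

        def sample(self, raw):
            if self.smoothed is None:
                self.smoothed = raw
            else:
                self.smoothed = (self.alpha * raw
                                 + (1 - self.alpha) * self.smoothed)
            self.intervals_since_report += 1
            threshold = self.base_threshold / self.intervals_since_report
            if (self.last_reported is None
                    or abs(self.smoothed - self.last_reported) >= threshold):
                self.last_reported = self.smoothed
                self.intervals_since_report = 0
                return self.smoothed   # originate an update (e.g. reflood)
            return None                # suppress; change too small or too recent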
Upon receiving an updated measurement result, any traffic adjustment should be deferred for a target time period, with jitter applied to the timer. The target time period should be a function of the following:

1. the percent contribution to the traffic at the link or path where traffic would be moved from,

2. the difference between the conditions at the link or path from which traffic would be moved and the link or path to which the traffic would be moved,

3. any other factor that the protocol specification deems relevant or that the implementation deems relevant and that is allowable by the protocol.

Upon receiving an updated measurement result, any traffic adjustment amount should be a function of the following:

1. the amount of traffic that needs to be moved from all traffic sources to achieve balance,

2. the amount of traffic that can be moved from all traffic sources to achieve balance,

3. the percent contribution to the traffic at the link or path where traffic would be moved from,

4. the difference between the conditions at the link or path from which traffic would be moved and the link or path to which the traffic would be moved,

5. any other factor that the protocol specification deems relevant or that the implementation deems relevant and that is allowable by the protocol,

6. any prior traffic adjustment between the set of links or paths and the time that has elapsed since that adjustment.

It is important to note that for a given update there may be multiple more lightly loaded links or paths to which traffic can be moved and multiple more heavily loaded links or paths from which traffic can be moved. The amount of traffic for which there is already a scheduled adjustment from or to a given link or path must be taken into consideration when computing the amount of a new traffic adjustment.

7.6. Preventing Feedback Problems

The goal of this section is not to produce an algorithm for traffic adjustments. The goal is simply to point out factors that will either increase or decrease the tendency toward feedback problems and potential instability.

The most severe problem is positive feedback at some frequency (or time period) causing a growing oscillation until saturation is reached. Also very severe is an undamped oscillation, which may not grow but will continue forever.

Oscillations often occur due to overshoot in making an adjustment. Overshoot can be caused in a distributed load balancing environment by multiple traffic sources each making an adjustment that individually would bring the network closer to balance but together cause overshoot. This is why the contribution of a given traffic source to the total traffic must be considered.

If there is no tendency for overshoot at all, there is a tendency to have no oscillation at all. If, in the event of overshoot, subsequent adjustments are always less aggressive, then there will be a damped oscillation, known as "ringing". Attempts to respond quickly to a traffic imbalance may result in some overshoot. Damping subsequent responses will reduce subsequent overshoot and reduce ringing. Small overshoot and damped ringing are undesirable but might be acceptable if there is some other gain such as faster reaction time.

What follows are some suggestions to help ensure stability and improve damping (a sketch of the timer and damping rules appears after this list).

1. If there was a prior traffic adjustment between two links or paths, then the following types of bounds should be applied.

   a. If the new traffic adjustment is in the same direction, the adjustment amount should be no more than the prior adjustment amount multiplied by some factor (greater than one) that is determined by the protocol to help ensure stability. For example, if the prior traffic adjustment was "recent" this factor might be two or less (a doubling in the next adjustment or less) and rise proportionately if the elapsed time was above the "recent" threshold.

   b. If the new traffic adjustment is in the opposite direction, the adjustment amount should be no more than the prior adjustment amount multiplied by some fraction that is determined by the protocol to adequately damp any ringing. For example, if the prior traffic adjustment was "recent" this factor might be one half and rise proportionately with time since the last adjustment if the elapsed time was above the "recent" threshold, reaching unity at some very long time where prior history may be less relevant.

2. When an updated measurement result arrives and a traffic adjustment is already scheduled, the computation of the timer value used to defer taking action and the computation of the adjustment amount should be repeated. The updated adjustment amount should be used. The lesser of the remaining time on the prior timer and the newly computed timer value should be used.

3. Analysis of traffic adjustment scheduled time and traffic adjustment amount should be repeated each time there is a change to either side of an already scheduled traffic adjustment. When traffic adjustment is strictly local, this should not impose a high computational burden. For distributed load balancing, the computational burden in a large mesh network will have to be considered when setting protocol timer default value guidance and safe bounds for timer values.
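The following non-normative sketch combines the deferral timer guidance in Section 7.5 with the bounds of suggestions 1(a) and 1(b) above. The constants, including the "recent" threshold, are purely illustrative assumptions of this document.

    import random

    RECENT = 30.0  # seconds; illustrative "recent" threshold

    def defer_time(contribution, condition_gap, base=5.0,
                   jitter_frac=0.3):
        # Target deferral before acting on an updated measurement:
        # sources contributing little wait longer (their move matters
        # less), a large gap between from- and to-path conditions
        # shortens the wait, and jitter decorrelates the many sources
        # reacting to the same advertisement.
        t = base * (1.0 - contribution) / max(condition_gap, 0.1)
        return t * (1.0 + random.uniform(-jitter_frac, jitter_frac))

    def bound_adjustment(proposed, prior_amount, prior_sign, new_sign,
                         elapsed):
        # Damping bounds of suggestions 1(a) and 1(b): a move in the
        # same direction as a recent prior move may at most double it;
        # a reversal is cut to at most half, relaxing toward unity as
        # the prior move ages.
        if prior_amount is None:
            return proposed
        scale = max(elapsed / RECENT, 1.0)
        if new_sign == prior_sign:
            cap = prior_amount * 2.0 * scale          # 1(a): bounded growth
        else:
            cap = prior_amount * min(0.5 * scale, 1.0)  # 1(b): damp ringing
        return min(proposed, cap)

Bounding growth in the same direction limits positive feedback, while cutting reversals in half damps any ringing that an earlier overshoot may have started.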
It is very difficult to mathematically prove that a particular dynamic load balancing method, as specified, is stable. So far, at best an argument can be made that specific provisions within a particular dynamic load balancing method strongly tend toward stability. See Section 8 for further discussion.

8. Open Area for Research

So far there is no mathematical proof of stability for any given dynamic load balancing method. Having such a proof would go a long way toward encouraging deployment of new methods. Absent a mathematical proof of stability, some dynamic load balancing methods for which a strong argument could be made for a tendency toward stability have been deployed. Once deployed, successful deployments provide an existence proof that networks can be stable, but this existence proof falls short of ensuring that all deployments will always be stable.

The discrete nature of traffic adjustment events, rather than continuous small adjustments, prevents the use of linear systems analysis. This is unfortunate, since stability proofs for linear systems (or proofs of instability) are relatively simple. The mesh topology and the analog nature of the traffic adjustment amount make it difficult or impossible to apply techniques based on discrete states, such as Markov chains.

Proof of stability may be easier for dynamic load balancing applied as a local decision rather than a distributed decision (see Section 7.4). For a local decision, a set of inputs to a node is balanced over a set of outputs. Absent routing loops, the outputs will not affect the inputs.

For dynamic load balancing with distributed decisions, a proof must be applicable to an arbitrary mesh topology. This is a more difficult problem than the problem of local decision. There may be characteristics of the mesh that make a proof for dynamic load balancing with distributed decisions somewhat less difficult. It may help that there exists a non-cyclic relationship between more heavily loaded links and less loaded links, where, given that all routers have the same information, traffic is only moved from more heavily loaded links to less loaded links. Other characteristics, such as the non-cyclic nature of the traffic flow, may help as well.

The problems of providing stability proofs for existing and proposed mechanisms are clearly difficult and not yet solved. As such this remains a good topic for research.
9. Acknowledgements

While there was interest in OMP in the 1990s, there was some discussion of stability on the OSPF, ISIS, and MPLS WG mailing lists as well as off list. Particularly helpful was discussion with Matt Mathis (PSC) and Teunis Ott (Bellcore).

10. IANA Considerations

This memo includes no request to IANA.

11. Security Considerations

This document discusses both the known stability or instability of previously deployed routing and load balancing techniques and the unknown stability or instability of techniques which may be deployed in the future. The difficulty in proving stability is also discussed. Given the scope of the document, with no proposal for new protocols and only discussion of factors that impact stability, there are no security concerns raised by this document.

12. Informative References

[DBSPRA] Bertsekas, D., "Dynamic Behavior of Shortest Path Routing Algorithms for Communication Networks", IEEE Transactions on Automatic Control, 1982.

[I-D.ietf-isis-omp] Villamizar, C. and T. Li, "IS-IS Optimized Multipath (ISIS-OMP)", draft-ietf-isis-omp-01 (work in progress), February 1999.

[I-D.ietf-isis-te-metric-extensions] Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas, A., Filsfils, C., and W. Wu, "IS-IS Traffic Engineering (TE) Metric Extensions", draft-ietf-isis-te-metric-extensions-01 (work in progress), October 2013.

[I-D.ietf-mpls-forwarding] Villamizar, C., Kompella, K., Amante, S., Malis, A., and C. Pignataro, "MPLS Forwarding Compliance and Performance Requirements", draft-ietf-mpls-forwarding-09 (work in progress), March 2014.

[I-D.ietf-mpls-multipath-use] Villamizar, C., "Use of Multipath with MPLS and MPLS-TP", draft-ietf-mpls-multipath-use-04 (work in progress), January 2014.

[I-D.ietf-mpls-te-express-path] Atlas, A., Drake, J., Giacalone, S., Ward, D., Previdi, S., and C. Filsfils, "Performance-based Path Selection for Explicitly Routed LSPs using TE Metric Extensions", draft-ietf-mpls-te-express-path-00 (work in progress), October 2013.

[I-D.ietf-ospf-omp] Villamizar, C., "OSPF Optimized Multipath (OSPF-OMP)", draft-ietf-ospf-omp-02 (work in progress), February 1999.

[I-D.ietf-ospf-te-metric-extensions] Giacalone, S., Ward, D., Drake, J., Atlas, A., and S. Previdi, "OSPF Traffic Engineering (TE) Metric Extensions", draft-ietf-ospf-te-metric-extensions-05 (work in progress), December 2013.

[I-D.ietf-rtgwg-cl-requirement] Villamizar, C., McDysan, D., Ning, S., Malis, A., and L. Yong, "Requirements for Advanced Multipath in MPLS Networks", draft-ietf-rtgwg-cl-requirement-16 (work in progress), February 2014.

[I-D.ietf-rtgwg-cl-use-cases] Ning, S., Malis, A., McDysan, D., Yong, L., and C. Villamizar, "Advanced Multipath Use Cases and Design Considerations", draft-ietf-rtgwg-cl-use-cases-05 (work in progress), November 2013.

[I-D.ietf-rtgwg-mrt-frr-architecture] Atlas, A., Kebler, R., Envedi, G., Csaszar, A., Tantsura, J., Konstantynowicz, M., and R. White, "An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees", draft-ietf-rtgwg-mrt-frr-architecture-03 (work in progress), July 2013.

[I-D.ietf-rtgwg-remote-lfa] Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S. Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-04 (work in progress), November 2013.
[I-D.villamizar-mpls-omp] Villamizar, C., "MPLS Optimized Multipath (MPLS-OMP)", draft-villamizar-mpls-omp-01 (work in progress), February 1999.

[IEEE-802.1AX] IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE Standard for Local and Metropolitan Area Networks - Link Aggregation", 2008.

[ISO.10589.1992] International Organization for Standardization, "Intermediate system to intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)", ISO Standard 10589, 1992.

[RFC1058] Hedrick, C., "Routing Information Protocol", RFC 1058, June 1988.

[RFC2080] Malkin, G. and R. Minnear, "RIPng for IPv6", RFC 2080, January 1997.

[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[RFC2439] Villamizar, C., Chandra, R., and R. Govindan, "BGP Route Flap Damping", RFC 2439, November 1998.

[RFC2453] Malkin, G., "RIP Version 2", STD 56, RFC 2453, November 1998.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and Multicast Next-Hop Selection", RFC 2991, November 2000.

[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, November 2000.

[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-Protocol Label Switching (MPLS) Support of Differentiated Services", RFC 3270, May 2002.

[RFC3468] Andersson, L. and G. Swallow, "The Multiprotocol Label Switching (MPLS) Working Group decision on MPLS signaling protocols", RFC 3468, February 2003.

[RFC3630] Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering (TE) Extensions to OSPF Version 2", RFC 3630, September 2003.

[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005.

[RFC4124] Le Faucheur, F., "Protocol Extensions for Support of Diffserv-aware MPLS Traffic Engineering", RFC 4124, June 2005.

[RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

[RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP) Hierarchy with Generalized Multi-Protocol Label Switching (GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005.

[RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan", BCP 122, RFC 4632, August 2006.

[RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP Specification", RFC 5036, October 2007.

[RFC5250] Berger, L., Bryskin, I., Zinin, A., and R. Coltun, "The OSPF Opaque LSA Option", RFC 5250, July 2008.

[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, September 2008.

[RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic Engineering", RFC 5305, October 2008.

[RFC5308] Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, October 2008.
[RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF for IPv6", RFC 5340, July 2008.

[RFC5462] Andersson, L. and R. Asati, "Multiprotocol Label Switching (MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic Class" Field", RFC 5462, February 2009.

[RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC 5714, January 2010.

[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free Convergence", RFC 5715, January 2010.

[RFC6126] Chroboczek, J., "The Babel Routing Protocol", RFC 6126, April 2011.

[RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, J., and S. Amante, "Flow-Aware Transport of Pseudowires over an MPLS Packet Switched Network", RFC 6391, November 2011.

[RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and L. Yong, "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, November 2012.

[RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., Francois, P., and O. Bonaventure, "Framework for Loop-Free Convergence Using the Ordered Forwarding Information Base (oFIB) Approach", RFC 6976, July 2013.

[RFC6981] Bryant, S., Previdi, S., and M. Shand, "A Framework for IP and MPLS Fast Reroute Using Not-Via Addresses", RFC 6981, August 2013.

Author's Address

Curtis Villamizar
Outer Cape Cod Network Consulting, LLC

Email: curtis@occnc.com