Traffic Engineering Working Group                       Vanessa Springer
Internet Draft                                         Craig Pierantozzi
Expiration Date: February 2001                                 Jim Boyle
                                                             August 2000

                   Level3 MPLS Protocol Architecture

                   draft-springer-te-level3bcp-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Springer, et al.                                                [Page 1]

Internet Draft     draft-springer-te-level3bcp-00.txt          August 2000

Abstract

This document discusses Traffic Engineering with Multi-Protocol Label Switching (MPLS) in the Level3 network. A brief overview of traffic engineering is given, followed by the constraints affecting Level3's design. The approach Level3 will use, an LDP edge with an RSVP-TE core, is then presented. Several architectures were considered when deciding upon a design; these are discussed, along with the reasons they were ultimately rejected.

Table of Contents

1 Specification of Requirements
2 Introduction
3 Design Constraints
4 MPLS Architecture Chosen
5 Other Architectures
6 Systems Issues
7 Conclusion
8 Author Information

1. Specification of Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. Introduction

The task of mapping traffic flows onto an existing topology so as to optimize resource utilization and network performance is called Traffic Engineering [TE]. Optimization refers to finding the minimum or maximum of a given function. An ideal objective function would minimize the maximum link utilization in the network. However, this would require computing paths for all demands offline and pushing the resulting configuration down into the network elements, and such a static solution would not respond properly to different failure scenarios. The term TE is also used more loosely to refer to networks that associate bandwidth with point-to-point macro-flows in order to avoid over-utilizing resources.

Without TE, traffic follows the shortest path calculated by the IGP. Doing so conserves network resources, but the shortest path is not always the optimal one. Paths may overlap and become congested while other paths are under-utilized. One solution to this problem is IGP metric manipulation; however, this can create unforeseen congestion in other parts of the network.

MPLS provides a method of traffic engineering. With MPLS, traffic engineering attempts to control traffic on the network using Constrained Shortest Path First (CSPF). CSPF computes a path subject to restrictions, such as available bandwidth, that exclude unsuitable links from the calculation. This allows complete control over the path an LSP will take. The resulting path may not always be the shortest one, but it will use links that are less congested.

This memo focuses entirely on the use of Traffic Engineering in Level3's network.
Specifically, it describes how Level3 will use MPLS to enhance network performance.

3. Design Constraints

Level3 moved away from ATM when growth necessitated a new WAN switch. When evaluating different protocol architecture alternatives, a few constraints were taken into consideration. These can be summarized as follows:

o Cost effectiveness and switching hardware roadmap
o Multiple routing domains
o Avoid data plane hierarchy within the gateway
o Edge-to-edge MPLS (for customer VPNs)
o Fast reroute

The chosen technology should have broad vendor support and a clear path to 10x and 100x OC-192 chassis offerings.

Level3 has supported multiple services over a single ATM WAN. This proved very cost effective over leased capacity, and the approach is expected to remain cost effective over owned facilities in the near to mid term. Supporting multiple services over a common core requires the ability to separate service domains in terms of routing information, queuing on multi-service links, and forwarding isolation.

An architecture was desired that would not require the purchase of additional routing equipment to consolidate traffic within a service, as is often the case when routers are meshed over ATM.

Edge-to-edge MPLS support is required to allow for customer VPN service. This made for some interesting protocol scenarios in a network with on the order of 30 gateways, each with up to 8 edge routers.

The last requirement is fast reroute capability. This is somewhat related to having a multi-service core, but it is most strongly driven by carrying a voice service over the core.
Besides the obvious call quality issues, where a drop-out that lasts too long (on the order of 2-4 seconds) can lead a caller to believe that the call has been dropped, several of the signalling entities in the network tolerate network unavailability poorly (on the order of 5-10 seconds). Level3's restoration goal in this area was 45 ms, with 2 seconds set as an upper limit.

4. MPLS Architecture Chosen

Based on capacity, cost, and the future product mix, the decision was made to use Packet over SONET (POS). MPLS enhances POS networks by enabling support for multiple services, customer VPNs, traffic engineering, and fast reroute, all of which are architectural design requirements. MPLS/POS does this in a manner that is more scalable and manageable than ATM.

At this time, Level3's architecture consists of an IP edge with an RSVP-TE core. The RSVP-TE backbone is fully meshed; in the US it currently consists of 20 routers and will grow to around 30 by the end of 2000. Level3 plans to roll out LDP as it becomes available in production software, or in single-vendor mode for limited VPN offerings. Layer 2 tunneling [L2TUN] is also used to a limited extent across the core, both to carry internal network traffic and to tunnel multicast IP across the core, which is not multicast enabled.

This architecture was chosen for its simple, scalable design given the constraints. The model also supports edge-to-edge MPLS, while RSVP delivers the required bandwidth management and reconvergence speed.

Besides requirements somewhat unique to Level3, such as fast reroute, the decision to add the complexity of RSVP-TE also brought more traditional advantages. The Level3 data network currently uses significant leased facilities, which can include parallel paths provided by alternate carriers. Losing 1/3 or 1/2 of the capacity on a given span does not necessarily change the node-level path that traffic would take via SPF.
CSPF allows only the traffic that fits in the remaining capacity to stay on the SPF path, while other demands are routed along longer, less utilized paths. This is especially useful as the overall topology becomes more meshed and multiple alternate paths must be used effectively.

LDP is a protocol by which routers inform their peers of the label to use when forwarding traffic through them. It is useful in architectures such as Level3's that require efficient hop-by-hop routed tunnels, for example for VPNs and for tunneling between BGP edge routers [LDPA]. LDP allows flexibility in when label bindings are advertised, and in the strategies used for label distribution and for retaining learned labels. Level3 intends to use LDP in downstream unsolicited mode, with ordered control and liberal label retention.

With LDP, adding a router to one gateway requires minimal configuration in the other gateways; the only configuration needed is in the gateway to which the router was added. Because LSRs can discover their LDP peers, configuration of LDP peers is also minimal. As LDP is hop-by-hop, only adjacent peers must be configured, which is usually the case for IP configuration anyway. The only network-wide change would be additions to the top-level IBGP mesh, which can also be minimized through BGP route reflection.

A multiservice network requires the ability to separate the different services. In particular, it is important to isolate the traffic in terms of routing domains, queuing, and forwarding. The services supported by the core network include the Internet platform, voice, and internal networks. To protect the non-Internet services from potential global routing instability, it was decided to avoid running BGP in the core of the network, minimizing the potential CPU impact of such instability.
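
The CSPF behavior described above, where only the demands that fit in the remaining capacity stay on the SPF path, can be illustrated with a minimal Python sketch. It uses the textbook prune-then-SPF formulation: links whose available bandwidth is below the requested demand are removed, and an ordinary Dijkstra computation runs over what remains. The topology format, node names, and bandwidth figures are hypothetical, and this is not presented as Level3's or any vendor's implementation.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical topology format: (node_a, node_b) -> (IGP metric, available bandwidth in Mb/s).
# Links are treated as bidirectional, with the same metric and headroom in both directions.
Topology = Dict[Tuple[str, str], Tuple[int, float]]


def cspf(topo: Topology, src: str, dst: str, demand: float) -> Optional[List[str]]:
    """Lowest-metric path whose links all have at least `demand` available, else None."""
    # Constraint step: prune every link that cannot carry the requested bandwidth.
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (a, b), (metric, avail) in topo.items():
        if avail >= demand:
            adj.setdefault(a, []).append((b, metric))
            adj.setdefault(b, []).append((a, metric))

    # SPF step: plain Dijkstra over the pruned graph.
    dist: Dict[str, int] = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[int, str]] = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale entry; a shorter route to `node` was already found
        for nbr, metric in adj.get(node, []):
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    if dst not in dist:
        return None  # no path satisfies the bandwidth constraint
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Invented example: the direct A-B span has lost part of its parallel capacity.
topo: Topology = {
    ("A", "B"): (10, 400.0),
    ("A", "C"): (10, 2000.0),
    ("C", "B"): (10, 2000.0),
}
print(cspf(topo, "A", "B", 300.0))  # ['A', 'B']: fits, so it stays on the SPF path
print(cspf(topo, "A", "B", 600.0))  # ['A', 'C', 'B']: routed along the longer path
```

Placing several LSPs in sequence would subtract each accepted demand from the available bandwidth of the links on its path before computing the next one; that bookkeeping, along with tie-breaking and administrative constraints, is omitted here.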

Queuing is based on the MPLS CoS bits, which are marked at label imposition to indicate which service the traffic belongs to. Firewalls and constrained control information ensure that traffic cannot be forwarded from one service to another. LDP again allows the core to carry no BGP information, while maintaining an edge-to-edge forwarding LSP between BGP ingress and egress points in the Internet platform. Removing BGP entries from the core will also speed up forwarding state updates after topology changes.

It should be noted that LDP must be run over, or in parallel with, the RSVP-TE LSPs. This allows a WAN router at the egress of an RSVP-TE LSP to split the traffic arriving on that one LSP out to multiple adjacent MPLS routers within the site and region it serves. It also requires two levels of label stacking to carry Internet traffic across the core, or potentially three for VPN traffic. This is fairly easy to configure, as the directed LDP sessions form a mesh among a limited set of routers and mirror the RSVP-TE LSP mesh.

Presignalled backup LSPs are used to speed convergence in the traffic-engineered core. Each TE-LSP consists of a primary (working) LSP and a protect LSP. The working LSP usually runs along the shortest path, while the protect LSP attempts to avoid paths in use by the working LSP. If the primary LSP fails, traffic switches to the protect LSP; when the primary path is re-established, traffic is placed back on it. Notification of LSP failure is not delayed by LSA generation hold-downs. Because the protect LSP is already signalled, it also avoids attempting to establish an LSP upon failure based on an out-of-date topology database.
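
The label stacking described above can be modelled in a simplified way. In this sketch, which assumes a hypothetical Packet type and invented label values, the outermost label is the RSVP-TE core LSP, the next is the label learned over the directed LDP session that runs over that LSP, and VPN traffic carries a third, innermost VPN label. Pushes performed by different routers (the ingress edge router and the head-end WAN router) are collapsed into one function, and details such as penultimate hop popping and label swapping along the path are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    payload: str
    # Simplified label stack: bottom of stack first, top (outermost) label last.
    # Real label-stack entries also carry CoS (EXP), bottom-of-stack and TTL fields.
    labels: List[int] = field(default_factory=list)


def impose(pkt: Packet, ldp_label: int, rsvp_label: int,
           vpn_label: Optional[int] = None) -> Packet:
    """Push the optional VPN label, then the LDP label, then the RSVP-TE label.

    In a real network the VPN and LDP labels are pushed at the ingress edge router
    and the RSVP-TE label at the head-end WAN router; both steps are collapsed here.
    """
    if vpn_label is not None:
        pkt.labels.append(vpn_label)  # innermost: identifies the customer VPN at the far edge
    pkt.labels.append(ldp_label)      # edge-to-edge LSP learned over the directed LDP session
    pkt.labels.append(rsvp_label)     # outermost: traffic-engineered core LSP
    return pkt


def core_lsp_egress(pkt: Packet) -> int:
    """End of the RSVP-TE LSP: pop the core label and return the exposed LDP label,
    which the WAN router looks up to pick one of the adjacent MPLS routers it serves."""
    pkt.labels.pop()
    return pkt.labels[-1]


# Invented label values for illustration only.
internet = impose(Packet("ip"), ldp_label=1001, rsvp_label=2001)
vpn = impose(Packet("vpn"), ldp_label=1002, rsvp_label=2002, vpn_label=3001)
print(len(internet.labels), len(vpn.labels))  # 2 3
print(core_lsp_egress(vpn))                    # 1002
```

The point of the model is that a single RSVP-TE LSP carries many LDP-labelled flows: once the core label is removed, the exposed LDP label is what lets the egress WAN router fan traffic out within its site.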

Note that while BGP and IP routing are currently carried in full in the core (because LDP is not yet available), the use of RSVP-TE LSPs localizes, and thus minimizes, the black-holing of traffic that can occur when OSPF pulls traffic through a router before that router has complete BGP information.

5. Other Architectures

Other alternatives were considered before a final protocol architecture decision was made. One option was IP over ATM. Historically, ATM provided higher bandwidth, higher performance, and cost-effective interfaces. It also decoupled topology from IP forwarding and thus allowed for the development of traffic engineering on IP networks. However, because IP and ATM are two different technologies, managing the network becomes somewhat complex. Another limitation of this model is the OC-12 edge: the fastest commonly available ATM router interfaces run at OC-12 speeds. With the migration to OC-48 and OC-192 WAN speeds, this limitation becomes increasingly important. Beyond the complexity of managing the mesh, POS appeared to have a stronger and more competitive future.

The current design of an IP edge with an RSVP-TE core will not be adequate, because it lacks support for edge-to-edge MPLS. Given this requirement, LDP is still necessary.

Another common approach is to run RSVP within a region and deaggregate the IP traffic to an inter-region router that is connected to an inter-region RSVP mesh [FRO]. This also does not allow for edge-to-edge MPLS.

Edge-to-edge RSVP was not used because of scalability issues at the edge and in the core. Taking the example of 30 gateways, each with 8 edge devices, adding a router would mean configuring 240 devices, each of which would also have to configure and maintain 239 RSVP sessions. In the core, on the order of 25,000 RSVP sessions would have to be handled in transit on some network elements. This did not appear scalable from either a protocol or a manageability perspective.
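
The edge scaling argument above is full-mesh arithmetic: with N RSVP-TE head-ends, each one terminates N - 1 sessions and the mesh contains N(N - 1) unidirectional LSPs. The sketch reproduces the 240-device and 239-session figures from the text; the total of 57,360 LSPs is derived here rather than stated in the source, and the figure of roughly 25,000 sessions in transit on some core elements depends on topology and is not computed. For comparison it also counts a mesh of about 30 core routers, the size the RSVP-TE backbone is expected to reach.

```python
from typing import Dict


def mesh_counts(n: int) -> Dict[str, int]:
    """Session counts for a full mesh of n RSVP-TE head-ends, one unidirectional LSP per ordered pair."""
    return {
        "devices": n,                # edge devices participating in the mesh
        "lsps_per_head_end": n - 1,  # sessions each device configures and maintains
        "total_lsps": n * (n - 1),   # unidirectional LSPs in the whole mesh
    }


# Edge-to-edge RSVP: 30 gateways x 8 edge devices.
print(mesh_counts(30 * 8))
# {'devices': 240, 'lsps_per_head_end': 239, 'total_lsps': 57360}

# Chosen design: RSVP-TE mesh among roughly 30 core routers only.
print(mesh_counts(30)["total_lsps"])  # 870
```

The quadratic growth is why confining RSVP-TE to the core routers and using LDP toward the edge keeps the session count manageable.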

Another approach would involve a loosely routed RSVP edge whose LSPs are further encapsulated into an RSVP core [HRSVP]. This reduces the core scalability issues mentioned above, but retains the edge manageability and scalability concerns. At the time of evaluation (mid-1999), methods of nesting RSVP LSPs within other RSVP LSPs were only beginning to be considered.

CR-LDP was not seriously considered because it, too, was still being developed, both within the standards process and in implementations. It was also not planned for support by key vendors. As RSVP was already widely deployed in at least two major ISPs, it was also expected to have the more battle-hardened implementations.

6. Systems Issues

As of June 2000, Level3's WAN allows the RSVP-TE core to be configured in a non-constrained manner. Growth trends are expected to make this no longer true by October. Systems are being developed to monitor LSP utilization and feed configurations back into the network; these are likely to resemble systems in place at other large ISPs. Such systems are not trivial, and they lend credence to the view that TE is not worth pursuing unless it is absolutely necessary. Besides the programming complexity, reconfiguring a network potentially every week could itself introduce service-affecting outages. It is recommended that such reconfigurations be undertaken only when they are needed frequently, and thus become operationally routine. That said, once such a system is in place, metric and other routing fire drills, or the tactical use of a limited number of TE-LSPs, will hopefully become less common, or perhaps even a thing of the past.

In response to warnings against extensive use of MPLS TE, it should be noted that relying on tactical techniques to avoid congestion can also be service impacting.
Such tactical maneuvers involve complex analysis that could perhaps be done better, and routinely, by computers. Tactical fixes are also frequently left in place and forgotten, and are sometimes remembered only when a network is not behaving as expected.

7. Conclusion

Level3 has developed a unique MPLS protocol architecture that provides a scalable edge-to-edge MPLS platform supporting both customer and core-service VPNs. RSVP-TE is used in the core to provide traditional traffic engineering advantages, such as avoiding congestion during failure scenarios (and even during normal operation when capacity augments are overdue). It also provides fast network reconvergence. IP is currently used, and LDP will be used as it becomes available, to bring in traffic from the different services within the gateway, as well as from regions that are not traffic engineered.

The RSVP + LDP architecture could be portrayed as complex in comparison to an IP-only or IP + RSVP architecture. However, it can also be presented as more cost effective and easier to maintain than multiple service-specific networks.

References

[TE]    D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, X. Xiao, "A Framework for Internet Traffic Engineering", draft-ietf-tewg-framework-02.txt, work in progress, May 2000.

[L2TUN] L. Martini, et al., "Layer 2 Tunneling using MPLS", draft-martini-l2circuit-trans-mpls-02.txt, work in progress, June 2000.

[LDPA]  B. Thomas, E. Gray, "LDP Applicability", draft-ietf-mpls-ldp-applic-02.txt, work in progress, August 2000.

[FRO]   "Traffic Engineering with MPLS in the Internet", IEEE Network Magazine, March/April 2000.

[HRSVP] "LSP Hierarchy with MPLS TE", draft-ietf-mpls-lsp-hierarchy-00.txt, work in progress, July 2000.

8. Author Information

Vanessa Springer
Level 3 Communications, LLC.
1025 Eldorado Blvd.
Broomfield, CO 80021
e-mail: vanessa.springer@level3.com

Craig Pierantozzi
Level 3 Communications, LLC.
1025 Eldorado Blvd.
Broomfield, CO 80021
e-mail: tozz@level3.net

Jim Boyle
Level 3 Communications, LLC.
1025 Eldorado Blvd.
Broomfield, CO 80021
e-mail: jboyle@level3.net