Internet Engineering Task Force (IETF)                           E. Chen
Internet Draft                                              P. Mohapatra
Intended Status: Informational                             Cisco Systems
Expiration Date: March 18, 2013                       September 17, 2012


       Reduction of BGP Alternate Paths from Inter-Exchange Points

                  draft-chen-bgp-path-reduction-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on March 18, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   In this document we present a mechanism that enhances the "IGP-metric
   based MED" approach so that load balancing is maintained while
   limiting the number of BGP alternate paths carried in a network. The
   mechanism involves the use of a "scale factor" to scale down the IGP
   metrics for the purpose of setting MEDs, and is thus termed "scaled
   IGP-metric based MED".

1. Introduction

   BGP sessions [RFC4271] between service providers are typically
   established and maintained at multiple inter-exchange points for the
   purpose of routing redundancy, load balancing, and traffic
   localization. For a particular prefix (i.e., destination) there would
   exist multiple routes with different nexthops (corresponding to
   different peering points) in a network. Given the large number of
   routes and the number of inter-exchange points, the challenge is to
   utilize these peering connections efficiently while maintaining
   operational simplicity.

   One common practice is to direct packets received on an ingress
   router to the "closest" (in terms of the IGP metric to the nexthop)
   inter-exchange point. Thus from the network point of view, traffic
   destined to a particular prefix would be distributed among the
   inter-exchange points from which the routes for the prefix are
   received. The scheme is commonly known as "shortest-exit routing", or
   "hot-potato routing".

   As described in [RFC4271], the first few steps in BGP route selection
   involve the comparisons of the LOCAL_PREF value, the AS-PATH length,
   the MED value, and the IGP metrics for the nexthop.
   Given that the AS-PATH length is typically network-topology dependent
   and agnostic to the peering locations, a common implementation of
   "shortest-exit routing" is to set the LOCAL_PREF value and the MED
   value each to a constant for the routes received from all these
   peering points, thus using the IGP metrics as the tie-breaker in the
   BGP route selection.

   This scheme offers fast routing convergence, consumes minimal network
   bandwidth for a particular network, and requires little coordination
   and cooperation between providers. However, the number of alternate
   paths carried in a network, in particular on the route reflectors
   [RFC4456], grows linearly with the number of peering locations. A
   large number of alternate paths from the peering locations could
   become a scaling issue as described in [NANOG46].

   Clearly one alternative is for the service providers to use the IGP
   metric as the MED in route advertisement, and accept the MEDs from
   each other. This approach (hereafter referred to as "IGP-metric based
   MED") is straightforward both technically and operationally. The
   amount of coordination between the providers would also be minimal.

   However, compared with "shortest-exit routing", the "IGP-metric based
   MED" approach has the drawback of slower routing re-convergence, as
   only the paths with the lowest MED are readily available in the
   network. In addition, only one peering point may be used for traffic
   to a given destination, which may potentially impact load balancing
   across all peering locations.

   In this document we present a mechanism that enhances the "IGP-metric
   based MED" approach so that load balancing is maintained while
   limiting the number of BGP alternate paths carried in a network.
   The mechanism involves the use of a "scale factor" to scale down the
   IGP metrics for the purpose of setting MEDs, and is thus termed
   "scaled IGP-metric based MED".

2. Scaled IGP-Metric Based MED

   The "scaled IGP-metric based MED" approach consists of the following
   procedures:

   o Conceptually divide the network and the inter-exchange points with
     a particular provider into multiple topological regions. There
     should usually be more than one inter-exchange point in a region so
     that the traffic destined toward that region will be load balanced
     across the inter-exchange points within that region.

   o For a route sourced (either internally or received from EBGP
     peers) within a region, advertise it with an identical MED across
     all the BGP sessions with that provider in the region; and
     advertise it with less preferred MEDs across BGP sessions with that
     provider in other regions.

   Note that only the paths with more preferred MEDs are carried in the
   network, and the external paths with less preferred MEDs would not be
   further advertised by the peering routers to the internal peers. The
   number of alternate paths (for a prefix) carried by the other
   provider that accepts such MEDs will be controlled, roughly, by the
   number of BGP sessions inside the region that sources the prefix. Due
   to the presence of more than one path in the network, fast routing
   re-convergence will be maintained. In addition, multiple peering
   locations will be used for traffic destined to the prefix.

   Operationally this approach can be easily implemented by setting the
   MED based on a scaled IGP metric (i.e., the IGP metric divided by a
   scale factor). The scale factor can be set as one plus the maximum
   IGP metric (or diameter) between the peering routers and other
   routers within the region.
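
   The scaling rule above can be sketched in a few lines of Python. This
   is an illustration only, not part of the mechanism's definition. It
   assumes integer (truncating) division, and it assumes that each
   peering router applies the scale factor of its own region. The chain
   topology and its link metrics are taken from the example in Section
   3; the region names "west" and "east" and all function names are
   hypothetical.

```python
from __future__ import annotations

# Sketch only. Assumptions not stated in the draft: integer division,
# and each advertising router uses the scale factor of its own region.


def scaled_med(igp_metric: int, scale_factor: int) -> int:
    """MED for a route: the IGP metric scaled down by integer division."""
    if scale_factor < 1:
        raise ValueError("scale factor must be at least 1")
    return igp_metric // scale_factor


def region_scale_factor(intra_region_metrics: list[int]) -> int:
    """One plus the largest IGP metric (diameter) inside the region."""
    return 1 + max(intra_region_metrics, default=0)


# Chain from Section 3: A1 -10- A2 -30- A3 -20- A4.
# POSITION holds the cumulative metric of each router along the chain.
POSITION = {"A1": 0, "A2": 10, "A3": 40, "A4": 60}
# Hypothetical region names for the two regions of the example.
REGION = {"A1": "west", "A2": "west", "A3": "east", "A4": "east"}


def igp_metric(a: str, b: str) -> int:
    """IGP distance between two peering routers on the chain."""
    return abs(POSITION[a] - POSITION[b])


def scale_factors() -> dict[str, int]:
    """Scale factor per region, derived from intra-region distances."""
    factors = {}
    for region in set(REGION.values()):
        members = [r for r, reg in REGION.items() if reg == region]
        distances = [igp_metric(a, b) for a in members for b in members]
        factors[region] = region_scale_factor(distances)
    return factors


def advertised_meds(source: str) -> dict[str, int]:
    """MED each peering router attaches to a route sourced at `source`."""
    factors = scale_factors()
    return {
        router: scaled_med(igp_metric(router, source), factors[REGION[router]])
        for router in POSITION
    }


def lowest_med_exits(source: str) -> list[str]:
    """Peering routers whose advertisement carries the lowest MED."""
    meds = advertised_meds(source)
    best = min(meds.values())
    return sorted(router for router, med in meds.items() if med == best)


if __name__ == "__main__":
    for source in sorted(POSITION):
        print(source, advertised_meds(source), lowest_med_exits(source))
```

   Under these assumptions the derived scale factors are 11 and 21, and
   every prefix keeps exactly two lowest-MED exits, both inside the
   region that sources it. Calling scaled_med with a scale factor of 1
   returns the IGP metric unchanged, while a factor of 2**32 - 1 maps
   every metric in this example to 0.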
   Clearly the "scale factor" implementation would work better in a
   network with differentiated IGP metric values for the
   "inter-regional" links vs "intra-regional" links.

   This approach can also be implemented by attaching "location
   communities" to routes sourced from different locations, and then
   setting the MED based on the communities.

   It is also noted that both the "shortest-exit routing" and the
   "IGP-metric based MED" schemes can be considered as special cases of
   this "scaled IGP-metric based MED" scheme, with the scale factor
   being the largest 32-bit unsigned integer, and 1, respectively.

3. Example

   In the following figure, A1, A2, A3, and A4 are the peering routers
   in SP1; P1, P2, P3, and P4 are prefixes/routes sourced at A1, A2, A3,
   and A4 respectively. They are advertised at all the peering points.
   B1, B2, B3, and B4 are the peering routers in SP2; RR1, RR2, RR3, and
   RR4 are route reflectors in different clusters in SP2. The number
   above a dashed line is the IGP metric assigned to the link.

      (P1)              (P2)                 (P3)             (P4)
       |                 |                    |                |
       |       10        |         30         |       20       |
       A1 -------------- A2 ----------------- A3 ------------- A4
       |                 |                    |                |
       |                 |                    |                |
       |                 |                    |                |
       B1 -------------- B2 ----------------- B3 ------------- B4
       |                 |                    |                |
       |                 |                    |                |
      RR1               RR2                  RR3              RR4
       |                 |                    |                |

   To use the "scaled IGP-metric based MED" scheme, SP1 can conceptually
   organize the network into two regions, one with the inter-exchange
   peerings of A1/B1 and A2/B2, and another with A3/B3 and A4/B4. The
   scale factor for the first region would be 11, and the scale factor
   for the second region would be 21. As a result, the number of
   alternate paths would be 2 on the route reflectors in SP2 for each of
   the prefixes/routes P1 through P4.

4. IANA Considerations

   This document requires no action from the IANA.

5. Security Considerations

   This document does not introduce any new security issues.

6. Acknowledgments

   We would like to thank Eric Rosen and Saikat Ray for their review and
   suggestions.

7. References

7.1. Normative References

   [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border
             Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection:
             An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456,
             April 2006.

7.2. Informative References

   [NANOG46] McPherson, D., Amante, S., and L. Zhang, "BGP Scalability
             Considerations - The Intra-domain BGP Scaling Problem",
             NANOG-46, June 2009.

8. Authors' Addresses

   Enke Chen
   Cisco Systems, Inc.

   Email: enkechen@cisco.com

   Pradosh Mohapatra
   Cisco Systems, Inc.

   Email: pmohapat@cisco.com