Internet Engineering Task Force                      Olivier Bonaventure
INTERNET DRAFT                                                     FUNDP
                                                      Stefaan De Cnodder
                                                                 Alcatel
                                                            Jeffrey Haas
                                                                 NextHop
                                                              Russ White
                                                                   cisco
                                                              July, 2001
                                                   Expires January, 2002


              Controlling the redistribution of BGP routes

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document proposes the redistribution extended community.  This
   new well-known extended community allows a router to influence how a
   specific route should be redistributed towards a specified set of
   eBGP speakers.  The redistribution community makes it possible to
   indicate that a specific route should not be announced to a set of
   eBGP speakers, should only be announced to a set of eBGP speakers, or
   should be prepended n times when announced to a set of eBGP speakers.

1 Introduction

Bonaventure/De Cnodder/Haas/White                               [Page 1]
draft-bonaventure-bgp-redistribution-01.txt                    July 2001

   In today's commercial Internet, many ISPs need to have some control
   over their interdomain traffic.  In the outgoing direction, this
   control can be obtained by configuring the BGP routers of the ISP to
   favor some routes over others by using the LOCAL-PREF attribute.
   However, due to the asymmetry of Internet traffic, most ISPs mainly
   need to control their incoming traffic.

      +---------------+
      |               |
      |     AS22      |
      |               |
      +---------------+
             ||
      +---------------+               +---------------+
      | 13.0.0.0/8    |               |     AS21      |
      | 12.0.0.0/8    |===============|               |
      |     AS20      |               +---------------+
      +---------------+
             ||
      +---------------+
      |               |
      |     AS10      |
      |               |
      +---------------+

               Figure 1: Simple interdomain topology

   In the incoming direction, the only way for an ISP to influence the
   traffic flow is to control the redistribution of its routes.  Several
   methods exist and are used in practice [Hal97].  In this case, the
   ISP needs to influence the redistribution and the selection of its
   own routes by remote ISPs.  Since the default configuration of many
   BGP routers is to select the route with the smallest AS path length,
   a common technique is to artificially increase the length of the AS
   path for some announced routes.  For example, in figure 1, if AS20
   wanted to indicate that it prefers to receive its traffic towards
   subnet 13.0.0.0/8 through its link with AS22, then it would announce
   this prefix as usual on this link to AS22 and announce the same
   prefix with the AS20:AS20:AS20:AS20 path to AS21 and AS10.  If AS10
   and AS21 rely only on the AS path length to select the best BGP
   route, they will prefer the shorter route received via AS22.  This
   requires a manual configuration of the BGP routers, but path
   prepending is used very often on the Internet according to [Hus01].

   In some cases, the configuration burden can be reduced by using the
   BGP communities attribute.  Recently, several large ISPs have gone
   one step further by defining BGP communities that allow their
   customers to influence the redistribution of their routes.  For
   example, in figure 1, AS20 could configure its BGP routers to always
   prepend AS20 four times when they announce via eBGP a route received
   from one of AS20's customers with a special community attribute.
   For this, AS20 needs to publish the specific BGP communities that it
   supports and its customers need to configure their routers
   appropriately.  If AS20 needs to define a new BGP community or change
   an existing one, it must inform all its customers, who will then have
   to update the configuration of their routers.

   A quick survey of the RIPE database in May 2001 revealed that the use
   of BGP community attributes to control outbound routes is becoming
   more and more frequent.  Several uses of the BGP community attributes
   are worth mentioning.

   -  More than twenty different ASes define their own BGP community
      attributes to allow their customers/peers to indicate that a
      particular route should not be propagated towards a specific AS,
      towards the routers attached to a specific IX, or towards ASes
      within a given geographical area (e.g. a European AS could want to
      prohibit a route from being announced to US peers).

   -  More than twenty different ASes define their own BGP community
      attributes to allow their peers or customers to indicate that an
      announced route should be prepended when announced towards a
      specific AS, IX or set of ASes.

   -  Five ASes define their own BGP community attribute to indicate
      that a given route should only be redistributed towards a
      specified AS.

   From this survey, it is clear that this use of the BGP communities
   attribute occurs in today's Internet.  However, asking each AS to
   select its own values for the BGP communities and to document these
   values in the RIPE database is not very efficient, because it forces
   the BGP routers to be configured manually based on information found
   in the RIPE database or in peering agreements.

   Given the growing use of the BGP community attribute to support such
   facilities, we propose in this document a new type of well-known BGP
   extended community.
   By using well-known BGP extended communities with a precise syntax,
   we support most of the current uses of the BGP communities without
   relying unnecessarily on manual configuration of the BGP routers.  We
   believe that reducing the manual configuration of these routers would
   be very useful for the stability and the performance of the global
   Internet.

2 Controlled redistribution of BGP routes

   This document defines a method to allow a BGP speaker to influence
   how its peers will redistribute its own routes.  For this, the BGP
   speaker may define for each announced route a redistribution policy
   that controls how this route will be redistributed.  This is done by
   defining a set of allowed or requested operations and a list of BGP
   speakers.  The list of BGP speakers can be specified by indicating
   either the BGP speakers that are covered by the redistribution policy
   or those that are not covered by this policy.

   The current version of this document supports the following
   operations:

   -  the attached route should not be announced to the BGP speakers
      covered by the policy

   -  the attached route should only be announced to the BGP speakers
      covered by the policy

   -  the attached route should be announced with the NO_EXPORT
      attribute to the BGP speakers covered by the policy

   -  the attached route should be prepended n times when announced to
      the BGP speakers covered by the policy

   The redistribution policies are encoded in a special type of extended
   communities attribute called the redistribution community.  If a
   redistribution policy applies to a long list of BGP speakers, then it
   will be encoded in several redistribution communities.

2.1 The redistribution community

   The extended communities attribute is defined in [RTR01].  This
   attribute allows a BGP router to attach a set of extended communities
   to an UPDATE message.
   Each extended community value is encoded as an eight-octet quantity
   with a two-octet type field and a six-octet value field.  Several
   types of extended community values are defined in [RTR01].  This
   document proposes a new well-known extended community: the
   redistribution community.

   The redistribution community is composed of a two-octet type field
   and a six-octet value field.  The two-octet type field is encoded as
   follows.  The high-order octet indicates that this is a
   redistribution community.  It is encoded as defined in [RTR01].  The
   high-order bit and the transitive bit of the first octet are set to
   one and the 6 lower-order bits of this octet are TBD_IANA.

   The second octet of the type field indicates the redistribution
   policy to apply to the specified BGP speakers for the attached route.
   This octet is encoded as follows:

   -  The high and the second order bits (Bit7 and Bit6) are reserved.

   -  Bit5 is the Dist_List flag.  When set to 1, it indicates that the
      redistribution policy modifies the redistribution of the route to
      the specified BGP speakers (either by inserting NO_EXPORT or by
      path prepending) but not the redistribution to the non-specified
      BGP speakers.  Otherwise, the redistribution policy prohibits the
      redistribution to the specified BGP speakers and may modify the
      route redistributed to the non-specified BGP speakers.

   -  Bit4 is the Include/Exclude bit.  When set to 1, it means that the
      redistribution policy applies to the listed BGP speakers.
      Otherwise, the redistribution policy only applies to the BGP
      speakers that are not listed.

   -  Bit3 is the No_Export bit.  If set to 1, it means that the
      NO_EXPORT community should be inserted when announcing the
      attached route to the BGP speakers covered by the redistribution
      policy.

   -  Bits2-0 are the Prepend bits.
      Their value indicates how many times the AS number of the
      announcing router should be prepended when announcing the attached
      route to the BGP speakers covered by the redistribution policy.  A
      value of 0 indicates that no prepending should occur.

   The six-octet value field of the redistribution community indicates
   to which BGP speakers the redistribution policy applies.  It is
   encoded as follows:

   -  The high-order octet indicates the type of the BGP speakers field.

   -  The five low-order octets are the value of this field.

   This document defines four types of BGP speakers fields (values
   0x01-0x04).  Value 0x00 is reserved and values 0x05-0x7f are to be
   assigned by IANA.  Values larger than 0x7f are vendor specific.

   -  The BGP speakers field contains a two-octet AS number (Speakers
      Type 0x01)

   -  The BGP speakers field contains two two-octet AS numbers (Speakers
      Type 0x02)

   -  The BGP speakers field contains a CIDR prefix/length pair
      (Speakers Type 0x03)

   -  The BGP speakers field contains a four-octet AS number (Speakers
      Type 0x04)

   The BGP speakers field shall be encoded as follows.  If this field
   contains a two-octet AS number, the AS number shall be placed in the
   two high-order octets.  The three low-order octets shall be set to
   zero upon transmission and ignored upon reception.  If the BGP
   speakers field contains two two-octet AS numbers, the first AS number
   shall be placed in the two high-order octets.  The second AS number
   shall be placed in the next two octets and the last octet shall be
   set to zero upon transmission and ignored upon reception.  If the BGP
   speakers field contains a four-octet AS number, the AS number shall
   be placed in the four high-order octets.  The low-order octet shall
   be set to zero upon transmission and ignored upon reception.
   If the BGP speakers field contains a CIDR prefix/length pair, the IP
   prefix shall be placed in the four high-order octets and the
   low-order octet shall contain the prefix length.

3 Operations

   A router may, depending on its policy, add any redistribution
   communities to a route originated by itself or received from another
   BGP speaker with iBGP or eBGP.  In practice, only the originator of
   the route should insert the redistribution community, as it is an
   attempt by the route originator to do some form of inter-domain
   traffic engineering.

   The redistribution communities defined in this document are only used
   when a route is redistributed to an eBGP peer.  They do not affect
   the redistribution of routes via iBGP.

   When a router receives a route with redistribution communities, it
   should apply the operations specified by these communities when
   redistributing the route to eBGP peers.  A router should remove the
   received redistribution communities when redistributing the route to
   eBGP peers.  It may however add its own redistribution communities to
   this route before redistributing it.

   Two redistribution communities are said to be applicable to the same
   redistribution policy when their two high-order octets are equal.

   A router should apply the policies defined by the redistribution
   communities to the routes that it has selected for advertisement from
   its Adj-RIB-OUT based on its own policy.  A route that contains
   redistribution policies should be processed as follows.  All
   redistribution communities that correspond to the same redistribution
   policy should be processed together by considering the type field of
   the redistribution communities and the list of BGP speakers that are
   covered by this policy.
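
   As an illustration of the encoding of Section 2.1 and of the grouping
   rule above, the following Python sketch decodes eight-octet
   redistribution communities and merges the BGP speakers of those that
   share the same two high-order octets.  It is a sketch under stated
   assumptions, not a reference implementation: the function names
   decode and group_by_policy are hypothetical, the caller is assumed to
   pass only redistribution communities, and the value 0xC1 used in the
   usage note is a placeholder because the six low-order bits of the
   first octet are TBD_IANA.

```python
import ipaddress
import struct
from collections import defaultdict


def decode(raw):
    """Split one 8-octet community into (type_high, policy, speakers).

    Hypothetical helper: 'speakers' is a list of ("as", number) or
    ("prefix", IPv4Network) tuples taken from the BGP speakers field.
    """
    if len(raw) != 8:
        raise ValueError("an extended community is exactly eight octets")
    type_high, policy, speakers_type = raw[0], raw[1], raw[2]
    field = raw[3:8]  # the five low-order octets of the value field
    if speakers_type == 0x01:    # one two-octet AS number
        speakers = [("as", struct.unpack("!H", field[:2])[0])]
    elif speakers_type == 0x02:  # two two-octet AS numbers
        first, second = struct.unpack("!HH", field[:4])
        speakers = [("as", first), ("as", second)]
    elif speakers_type == 0x03:  # CIDR prefix (4 octets) + length (1 octet)
        address = int.from_bytes(field[:4], "big")
        net = ipaddress.IPv4Network((address, field[4]), strict=False)
        speakers = [("prefix", net)]
    elif speakers_type == 0x04:  # one four-octet AS number
        speakers = [("as", struct.unpack("!I", field[:4])[0])]
    else:                        # reserved, unassigned or vendor specific
        speakers = []
    return type_high, policy, speakers


def group_by_policy(raw_communities):
    """Merge communities whose two high-order octets are equal."""
    policies = defaultdict(list)
    for raw in raw_communities:
        type_high, policy, speakers = decode(raw)
        policies[(type_high, policy)].extend(speakers)
    return dict(policies)
```

   For example, with the placeholder type octet 0xC1, the two
   communities c1 35 02 00 02 00 03 00 and c1 35 01 00 04 00 00 00 share
   the type field c1 35 and are therefore merged into a single policy
   covering AS2, AS3 and AS4.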
   The pseudo-code below clarifies the processing of a redistribution
   policy:

   /* Extract from redistribution communities the following
      information for each redistribution policy */
   /* Dist_List       : Bit5 */
   /* Include_Exclude : Bit4 */
   /* No_Export       : Bit3 */
   /* Prepend         : Bits2-0 */
   /* BGP_Speakers    : List of AS numbers and CIDR prefixes
                        covered by this redistribution policy */

   if ( Dist_List == 1 )
   {
     if ( ( (Include_Exclude == 1) AND (Peer isin BGP_Speakers) )
          OR
          ( (Include_Exclude == 0) AND not(Peer isin BGP_Speakers) ) )
     {
       /* route can be announced to eBGP peer */
       if (No_Export == 1)
         /* insert NO_EXPORT community */
       if (Prepend > 0)
         /* Prepend own AS number */
     }
     else
     {
       /* The route can be announced as usual to this peer */
     }
   }
   else /* Dist_List == 0 */
   {
     if ( ( (Include_Exclude == 1) AND (Peer isin BGP_Speakers) )
          OR
          ( (Include_Exclude == 0) AND not(Peer isin BGP_Speakers) ) )
     {
       /* The route cannot be announced to this peer */
     }
     else
     {
       /* route can be announced to eBGP peer */
       if (No_Export == 1)
         /* insert NO_EXPORT community */
       if (Prepend > 0)
         /* Prepend own AS number */
     }
   }

        Figure 2: Processing of the redistribution communities

   As some operators do not wish to allow interdomain traffic
   engineering on their networks contrary to local policy, an
   implementation should provide a mechanism to ignore these
   communities.

   For implementation purposes, the two-octet AS version of the
   BGP_Speakers field may be problematic for interim implementations,
   since it does not easily allow an existing extended communities
   implementation to simply add a match for its own AS.  That is, an
   implementation can easily match an unknown community on an exact
   basis, while the two-octet version requires applying a mask and
   checking both components.
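
   The decision procedure of Figure 2 can also be written as a small
   runnable Python sketch.  The names Action, peer_is_listed and
   apply_policy are hypothetical and only serve the illustration; the
   bit positions are those of Section 2.1, and a peer is assumed to
   match a CIDR prefix when its address belongs to that prefix.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    announce: bool            # False: do not send the route to this peer
    no_export: bool = False   # True: insert NO_EXPORT before sending
    prepend: int = 0          # number of times to prepend the own AS number


def peer_is_listed(peer_as, peer_addr, as_numbers, prefixes):
    """'Peer isin BGP_Speakers': match on AS number or on peer address."""
    addr = ipaddress.ip_address(peer_addr)
    return peer_as in as_numbers or any(addr in net for net in prefixes)


def apply_policy(policy, listed):
    """Apply the second type octet of a redistribution policy to one peer."""
    dist_list = bool(policy & 0x20)   # Bit5
    include = bool(policy & 0x10)     # Bit4
    no_export = bool(policy & 0x08)   # Bit3
    prepend = policy & 0x07           # Bits2-0
    # The policy covers the listed peers (include) or all the others.
    covered = listed if include else not listed
    if dist_list:
        if covered:
            return Action(True, no_export, prepend)
        return Action(True)           # announced as usual
    if covered:
        return Action(False)          # the route cannot be announced
    return Action(True, no_export, prepend)
```

   With policy octet 0x35 (Dist_List=1, Include/Exclude=1, Prepend=5)
   and a speakers list of AS2 and AS3, apply_policy returns an Action
   with prepend=5 for a peer in AS2 and an unmodified announcement for
   any other peer.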

   It should be noted that, given the flexibility of the defined
   redistribution communities, it is possible to define two conflicting
   redistribution communities (e.g. one indicating that this route
   should not be announced to ASx and the other indicating that this
   route should only be announced to ASx).  Such cases should be avoided
   by the operators.  If such problems occur, an implementation may
   apply any of the conflicting redistribution communities and ignore
   the others.  In this case, it would be useful to log the error.

4 IANA considerations

   This document requests the attribution of a new BGP extended
   communities type field from IANA.

5 Security considerations

   Both the communities and extended communities options have the
   potential to introduce additional security concerns into BGP.
   Traditional implementations allow third parties to modify (extended)
   communities on the routes, which may bias reachability of the network
   in question by appending communities on a third-party basis according
   to the semantics of those communities.  The redistribution extended
   community mechanism further allows someone to maliciously deny
   reachability to ASes by proxy.

   When utilized by the route originator, the redistribution extended
   community may possibly be used to mitigate DDoS attacks by denying an
   attacking AS reachability to the network in question.  This assumes
   the AS in question is using default-free policy and no supernets of
   the network in question are present in the global routing table.

6 Conclusion

   This document has proposed the new redistribution community.  By
   using the defined redistribution communities, a BGP router can
   influence the redistribution of a given route by its peers.  The
   proposed redistribution community is intended to replace the current
   widespread utilization of local BGP extended communities that relies
   heavily on manual router configuration.
   The redistribution community proposed by this document could also be
   useful for inter-provider VPNs such as those described in [RRB^+01].

Acknowledgements

   This work was partially funded by the European Commission, within the
   ATRIUM IST project.

References

   [Hal97]    B. Halabi.  Internet Routing Architectures.  Cisco Press,
              1997.

   [Hus01]    G. Huston.  AS1221 BGP table statistics.  Available from
              http://www.telstra.net/ops/bgp/, 2001.

   [ISO93]    ISO/IEC.  Protocol for Exchange of Inter-domain Routeing
              information among Intermediate Systems to Support
              Forwarding of ISO 8473 PDUs.  ISO/IEC 10747:1993.

   [RRB^+01]  E. Rosen, Y. Rekhter, T. Bogovic, R. Vaidyanathan,
              S. Brannon, M. Morrow, M. Carugi, C. Chase, L. Fang,
              T. Wo Chung, J. De Clercq, E. Dean, P. Hitchin, A. Smith,
              M. Leelanivas, D. Marshall, L. Martini, V. Srinivasan, and
              A. Vedrenne.  BGP/MPLS VPNs.  Internet draft,
              draft-rosen-rfc2547bis-03.txt, work in progress, February
              2001.

   [RTR01]    S. Ramachandra, D. Tappan, and Y. Rekhter.  BGP extended
              communities attribute.  Internet draft,
              draft-ramachandra-bgp-ext-communities-08.txt, work in
              progress, January 2001.

Authors' Addresses

   Olivier Bonaventure
   Infonet group (FUNDP)
   Rue Grandgagnage 21, B-5000 Namur, Belgium
   Email: Olivier.Bonaventure@info.fundp.ac.be
   URL  : http://www.infonet.fundp.ac.be

   Stefaan De Cnodder
   Alcatel Carrier Internetworking Division
   Francis Wellesplein 1
   B-2018 Antwerp, Belgium
   Email: stefaan.de_cnodder@alcatel.be

   Jeffrey Haas
   NextHop Technologies
   517 Williams
   Ann Arbor, MI 48103-4943
   Phone: +1 734 936 2095
   Fax:   +1 734 615-3241
   Email: jhaas@nexthop.com

   Russ White
   Cisco Systems
   Email: ruwhite@cisco.com

Appendix 1 Examples

      +---------------+     +-------+    +-------+
      |               |     |       |    |       |
      |     AS22      |=====|AS50   |====| AS40  |
      |               |     |       |    |       |
      +---------------+     +-------+    +-------+
             ||                              ||
      +---------------+               +---------------+
      |               |               |      AS1      |
      |               |===============|               |
      |     AS20      |               +---------------+
      +---------------+                      ||
             ||                              ||
      +---------------+               +---------------+
      |               |               |               |
      |     AS10     R|------IX-------|R    AS30      |
      |               |  1.2.3.0/24   |               |
      +---------------+               +---------------+

               Figure 3: Simple interdomain topology

   To better understand the usefulness and the flexibility of the
   proposed redistribution communities, it is useful to consider a few
   examples.  Assume the simple interdomain topology shown in figure 3.

   If AS30 wanted to offer to AS1 a limited transit service to reach
   only the ASes connected at IX, then it could simply attach to the
   routes received from AS1 a redistribution community like:

      -  Dist_List=0
      -  Include/Exclude=0
      -  NO_EXPORT=0
      -  Prepend=0
      -  Value = 1.2.3.0/24

   With this redistribution community, the routes received from AS1 will
   only be announced to the eBGP speakers that are part of the
   1.2.3.0/24 subnet.

   Assume now that AS20 agrees to provide a limited transit service.
   For this, AS20 wants to advertise the routes received from AS1 to all
   its cheap peers except its transit upstreams (e.g.
   AS2 and AS3 - not shown in the figure).  In this case, AS20 would
   attach the following redistribution community to the routes received
   from AS1:

      -  Dist_List=0
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=0
      -  Value = AS2, AS3

   AS20 could also want to provide a kind of "backup" service.  For
   example, it would announce to its transit upstreams the routes
   received from AS1 as low-quality routes.  In this case, AS20 would
   attach the following redistribution community to the routes received
   from AS1:

      -  Dist_List=1
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=5
      -  Value = AS2, AS3

   If AS20 had three transit providers, AS2, AS3 and AS4, then it would
   need to use two redistribution communities to encode this
   redistribution policy.

   Redistribution community 1

      -  Dist_List=1
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=5
      -  Value = AS2, AS3

   Redistribution community 2

      -  Dist_List=1
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=5
      -  Value = AS4

   These two redistribution communities would be processed together
   since they apply to the same redistribution policy.

   Assume that AS1 receives a lot of traffic from AS22 and AS10.  For
   traffic engineering purposes, AS1 would like to utilize its link with
   AS40 for the traffic coming from AS22 and its link with AS20 for the
   traffic received from AS10.  In this case, AS1 cannot simply prepend
   its own AS number on the link to AS20, since then the traffic from
   AS10 will be received through AS30.
   To control the traffic received from AS22, AS1 would insert the
   following redistribution community to its routes sent to AS20:

      -  Dist_List=1
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=3
      -  Value = AS22

   Similarly, to control the traffic received from AS10, AS1 would
   insert the following redistribution community to its routes sent to
   AS30:

      -  Dist_List=1
      -  Include/Exclude=1
      -  NO_EXPORT=0
      -  Prepend=1
      -  Value = AS10
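
   To make the examples of this appendix concrete, the following Python
   sketch builds the eight octets of the corresponding redistribution
   communities.  It is only an illustration: the helper names
   policy_octet and encode_as_list are hypothetical, and TYPE_HIGH uses
   an arbitrary placeholder (0x01) for the six low-order bits that are
   TBD_IANA, so the first octet of every result is an assumption.

```python
import struct

# Assumption: high-order and transitive bits set to one (0xC0) plus a
# placeholder for the six TBD_IANA bits.  The real value is not assigned.
TYPE_HIGH = 0xC0 | 0x01


def policy_octet(dist_list, include, no_export, prepend):
    """Build the second type octet from the flags of Section 2.1."""
    if not 0 <= prepend <= 7:
        raise ValueError("Prepend is a three-bit field (0..7)")
    return (bool(dist_list) << 5) | (bool(include) << 4) \
        | (bool(no_export) << 3) | prepend


def encode_as_list(policy, as_numbers):
    """Encode two-octet AS numbers, two per community where possible.

    Pairs use Speakers Type 0x02; a trailing single AS number uses
    Speakers Type 0x01 with the three low-order octets set to zero.
    """
    communities = []
    for i in range(0, len(as_numbers), 2):
        pair = as_numbers[i:i + 2]
        if len(pair) == 2:
            value = struct.pack("!BHHB", 0x02, pair[0], pair[1], 0)
        else:
            value = struct.pack("!BH3x", 0x01, pair[0])
        communities.append(bytes([TYPE_HIGH, policy]) + value)
    return communities


# AS20 with three transit providers: Dist_List=1, Include/Exclude=1,
# NO_EXPORT=0, Prepend=5, Value = AS2, AS3 and AS4.
backup = encode_as_list(policy_octet(1, 1, 0, 5), [2, 3, 4])
for community in backup:
    print(community.hex())
```

   Under these assumptions, the policy octet of the "backup" example is
   0x35 and the three providers are carried in two communities, one
   holding AS2 and AS3 and one holding AS4, as described in the text.
   The last example for AS22 (Prepend=3) yields the policy octet 0x33
   and the value field 01 00 16 00 00 00.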