Network Working Group                                      Daniel Walton
Internet Draft                                                David Cook
Expiration Date: November 2002                             Alvaro Retana
File name: draft-walton-bgp-add-paths-00.txt                John Scudder
                                                           Cisco Systems
                                                                May 2002


                 Advertisement of Multiple Paths in BGP

                   draft-walton-bgp-add-paths-00.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its Areas, and its Working Groups. Note that
   other groups may also distribute working documents as Internet
   Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   The BGP specification [1,2] defines an "Update-Send Process" to
   advertise the routes chosen by the Decision Process to other BGP
   speakers. No provisions are made to facilitate the advertisement of
   multiple paths to the same destination. In fact, a route with the
   same NLRI as a previously advertised route implicitly replaces the
   original advertisement.

   This document proposes a mechanism that will allow the advertisement
   of multiple paths for the same prefix without the new paths
   implicitly replacing any previous ones. The essence of the mechanism
   is that each path is identified by an arbitrary identifier in
   addition to its prefix.


Walton, et al                                                   [Page 1]

INTERNET DRAFT           Multiple Paths in BGP                  May 2002


1. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [3].


2. Advertisement of Multiple Paths in BGP

   This section describes an extension to the attributes developed for
   multiprotocol transport [5] that allows the advertisement of multiple
   paths in BGP.


2.1. Capability Advertisement

   This specification defines the capability [4] ADD_PATH. This
   capability MUST NOT be advertised unless multiprotocol support [5] is
   also advertised.

   The ADD_PATH capability has code TBD. Its length is zero; there is
   no data.

   Capability code 4 defined in [6] MUST NOT be advertised if ADD_PATH
   is advertised (see also the section below entitled 'Modifications to
   "Carrying Label Information in BGP-4"').


2.2. NLRI Encoding

   If two BGP speakers advertise the ADD_PATH and multiprotocol
   capabilities to each other, the NLRI encoding is modified to add two
   new fields at the beginning of the NLRI: a "bestpath" flag indicating
   whether the NLRI has been selected for installation in the
   advertiser's FIB, and an identifier to distinguish the NLRI from
   other NLRI with the same prefix but different path attributes and/or
   nexthop.

   We note that in many BGP operations, the prefix is used as a key for
   identifying a datum. For example, when withdrawing a route using the
   procedures of [1,2], only the prefix needs to be specified in order
   to withdraw the entire route. For such purposes, the identifier field
   introduced by this specification is treated as part of the key.

   The following subsections specify the necessary modifications to
   existing encodings. We recommend that future documents which specify
   NLRI encodings for BGP include an encoding (possibly the sole
   encoding) compatible with this specification.


2.2.1. Modifications to "Multiprotocol Extensions for BGP-4"

   "Multiprotocol Extensions for BGP-4" [5], section 7 is replaced by
   the following:

   The Network Layer Reachability information is encoded as one or more
   4-tuples of the form <bestpath, identifier, length, prefix>, whose
   fields are described below:

      +---------------------------+
      |   Bestpath (1 bit)        |
      +---------------------------+
      |   Identifier (15 bits)    |
      +---------------------------+
      |   Length (1 octet)        |
      +---------------------------+
      |   Prefix (variable)       |
      +---------------------------+

   The use and the meaning of these fields are as follows:

   a) Bestpath:

      If set to one, the bestpath bit indicates that the path associated
      with the NLRI has been selected by the BGP speaker for
      installation into its FIB. If set to zero, the path has not been
      selected.

      The bestpath bit MUST NOT be used for identifying the path. In
      other words, it does not form part of the key used to identify the
      path.

      If a route which was advertised with the bestpath bit set to one
      is removed from the advertiser's FIB, the route MUST be
      re-advertised with the bestpath bit set to zero, or withdrawn.
      Likewise, if a route which was advertised with the bestpath bit
      set to zero is selected for installation in the advertiser's FIB,
      the route MUST be re-advertised with the bestpath bit set to one,
      or withdrawn.

   b) Identifier:

      The Identifier field allows the address prefix and its associated
      path attributes ("path") to be distinguished from other paths for
      the same prefix. The selection of identifier values is a local
      implementation decision.

   c) Length:

      The Length field indicates the length in bits of the address
      prefix. A length of zero indicates a prefix that matches all (as
      specified by the address family) addresses (with prefix, itself,
      of zero octets).

   d) Prefix:

      The Prefix field contains an address prefix followed by enough
      trailing bits to make the end of the field fall on an octet
      boundary.
      Note that the value of trailing bits is irrelevant.


2.2.2. Modifications to "Carrying Label Information in BGP-4"

   "Carrying Label Information in BGP-4" [6] is modified as follows.

   Section 4 ("Advertising Multiple Routes to a Destination") is
   deleted, as the procedures of this specification allow multiple
   routes to be advertised, so no other procedures are required. For
   the same reason, the final paragraph of Section 5 (which specifies
   capability code 4) is deleted.

   Section 3 is replaced by the following:

   Label mapping information is carried as part of the Network Layer
   Reachability Information (NLRI) in the Multiprotocol Extensions
   attributes. The AFI indicates, as usual, the address family of the
   associated route. The fact that the NLRI contains a label is
   indicated by using SAFI value 4.

   The Network Layer Reachability information is encoded as one or more
   5-tuples of the form <bestpath, identifier, length, label, prefix>,
   whose fields are described below:

      +---------------------------+
      |   Bestpath (1 bit)        |
      +---------------------------+
      |   Identifier (15 bits)    |
      +---------------------------+
      |   Length (1 octet)        |
      +---------------------------+
      |   Label (3 octets)        |
      +---------------------------+

      +---------------------------+
      |   Prefix (variable)       |
      +---------------------------+

   The use and the meaning of these fields are as follows:

   a) Bestpath:

      If set to one, the bestpath bit indicates that the path associated
      with the NLRI has been selected by the BGP speaker for
      installation into its FIB. If set to zero, the path has not been
      selected.

      The bestpath bit MUST NOT be used for identifying the path. In
      other words, it does not form part of the key used to identify the
      path.

      If a route which was advertised with the bestpath bit set to one
      is removed from the advertiser's FIB, the route MUST be
      re-advertised with the bestpath bit set to zero, or withdrawn.
      Likewise, if a route which was advertised with the bestpath bit
      set to zero is selected for installation in the advertiser's FIB,
      the route MUST be re-advertised with the bestpath bit set to one,
      or withdrawn.

   b) Identifier:

      The Identifier field allows the address prefix and its associated
      path attributes ("path") to be distinguished from other paths for
      the same prefix. The selection of identifier values is a local
      implementation decision.

   c) Length:

      The Length field indicates the length in bits of the address
      prefix plus the label(s).

   d) Label:

      The Label field carries one or more labels (corresponding to the
      stack of labels [7]). Each label is encoded as 3 octets, where the
      high-order 20 bits contain the label value, and the low-order bit
      contains "Bottom of Stack" (as defined in [7]).

   e) Prefix:

      The Prefix field contains address prefixes followed by enough
      trailing bits to make the end of the field fall on an octet
      boundary. Note that the value of trailing bits is irrelevant.

   The label(s) specified for a particular route (and associated with
   its address prefix) must be assigned by the LSR which is identified
   by the value of the Next Hop attribute of the route.

   When a BGP speaker redistributes a route, the label(s) assigned to
   that route must not be changed (except by omission), unless the
   speaker changes the value of the Next Hop attribute of the route.

   A BGP speaker can withdraw a previously advertised route (as well as
   the binding between this route and a label) by either (a) advertising
   a new route (and, optionally, a label) with the same NLRI as the
   previously advertised route (keeping in mind that the identifier
   comprises part of the NLRI for this purpose), or (b) listing the NLRI
   (again keeping in mind the inclusion of the identifier as part of the
   NLRI for this purpose) of the previously advertised route in the
   Withdrawn Routes field of an Update message.
   In the latter case, no label information need be included.


2.3. Operation

   Using the identifier specified in the previous subsection, the same
   prefix can be advertised multiple times without subsequent
   advertisements replacing previous ones. Apart from the fact that
   this is possible, the route advertisement rules of [1,2] are not
   changed. In particular, a new advertisement of a given NLRI
   (remembering that the identifier is part of the NLRI's definition)
   replaces a previous advertisement of the given NLRI.

   This specification requires the use of multiprotocol encodings [5].
   When two BGP speakers have advertised the ADD_PATH and multiprotocol
   capabilities to each other, IPv4 unicast NLRI MUST be sent using the
   MP encoding of [5]. IPv4 unicast NLRI MUST NOT be sent using the
   encoding of [1,2].

   Similarly, when two BGP speakers have advertised the ADD_PATH,
   multiprotocol and MP_CAP [6] capabilities to each other, the encoding
   of [6] MUST NOT be used; the encoding of this specification MUST be
   used instead.


3. Deployment Considerations

   The intent of this extension is to be used in a controlled fashion
   for applications that require only partial propagation of the routing
   information, or specific individual recipients.

   Care should be taken when deploying this enhancement. If deployed
   improperly, the presence of extra paths in some parts of the AS and
   not in others can cause inconsistent routing. One scenario of
   particular concern involves the IGP metric to the address depicted by
   the NEXT_HOP, and the MED attribute.

   If this extension is used to advertise alternate paths, the best path
   [1,2] SHOULD also be advertised. As long as the best path is still
   selected as best, the presence of additional paths in some parts of
   the AS and not others will not cause inconsistent routing.
   However, if the IGP metric to the address depicted by the NEXT_HOP
   should change such that a non-best path is now preferred over the
   best path, then every router in the path to the address depicted by
   the NEXT_HOP should have the additional paths.

   Because the MED is only compared between routes from the same AS
   [1,2], it is possible that an additional path could be selected as
   the best path. This may cause inconsistent routing if all routers in
   the forwarding path of the affected routers do not have the
   additional paths.

   In a simple topology, it may be possible to anticipate these
   scenarios and avoid inconsistent routing while still enabling
   appropriate applications. Documents proposing applications of this
   extension SHOULD specify restrictions for propagating additional
   paths and should supply specific deployment guidelines.


4. Security Considerations

   This document introduces no new security concerns to BGP or other
   specifications referenced in this document.


5. Acknowledgments

   We would like to thank Dave Meyer, Srihari Ramachandra, Eric Rosen,
   Dan Tappan, Robert Raszuk and Mark Turner for their comments and
   suggestions.


6. References

   [1] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4),"
       RFC 1771, March 1995.

   [2] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4),"
       Work in Progress (draft-ietf-idr-bgp4-17.txt), January 2002.

   [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels," RFC 2119, March 1997.

   [4] Chandra, R. and J. Scudder, "Capabilities Advertisement with
       BGP-4," RFC 2842, May 2000.

   [5] Bates, T., R. Chandra, D. Katz and Y. Rekhter, "Multiprotocol
       Extensions for BGP-4," RFC 2858, June 2000.

   [6] Rekhter, Y. and E. Rosen, "Carrying Label Information in BGP-4,"
       RFC 3107, May 2001.

   [7] Rosen, E., D. Tappan, G. Fedorkow, Y. Rekhter, D. Farinacci,
       T. Li and A. Conta, "MPLS Label Stack Encoding," RFC 3032,
       January 2001.


7. Authors' Addresses

   Daniel Walton
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: dwalton@cisco.com

   Alvaro Retana
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: aretana@cisco.com

   David Cook
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: dacook@cisco.com

   John G. Scudder
   Cisco Systems, Inc.
   100 S. Main Suite 200
   Ann Arbor, MI 48104
   Email: jgs@cisco.com
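
   As a non-normative illustration of the unlabeled NLRI encoding
   described in section 2.2.1, the following Python sketch packs and
   unpacks the bestpath, identifier, length and prefix fields. It
   rests on assumptions that this document does not state: the bestpath
   bit is taken to be the most significant bit of the first octet, the
   15-bit identifier is taken to be in network byte order, and the
   function names (encode_nlri, decode_nlri, path_key) are hypothetical.
   It may help to show how the identifier, but not the bestpath bit,
   takes part in the key.

```python
import struct
from typing import List, Tuple

# One NLRI entry: (bestpath, identifier, prefix length in bits, prefix octets)
Nlri = Tuple[bool, int, int, bytes]


def encode_nlri(bestpath: bool, identifier: int,
                prefix_len: int, prefix: bytes) -> bytes:
    """Pack one entry as Bestpath(1 bit) | Identifier(15 bits) | Length | Prefix."""
    if not 0 <= identifier < (1 << 15):
        raise ValueError("identifier must fit in 15 bits")
    if not 0 <= prefix_len <= 255:
        raise ValueError("length must fit in one octet")
    n_octets = (prefix_len + 7) // 8  # prefix is padded to an octet boundary
    if len(prefix) < n_octets:
        raise ValueError("prefix shorter than its declared length")
    # Assumption: bestpath occupies the high-order bit of the first octet.
    head = (int(bestpath) << 15) | identifier
    return struct.pack("!HB", head, prefix_len) + prefix[:n_octets]


def decode_nlri(data: bytes) -> List[Nlri]:
    """Unpack a run of concatenated entries produced by encode_nlri."""
    entries: List[Nlri] = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 3:
            raise ValueError("truncated NLRI header")
        head, prefix_len = struct.unpack_from("!HB", data, offset)
        offset += 3
        n_octets = (prefix_len + 7) // 8
        if offset + n_octets > len(data):
            raise ValueError("truncated NLRI prefix")
        prefix = data[offset:offset + n_octets]
        offset += n_octets
        entries.append((bool(head >> 15), head & 0x7FFF, prefix_len, prefix))
    return entries


def path_key(entry: Nlri) -> Tuple[int, int, bytes]:
    """Key identifying a path: identifier plus prefix, never the bestpath bit."""
    _bestpath, identifier, prefix_len, prefix = entry
    masked = bytearray(prefix)
    spare_bits = len(masked) * 8 - prefix_len
    if masked and spare_bits:
        # The value of trailing bits is irrelevant, so clear them for comparison.
        masked[-1] &= (0xFF << spare_bits) & 0xFF
    return (identifier, prefix_len, bytes(masked))
```

   Under these assumptions, two advertisements of the same prefix with
   identifiers 1 and 2 yield different keys and so coexist, while
   re-advertising identifier 1 with a different bestpath bit yields the
   same key and therefore replaces the earlier advertisement.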