Network Working Group                                      Robert Raszuk
Internet Draft                                               Keyur Patel
Expiration Date: April 2006                              Chandra Appanna
                                                              David Ward
                                                      Cisco Systems, Inc

                                                            October 2005


                         BGP Aggregate Withdraw

                   draft-raszuk-aggr-withdraw-00.txt


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   Copyright (C) The Internet Society (2005).

Abstract

   This document proposes a scheme that allows a BGP speaker to
   withdraw multiple NLRIs that share a set of properties more
   efficiently, by specifying only the properties they share.

1. Introduction

   This document proposes a scheme that allows a BGP speaker to
   withdraw multiple NLRIs that share a set of properties more
   efficiently, by specifying only the properties they share.

   One area where this kind of feature is particularly important is
   BGP/MPLS VPNs [RFC2547]. The growth and success of RFC 2547 VPN
   deployments force operators and vendors to seek much more efficient
   and scalable mechanisms for VPN prefix management.

   This draft introduces a new BGP attribute, MP_AGGREGATE_WITHDRAW,
   which allows BGP to withdraw multiple NLRIs in a single message,
   thereby significantly reducing the load on routers, the number of
   BGP UPDATE messages, and convergence time.

   MP_AGGREGATE_WITHDRAW can also be used to implement Graceful
   Shutdown functionality, which allows traffic to be rerouted before
   the BGP session goes down.

   This mechanism is applicable to any BGP AFI/SAFI.

2. Specification of Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3. MP_AGGREGATE_WITHDRAW Attribute (Type Code TBD by IANA)

   This is an optional non-transitive attribute that can be used to
   aggregate multiple unfeasible NLRIs that are to be removed from
   service.

   The attribute is encoded as shown below:

   +---------------------------------------------------------+
   | Address Family Identifier (2 octets)                    |
   +---------------------------------------------------------+
   | Subsequent Address Family Identifier (1 octet)          |
   +---------------------------------------------------------+
   | Flags (2 octets)                                        |
   +---------------------------------------------------------+
   | Total Attribute Length (2 octets)                       |
   +---------------------------------------------------------+
   | Attributes (variable length)                            |
   +---------------------------------------------------------+
   | TLVs (optional & variable length)                       |
   +---------------------------------------------------------+

   The use and the meaning of these fields are as follows:

   Address Family Identifier:

      This field carries the identity of the Network Layer protocol
      associated with the NLRI that follows. Presently defined values
      for this field are specified in RFC 1700 (see the Address Family
      Numbers section).

   Subsequent Address Family Identifier:

      This field provides additional information about the type of the
      Network Layer Reachability Information carried in the attribute.

   Flags:

      This 2-octet unsigned integer carries the flags for the
      MP_AGGREGATE_WITHDRAW. The flags are defined as:

         0x01   Withdraw paths that match all attributes
         0x02   Withdraw paths that match any one or more attributes
         0x04   Set to one only when TLVs are present

   Total Attribute Length:

      This 2-octet unsigned integer indicates the total length of the
      Path Attributes field in octets. Its value allows the length of
      the Network Layer Reachability field to be determined as
      specified below. A value of 0 indicates that neither the Network
      Layer Reachability Information field nor the Path Attribute field
      is present in this UPDATE message.

   Attributes:

      For the format description, refer to [BGP4].

   TLVs:

      Where information other than that carried in BGP attributes is
      needed to uniquely identify the NLRIs to be withdrawn, a TLV
      field is used. The following TLV format is defined:

         Type       One octet field set to the value of the given TLV.

         Length     One octet field that indicates the length of the
                    value portion in octets.

         Reserved   One octet field reserved for future flags.

         Value      The value carried in the given TLV.

   An UPDATE message that contains the MP_AGGREGATE_WITHDRAW is not
   required to carry any other path attributes.

   At most one TLV should be present per MP_AGGREGATE_WITHDRAW
   attribute. If the TLV is present alone (no attributes), the match
   should be made on the TLV value alone.

4. TLV definitions

4.1 Route Distinguisher

   In RFC 2547 VPNs [RFC2547], the MP_AGGREGATE_WITHDRAW needs to
   uniquely identify the VPN routes to which the attached attributes
   belong. This is accomplished by distributing the route distinguisher
   in the following TLV encoding:

      Type:       One octet field set to a value of 1
      Length:     One octet field set to a value of 8
      Reserved:   One octet field reserved (all zeros)
      Value:      Eight octet RD value

4.2 TIME_TO_WITHDRAW

   This type represents a TIME_TO_WITHDRAW. Its value field is 2 octets
   long and gives the time, in seconds, after which forwarding support
   will be withdrawn for all reachability associated with the
   MP_AGGREGATE_WITHDRAW.

      Type:       One octet field set to a value of 1
      Length:     One octet field set to a value of 2
      Reserved:   One octet field reserved (all zeros)
      Value:      2 octet value representing the number of seconds

5. MP_AGGREGATE_WITHDRAW Capability

   The MP_AGGREGATE_WITHDRAW Capability is a new BGP capability
   [BGP-CAP] that can be used by a BGP speaker to indicate its ability
   to receive and send aggregated withdraws.

   This capability is defined as follows:

      Capability code: TBD by IANA

      Capability length: variable

      Capability value: Consists of one or more of the tuples as
      follows:

         +--------------------------------------------------+
         | Address Family Identifier (16 bits)              |
         +--------------------------------------------------+
         | Subsequent Address Family Identifier (8 bits)    |
         +--------------------------------------------------+
         | ...                                              |
         +--------------------------------------------------+
         | Address Family Identifier (16 bits)              |
         +--------------------------------------------------+
         | Subsequent Address Family Identifier (8 bits)    |
         +--------------------------------------------------+

      Address Family Identifier (AFI):

         This field carries the identity of the Network Layer protocol
         for which Aggregate Withdraw support is advertised. Presently
         defined values for this field are specified in [RFC1700].

      Subsequent Address Family Identifier (SAFI):

         This field provides additional information about the type of
         the Network Layer Reachability Information carried in the
         attribute. Presently defined values for this field are
         specified in [RFC1700].

6. Aggregate Withdraw Extended Community Attribute

   The Aggregate Withdraw Extended Community is a mandatory
   non-transitive extended community that can be used to uniformly
   mark closed groups of NLRIs that share a common fate. It is
   mandatory in the sense that an implementation which supports
   MP_AGGREGATE_WITHDRAW must also support the Aggregate Withdraw
   Extended Community.

   The Aggregate Withdraw Extended Community is carried in the BGP
   Extended Community Attribute of type code 16.

   The Aggregate Withdraw Extended Community attribute is encoded as
   follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Type high    |  Type low     |          Value                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Value (cont.)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The value of the high-order octet of the type field for the
   Aggregate Withdraw Extended Community (the "marker" community) can
   be 0x43. This value indicates a non-transitive, opaque extended
   community whose type is assigned by IANA on a first come first
   served basis.

   The value of the low-order octet of the type field for this
   community is .... (TBD).

   The value is a locally significant 6-octet value assigned by the
   BGP speaker to differentiate routes according to operator-dependent
   requirements. Its allocation can be fully algorithmic and automatic,
   or it can be given some meaningful structure. Because it is only
   locally significant, it can be overwritten by any BGP speaker.

7. Operation

   If a BGP speaker that does not support the MP_AGGREGATE_WITHDRAW
   capability receives an UPDATE message with MP_AGGREGATE_WITHDRAW, it
   simply ignores the message and logs a warning.

   A BGP speaker that implements the MP_AGGREGATE_WITHDRAW capability
   and receives an UPDATE message with MP_AGGREGATE_WITHDRAW should
   remove all the NLRIs (paths) that match the attribute and TLV list
   specified in the MP_AGGREGATE_WITHDRAW attribute for each AFI/SAFI.

   The matching of the attributes is further qualified by the operation
   type specified in the flags field associated with the AFI/SAFI, and
   can be a logical AND or OR.

   The presence of the additional TLV value is indicated by the flags
   field. If present, the TLV value is always combined with all other
   attributes using a logical AND.

   If the TIME_TO_WITHDRAW is sent in the MP_AGGREGATE_WITHDRAW, it
   must be interpreted by the receiving BGP speaker as the minimum
   duration for which the sending BGP speaker will preserve forwarding
   of reachability already announced prior to receiving this
   MP_AGGREGATE_WITHDRAW. The purpose of TIME_TO_WITHDRAW is to allow
   the implementation of Graceful Shutdown functionality, whereby the
   receiving BGP speaker is given some time to reconverge before the
   sending BGP speaker is no longer available for forwarding traffic.
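
   The matching rules described above can be sketched as follows. This
   is a minimal, non-normative illustration in Python. The
   AggregateWithdraw record, the dictionary representation of a path,
   the attribute names, and the function names are assumptions made
   for the example and are not defined by this document. The behavior
   when neither match flag is set, or when neither attributes nor a
   TLV are present, is not specified here; the sketch assumes that
   nothing is withdrawn in those cases. Only the Route Distinguisher
   TLV is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Flag values as listed in Section 3.
FLAG_MATCH_ALL = 0x01    # withdraw paths that match all attributes
FLAG_MATCH_ANY = 0x02    # withdraw paths that match any one or more attributes
FLAG_TLV_PRESENT = 0x04  # set only when a TLV is present


@dataclass
class AggregateWithdraw:
    """Hypothetical decoded form of one MP_AGGREGATE_WITHDRAW attribute."""
    afi: int
    safi: int
    flags: int
    attributes: Dict[str, object] = field(default_factory=dict)
    rd: Optional[bytes] = None  # Route Distinguisher TLV value, if any


def path_matches(path: Dict[str, object], wd: AggregateWithdraw) -> bool:
    """Return True if a stored path is selected by the withdraw."""
    if (path.get("afi"), path.get("safi")) != (wd.afi, wd.safi):
        return False

    has_tlv = bool(wd.flags & FLAG_TLV_PRESENT)
    hits = [path.get(name) == value for name, value in wd.attributes.items()]

    if not hits:
        # TLV present alone: the match is made on the TLV value only.
        # Assumption: with neither attributes nor a TLV, nothing matches.
        attrs_ok = has_tlv
    elif wd.flags & FLAG_MATCH_ALL:
        attrs_ok = all(hits)   # logical AND across listed attributes
    elif wd.flags & FLAG_MATCH_ANY:
        attrs_ok = any(hits)   # logical OR across listed attributes
    else:
        attrs_ok = False       # assumption: no match mode set, match nothing

    # The TLV value is always ANDed with the attribute result.
    tlv_ok = (path.get("rd") == wd.rd) if has_tlv else True
    return attrs_ok and tlv_ok


def apply_withdraw(rib: List[Dict[str, object]],
                   wd: AggregateWithdraw) -> List[Dict[str, object]]:
    """Return the paths that remain after the aggregate withdraw."""
    return [p for p in rib if not path_matches(p, wd)]


# Example data: three VPN-IPv4 paths (AFI 1, SAFI 128) under two RDs.
RD1 = bytes.fromhex("0000fde800000001")
RD2 = bytes.fromhex("0000fde800000002")
rib = [
    {"afi": 1, "safi": 128, "prefix": "10.0.0.0/24",
     "next_hop": "192.0.2.1", "rd": RD1},
    {"afi": 1, "safi": 128, "prefix": "10.0.1.0/24",
     "next_hop": "192.0.2.2", "rd": RD1},
    {"afi": 1, "safi": 128, "prefix": "10.0.2.0/24",
     "next_hop": "192.0.2.1", "rd": RD2},
]

# Withdraw every path whose next hop is 192.0.2.1 (compare Section 8.4).
by_next_hop = AggregateWithdraw(afi=1, safi=128, flags=FLAG_MATCH_ALL,
                                attributes={"next_hop": "192.0.2.1"})
print([p["prefix"] for p in apply_withdraw(rib, by_next_hop)])
# prints ['10.0.1.0/24']
```

   Keeping the TLV check separate from the attribute check mirrors the
   rule that the TLV is always a logical AND, regardless of whether the
   attributes themselves are combined with AND or OR.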

   If the MP_AGGREGATE_WITHDRAW attribute carries an AFI/SAFI that is
   not supported as per the initial capability negotiation, a BGP
   NOTIFICATION message with the notification code set to
   UNSUPPORTED_AFI_SAFI should be sent and the session should be
   terminated.

8. Deployment Considerations

8.1 Sessions to all CEs in a VRF go down or are being shut down

   Today: All VRF routes are sent within MP_UNREACH.

   New:   A single message is sent, carrying the RD and listing all
          export RTs that were configured under the given VRF.

8.2 Sessions to one CE in a VRF go down or are being shut down

   Today: All routes from the given CE are sent within MP_UNREACH.

   New:   A single message is sent, carrying the marker extended
          community and optionally an RD under the given VRF.

8.3 A subset of routes or all routes of a given AFI/SAFI marked with a
    unique community or an attribute

   Today: All routes have to be sent in an MP_UNREACH attribute.

   New:   Just one message with MP_AGGREGATE_WITHDRAW listing this
          unique attribute is sufficient.

8.4 A BGP next hop on NLRIs with a single path goes down

   Today: All routes have to be sent in an MP_UNREACH attribute.

   New:   Just one message with MP_AGGREGATE_WITHDRAW listing this next
          hop is sufficient.

9. Security Considerations

   This extension to BGP does not change the underlying security issues
   inherent in the existing BGP [RFC2385].

10. Acknowledgments

   The authors would like to thank Dan Tappan and Shyam Suri for their
   suggestions and feedback.

11. IANA Considerations

   This document defines a new BGP attribute, MP_AGGREGATE_WITHDRAW.
   The new attribute code should be introduced using the Standards
   Action process defined in [RFC2434].

12. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", RFC 2119, March 1997.

   [RFC2434] Narten, T., and H. Alvestrand, "Guidelines for Writing an
             IANA Considerations Section in RFCs", BCP 26, RFC 2434,
             October 1998.

13. Informative References

   [RFC1771] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4
             (BGP-4)", RFC 1771, March 1995.

   [IDR-BGP4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4
             (BGP-4)", Work in Progress (draft-ietf-idr-bgp4-21.txt),
             April 2003.

   [RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
             "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

   [RFC2547] Rosen, E., and Y. Rekhter, "BGP/MPLS VPNs", RFC 2547,
             March 1999.

   [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
             Signature Option", RFC 2385, August 1998.

   [RFC1700] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2,
             RFC 1700, October 1994. See also:
             http://www.iana.org/numbers.html.

14. Authors' Addresses

   Robert Raszuk
   Cisco Systems, Inc.
   170 West Tasman Dr
   San Jose, CA 95134
   raszuk@cisco.com

   Keyur Patel
   Cisco Systems, Inc.
   170 West Tasman Dr
   San Jose, CA 95134
   keyupate@cisco.com

   Chandra Appanna
   Cisco Systems, Inc.
   170 West Tasman Dr
   San Jose, CA 95134
   achandra@cisco.com

   David Ward
   Cisco Systems, Inc.
   170 West Tasman Dr
   San Jose, CA 95134
   wardd@cisco.com

15. IPR Notices

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

16. Terms of Use

   Cisco has a pending patent which relates to the subject matter of
   this Internet Draft. If a standard relating to this subject matter
   is adopted by IETF and any claims of any issued Cisco patents are
   necessary for practicing this standard, any party will be able to
   obtain a license from Cisco to use any such patent claims under
   openly specified, reasonable, non-discriminatory terms to implement
   and fully comply with the standard.

17. Full Copyright Notice

   Copyright (C) The Internet Society (2005). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.