Network Working Group                          (Editor) Srikanth Chavali
INTERNET DRAFT                                            Vasile Radoaca
Expiration Date: October 2004                      Nortel Networks, Inc.
                                                                 Mo Miri
                                                               BellSouth
                                                             Luyuan Fang
                                                                    AT&T
                                                    (Editor) Susan Hares
                                                    NextHop Technologies
                                                              April 2004


                   Peer Prefix Limits Exchange in BGP

                  draft-chavali-bgp-prefixlimit-02.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document proposes a mechanism to allow BGP peers to coordinate
   the setting of a limit on the number of prefixes which one BGP
   speaker will send to its peer.  Coordination can prevent disruption
   of the peering session or discarding of routes, which can occur when
   a maximum prefix limit is configured on the "receiving" peer, and
   the "sending" peer exceeds the limit.

Srikanth Chavali et al.      Expires October 2004               [Page 1]

Internet Draft   draft-chavali-bgp-prefixlimits-02.txt    September 2003

1. Terms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   In this document we use the term "BGP sender" to refer to a BGP
   speaker which is advertising prefixes to its peer.  We use the term
   "BGP receiver" to refer to a BGP speaker which is receiving prefixes
   from its peer.
   Although it is clear that in reality each peer is usually both a
   "BGP sender" and a "BGP receiver", we emphasize a unidirectional
   relationship in this document for clarity.

2. Introduction

   There are many scenarios where BGP [BGP-4] peering may be
   established between two speakers in which there is an expectation
   that some limited number of prefixes will be announced by a given
   speaker.  Several implementations of BGP offer a configuration
   option that allows a BGP receiver to provision a limit on the number
   of prefixes it will accept from a specific peer.  When the limit is
   exceeded, there are generally two options: the prefixes exceeding
   the limit can be dropped by the BGP receiver, or the peering session
   may be terminated by the BGP receiver and restarted at a later time.

   Neither of these options is desirable.  Dropping prefixes leads to
   network unreliability.  Terminating the BGP session is probably
   worse, since all traffic between the peers will typically be
   disrupted, even for those prefixes which were advertised before the
   limit was reached.

   These effects may be due to network changes, misconfigurations,
   miscommunications, or other factors that cause the number of
   prefixes advertised from a BGP sender to the receiver to exceed the
   expected number, so that the configurations must be revised.  Some
   of the effects are described in detail in [BGP-STUDY].

   The basic functionality proposed here is for the BGP speakers to
   exchange "warning", "stop receiving" and "disconnect" limits.  Of
   these, the "stop receiving" limit is REQUIRED for this
   functionality, while the rest are OPTIONAL.  These limits are
   exchanged as capabilities in the OPEN message when the session is
   established, and via the dynamic capability exchange during the
   lifetime of the BGP connection.

3. Definition of Prefix based limits

   Prefix limits are encoded as an optional capability parameter
   [BGP-CAP] in the BGP OPEN message [BGP-4] by each BGP speaker as
   shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   type-code   |    length     |              AFI              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     SAFI      |                 Must be Zero                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TLVs                              |
   .                                                               .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Mandatory TLV (Type-Length-Value):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  sub code 2   |    length     |limit indicator| Must be zero  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             maximum prefix limit (Stop Receiving)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Optional TLVs:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  sub code 1   |    length     |limit indicator| Must be zero  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     warning prefix limit                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  sub code 3   |    length     |limit indicator| Must be zero  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      reset prefix limit                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The meaning of each of the capability fields shown above is as
   follows:

   Type-Code (1 octet):

      Code identifying this capability (TBD).

   Length (1 octet):

      Length of the capability value fields.

   Address Family Identifier AFI (2 octets):

      This, along with the Subsequent Address Family Identifier field,
      identifies the Network Layer Protocol associated with the Network
      Address.

   Subsequent Address Family Identifier SAFI (1 octet):

      This, along with the Address Family Identifier field, identifies
      the Network Layer Protocol associated with the Network Address.

   sub code 1 (1 octet):

      This OPTIONAL sub code identifies the number of routes sent
      before a warning is raised.  The warning is raised by the BGP
      speaker that detects the condition.

   Length (1 octet):

      Length of the sub code.  It has the same semantics in all the
      sub codes.

   Limit Indicator (1 octet):

      This octet can be assigned a value of 0 or 1.  A value of 0 means
      that the sender SHOULD NOT raise any warning.  A value of 1 means
      that the warning indication is necessary and SHOULD be used by
      the sender when the number of routes it has advertised reaches
      the limit carried in the sub code.  It has the same meaning in
      all the sub codes.  However, in sub code 2 it can take a value of
      1 only.  The warning mechanisms are described in the operation
      section of this draft.

   warning prefix limit (4 octets):

      Number of routes sent by the BGP sender.  The value of this field
      depends on the maximum prefix limit and SHOULD always be less
      than it.

   sub code 2 (1 octet):

      This mandatory sub code identifies the number of routes sent
      before the sending BGP speaker needs to stop advertising routes
      to its receiving BGP speaker.

   maximum prefix limit (4 octets):

      Number of routes sent by the sending BGP speaker.  When the
      advertising BGP speaker hits this limit, it stops the route
      advertisement.
   sub code 3 (1 octet):

      This OPTIONAL sub code identifies the number of routes received
      after which the BGP speaker will reset the peering session.  Note
      that this situation is never encountered when both peers adhere
      to this draft.  In other words, it happens only during error
      conditions.  The error conditions are beyond the scope of this
      document.

   reset prefix limit (4 octets):

      Number of routes sent by the sending BGP speaker.  The value of
      this field depends on the maximum prefix limit and SHOULD always
      be greater than it.

   For ease of illustration, this document refers to the warning prefix
   limit, the maximum prefix limit and the reset prefix limit
   collectively as prefix limits.

4. Operation

4.1 Exchanging the configured prefix limits

   BGP speakers exchange the prefix limits as an optional capability
   parameter [BGP-CAP] as described in section 3.

      +--------+                     +--------+
      |   A    | <-----------------> |   B    |
      +--------+                     +--------+

                       Figure 1

   In figure 1 both BGP speakers A and B exchange the prefix limits
   (defined in section 3) to indicate support for this capability.  A
   and B each set these limits, along with the actions associated with
   each of them, in the capability before exchanging it.  The warning
   prefix limit and reset prefix limit values are determined from the
   configured maximum prefix limit.  They are typically a percentage of
   the maximum prefix limit.  The exact percentage values are beyond
   the scope of this document.  The maximum prefix limit configured on
   A for the peer B is the maximum number of prefixes that A expects to
   receive from B.  A informs B of this limit in the new capability
   described in section 3.  The same interpretation applies to B too.

4.2 Route processing after prefix limits exchange

   In figure 1 both A and B maintain a count of the routes that they
   receive from each other.
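   As an illustration of the capability layout in section 3 and of the
   percentage-derived limits mentioned in section 4.1, the following
   Python sketch builds the capability for one AFI/SAFI pair.  Several
   details are assumptions rather than part of this draft: the
   capability code 0xFF is a placeholder (the type-code is TBD), the
   80% and 120% ratios are arbitrary examples, the TLV length is taken
   to count the octets that follow the length field, and the names
   PrefixLimits, encode_tlv and encode_capability are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder: the draft leaves the capability type-code TBD.
CAP_CODE_PREFIX_LIMIT = 0xFF
SUBCODE_WARNING, SUBCODE_MAXIMUM, SUBCODE_RESET = 1, 2, 3
# Assumption: the TLV length counts the octets after the length field
# (limit indicator + must-be-zero octet + 4-octet limit).
TLV_VALUE_LENGTH = 6


@dataclass
class PrefixLimits:
    maximum: int                   # mandatory "stop receiving" limit
    warning: Optional[int] = None  # optional, SHOULD be below maximum
    reset: Optional[int] = None    # optional, SHOULD be above maximum

    @classmethod
    def from_percentages(cls, maximum, warning_pct=80, reset_pct=120):
        """Derive the optional limits as percentages of the maximum."""
        return cls(maximum,
                   maximum * warning_pct // 100,
                   maximum * reset_pct // 100)


def encode_tlv(subcode: int, indicator: int, limit: int) -> bytes:
    # sub code, length, limit indicator, must-be-zero, 32-bit limit
    return struct.pack("!BBBBI", subcode, TLV_VALUE_LENGTH,
                       indicator, 0, limit)


def encode_capability(afi: int, safi: int, limits: PrefixLimits,
                      warning_indicator: int = 1,
                      reset_indicator: int = 1) -> bytes:
    # This sketch treats the SHOULD ordering of the limits as a hard check.
    if limits.warning is not None and limits.warning >= limits.maximum:
        raise ValueError("warning limit should be below the maximum")
    if limits.reset is not None and limits.reset <= limits.maximum:
        raise ValueError("reset limit should be above the maximum")
    # AFI (2 octets), SAFI (1 octet), 3 must-be-zero octets
    value = struct.pack("!HB3x", afi, safi)
    # The limit indicator of sub code 2 can only take the value 1.
    value += encode_tlv(SUBCODE_MAXIMUM, 1, limits.maximum)
    if limits.warning is not None:
        value += encode_tlv(SUBCODE_WARNING, warning_indicator,
                            limits.warning)
    if limits.reset is not None:
        value += encode_tlv(SUBCODE_RESET, reset_indicator, limits.reset)
    return struct.pack("!BB", CAP_CODE_PREFIX_LIMIT, len(value)) + value


limits = PrefixLimits.from_percentages(1000)
capability = encode_capability(afi=1, safi=1, limits=limits)
print(len(capability), capability.hex())
```

   The mandatory TLV is emitted first and the optional TLVs follow,
   mirroring the order in which section 3 presents them; the draft
   itself does not state an ordering requirement.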
   Route processing is illustrated using the case where B sends route
   advertisements to A.  The same operational procedures apply to the
   other case of A sending route advertisements to B.  B, as shown in
   figure 1, applies the outbound route policies to the
   Adjacent-Rib-Out and then checks the prefix limits before
   advertising routes.

4.2.1 Processing When Warning Limit Encountered

      +--------+                     +--------+
      |   A    | <-----------------> |   B    |
      +--------+                     +--------+
                                     B detects warning prefix limit
                       <------       generates dynamic capability
                                     message to A

                       Figure 2

   As figure 2 shows, in the course of advertising routes to A, B
   generates a dynamic capability message [BGP-DYN-CAP] destined to A
   when the warning prefix limit is hit (if the warning limit indicator
   is turned on).  This message carries the capability received from A
   and serves as a warning indicator.  Either A or B or both of them
   could generate this message, depending on the timing of warning
   limit detection.  B and A MAY choose to raise an internal warning
   when this condition is detected.  Following the warnings, both A and
   B continue advertising routes normally to each other.

4.2.2 Processing When Stop Limit Encountered

      +--------+                     +--------+
      |   A    | <-----------------> |   B    |
      +--------+                     +--------+
                                     B detects maximum prefix limit
                       <------       generates dynamic capability
                                     message to A and stops route
                                     advertisement to A

                       Figure 3

   In figure 3, B detects during route advertisement that the maximum
   prefix limit for route advertisement is reached.  It SHOULD stop
   further route advertisements to A.  B will discard any route
   received with a new prefix once the stop limit has been hit.  B then
   SHOULD send a Dynamic Capability [BGP-DYN-CAP] to A indicating the
   current capability if the limit indicator is set.
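   The sender-side behaviour of sections 4.2.1 and 4.2.2 can be
   sketched in Python as below.  This is an illustration under stated
   assumptions, not a definitive implementation: the class SenderState
   and its methods are hypothetical, the limits are assumed to count
   distinct prefixes currently advertised, and sending a dynamic
   capability message is modelled by appending a string to a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SenderState:
    """What B tracks for the routes it advertises to A (hypothetical)."""
    maximum_limit: int                    # from A's sub code 2
    warning_limit: Optional[int] = None   # from A's sub code 1, if any
    warning_indicator: int = 1            # limit indicator of sub code 1
    advertised: Set[str] = field(default_factory=set)
    sent_messages: List[str] = field(default_factory=list)

    def advertise(self, prefix: str) -> bool:
        """Return True if the prefix is (or already was) advertised."""
        if prefix in self.advertised:
            return True   # assumption: an update does not change the count
        if len(self.advertised) >= self.maximum_limit:
            return False  # stop limit hit: the new prefix is not sent
        self.advertised.add(prefix)
        count = len(self.advertised)
        if (self.warning_limit is not None
                and self.warning_indicator == 1
                and count == self.warning_limit):
            self.sent_messages.append("dynamic capability: warning limit")
        if count == self.maximum_limit:
            self.sent_messages.append("dynamic capability: maximum limit")
        return True

    def withdraw(self, prefix: str) -> None:
        # A withdrawal lowers the count, so advertising can resume.
        self.advertised.discard(prefix)


b_to_a = SenderState(maximum_limit=3, warning_limit=2)
results = [b_to_a.advertise(p) for p in
           ("10.0.0.0/8", "10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16")]
print(results, b_to_a.sent_messages)
```

   Keeping the check on the sender means that a prefix over the limit
   is never put on the wire, which is the point of exchanging the
   limits in the first place.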
   As in the case of the warning prefix limit condition, either A or B
   or both could send the dynamic capability [BGP-DYN-CAP].  Any route
   withdrawal to A is automatically recorded and SHOULD implicitly
   restore the announce policy to the configured one (if any is
   configured).

4.3 Prefix limit changes

   If a need to change the prefix limits arises, each BGP speaker whose
   configuration for a peer changes SHOULD inform that peer of the
   change using the dynamic capability [BGP-DYN-CAP].  Such changes
   SHOULD be handled as described in the following sub-sections.

4.3.1 Processing when maximum prefix limit is increased

   When the prefix limits are increased in the configuration of A in
   figure 1, A SHOULD inform B about it as described in 4.3.  B SHOULD
   then restart the route advertisements, and it MAY choose to do so
   either incrementally from the Adjacent-Rib-Out for A or by making
   use of the Route Refresh mechanism [BGP-RREFRESH].  In doing so, a
   restart of the BGP peering, and the network traffic and service
   disruption associated with it, is avoided.  If the maximum prefix
   limit has not been reached when peer B receives the increased prefix
   limits, then peer B SHOULD note this and continue with its
   advertisements to A until these limits are reached.

4.3.2 Processing when the maximum prefix limit is decreased

   When the prefix limits are decreased in the configuration of A
   (refer to figure 1), B SHOULD be informed about it as described in
   4.3.  B then SHOULD note this information and SHOULD stop route
   advertisement immediately if the number of route advertisements
   exceeds this new maximum prefix limit for A.  By doing so, B can
   avoid processing routes which will be discarded by A when it detects
   the maximum prefix limit condition.  B at that point follows the
   process described in 4.2 for route processing.

5. Error Handling

   New error codes along with the sub-codes are defined (TBD).
   If a BGP peer does not support this capability and receives it, then
   the peer sends a NOTIFICATION with the appropriate error code and
   sub-code.  The BGP speaker then SHOULD re-initiate the peering
   session without the unsupported capability.

5.1 Open Message responded to with Notification

   OPEN messages can be rejected by BGP speakers for the listed
   unsupported capabilities.  The OPEN Message Error sub-code for an
   unsupported capability is sub-code 7 [BGP-CAP].  The maximum prefix
   TLV will be included in the list of capabilities.

5.2 Capability Message responded to with Notification

   For errors in Dynamic Capabilities, a NOTIFICATION message may be
   sent with the Capability message error code (7) [BGP-DYN-CAP] set.
   The current sub-codes for this error code are:

      Subcode    Symbolic Name

         1       Invalid Action Value
         2       Invalid Capability Length
         3       Malformed Capability Value
         4       Unsupported Capability Code

   Support for the Maximum Prefix value negotiations will require the
   addition of the following sub-code:

         5       Invalid Capability Value

   If the Maximum Prefix code is not supported, the NOTIFICATION
   message will be returned with an error code of 7 and a sub-code of 4
   (Unsupported Capability Code).

   If the Maximum Prefix Capability is supported, but the value is not
   acceptable to the receiving node, the NOTIFICATION can be sent with
   sub-code 5 (Invalid Capability Value) and the data field set to the
   Maximum Prefix TLVs that are not acceptable.

5.3 Cease message for peering reset

   When the reset prefix limit is exceeded, the peering session SHOULD
   be dropped, in which case the Cease error code is used in the
   NOTIFICATION message.  The proposed BGP draft [CEASECODE] assigns a
   subcode of 1 to the case where the maximum number of prefixes is
   exceeded.  The data field carries the maximum prefix upper bound.
   This field should be followed by an optional 1-octet field that
   allows a maximum prefix sub-code to be encoded.

6. Security Considerations

   This document does not change the underlying security issues in the
   BGP protocol.  It does, however, provide an additional mechanism to
   protect against denial-of-service attacks based on exceeding
   configured maximum prefix limits.

7. References

   [BGP-4]        Rekhter, Y., and T. Li, "A Border Gateway Protocol 4
                  (BGP-4)", draft-ietf-idr-bgp4-20.txt.  Work in
                  progress.

   [BGP-CAP]      Chandra, R., Scudder, J., "Capabilities Advertisement
                  with BGP-4", RFC 3392, November 2002.

   [BGP-RREFRESH] Chen, E., "Route Refresh Capability for BGP-4",
                  RFC 2918, September 2000.

   [BGP-DYN-CAP]  Chen, E., Sangli, S. R., "Dynamic Capability for
                  BGP-4", draft-ietf-idr-dynamic-cap-03.txt.  Work in
                  progress.

   [BGP-STUDY]    Chang, D., Govindan, R., Heidemann, J., "An Empirical
                  Study of Router Response to Large BGP Routing Table
                  Load", ACM SIGCOMM Internet Measurement Workshop,
                  pp. 203-208, Marseille, France, November 2002.

   [CEASECODE]    Chen, E., "Subcodes for BGP Cease Notification
                  Message", draft-ietf-idr-cease-subcode-05.txt.  Work
                  in progress.

8. IANA Considerations

   This document uses a new capability type for the support of prefix
   limits and the corresponding NOTIFICATION code along with the
   sub-codes for non-support.  These must be assigned by IANA.

9. Acknowledgements

   The authors would like to thank George Matey, Marten Terpstra, Yakov
   Rekhter, Enke Chen, Rob Thomas, Manish Gupta, Dan Joyal, Rajesh
   Saluja and Elwyn Davies for their review and comments.

10. Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

11. Authors' Addresses

   Srikanth Chavali
   Vasile Radoaca
   Paul Knight
   Nortel Networks
   600 Technology Park Drive
   Billerica, MA 01821
   USA
   Email: schavali@nortelnetworks.com
          vasile@nortelnetworks.com
          paul.knight@nortelnetworks.com

   Mo Miri
   BellSouth
   575 Morosgo Drive
   4A62
   Atlanta, GA 3032
   home: +1 404-499-5526
   Email: mohammad.miri@bellsouth.com

   Luyuan Fang
   ATT Labs
   200 Laurel Avenue, Room C2-3B35
   Middletown, NJ 07748
   Phone: +1 732 420 1921
   Email: luyuanfang@att.com

   Susan Hares
   NextHop Technologies
   825 Victors Way
   Suite 100
   Ann Arbor, MI 48108
   Phone: +1 734 222 1610
   Email: skh@nexthop.com