Network Working Group                                    John G. Scudder
Internet Draft                                           Chandra Appanna
Expiration Date: May 2004                                  Cisco Systems
File name: draft-scudder-bgp-multisession-00.txt           November 2003

                            Multisession BGP
                 draft-scudder-bgp-multisession-00.txt

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This specification augments "Multiprotocol Extensions for BGP-4" [MP-BGP] by proposing a mechanism to allow multiple sessions to be used between a given pair of BGP speakers. Each session is used to transport routes for one or more AFI/SAFI. This provides an alternative to the current [MP-BGP] approach of multiplexing routes for all AFI/SAFI onto a single connection. Use of this approach is expected to increase the robustness of the BGP protocol as it is used to support more and more diverse AFI/SAFI.

Expires May 2004                                                [Page 1]

INTERNET DRAFT              Multisession BGP               November 2003

1. Introduction

Most BGP [BGP, BGP-DRAFT] implementations only permit a single ESTABLISHED connection to exist with each peer. More precisely, they only permit a single ESTABLISHED connection for any given pair of IP endpoints.

Multiprotocol BGP [MP-BGP] extends BGP to allow information for multiple NLRI families and sub-families to be transported in BGP.
Routes for different families are distinguished by AFI and SAFI. Routes for different families are commonly multiplexed onto a single BGP session.

A common criticism of BGP is the fact that most malformed messages cause the session to be terminated. While this behavior is necessary for protocol correctness, one may observe that the protocol machinery of a given implementation may only be defective with respect to a given AFI/SAFI. Thus, it would be desirable to allow the session related to that family to be terminated while leaving other AFI/SAFI unaffected. As BGP is commonly deployed, this is not possible.

In this specification, we propose a mechanism by which multiple transport sessions may be established between a pair of peers. Each transport session can be used for one or more AFI/SAFI. Each session is distinct from a BGP protocol point of view; an error or other event on one session has no implications for any other session.

All protocol modifications proposed by this specification take place during the OPEN exchange phase of the session; there are no modifications to the operation of the protocol once a session reaches ESTABLISHED state.

Routers implementing this specification MUST also implement [MP-BGP].

2. Definitions

"MP-BGP capability" refers to the capability [BGP-CAP] with code 1, specified in [MP-BGP] section 10.

A BGP speaker is said to "support" some feature or functionality (for example, to support this specification, or to support a particular AFI/SAFI) when the BGP implementation supports the feature AND the feature has not been disabled by configuration.

A pair of AFI/SAFI groups is said to "conflict" when, considering the two groups as sets, their intersection is non-empty but neither group is a subset of the other.

3. Use of BGP Capability Advertisement

This specification defines the Multisession capability [BGP-CAP]:

   Capability code (1 octet): TBD
   Capability length (1 octet): 1
   Capability value (1 octet): Flags as below

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |G|   Reserved  |
      +-+-+-+-+-+-+-+-+

The most significant bit is defined as the Grouping Support (G) bit. It can be used to indicate support for the ability to group multiple AFI/SAFI into one session. When set (value 1) this bit indicates that the BGP speaker supports grouping.

The remaining bits are reserved, and should be set to zero by the sender and ignored by the receiver.

4. New NOTIFICATION Subcodes

[BGP, BGP-DRAFT] Section 4.5 provides a number of subcodes to the NOTIFICATION message, and Section 6.2 elaborates on the use of those subcodes. This specification introduces three new subcodes:

   OPEN Message Error subcodes:

      7 - No Supported AFI/SAFI
      8 - Grouping Conflict
      9 - Grouping Required

The No Supported AFI/SAFI code MAY be used when an OPEN message contains one or more MP-BGP capabilities, none of which list an AFI/SAFI supported by the local BGP speaker. It is observed that this subcode may be useful for MP-BGP speakers in general, even if they do not (otherwise) implement this specification.

The Grouping Conflict code MAY be used when an OPEN message contains several MP-BGP capabilities whose AFI/SAFI conflict with one or more AFI/SAFI groups configured on the local BGP speaker. The Data field SHOULD indicate one of the conflicting locally-configured AFI/SAFI groups, encoded as MP-BGP capabilities.

The Grouping Required code MAY be used when a BGP speaker which is configured to require grouping attempts to establish a connection with a BGP speaker which does not support grouping.
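As an informal illustration (not part of the specification), the Multisession capability layout from Section 3 and the "conflict" test from Section 2 that underlies the Grouping Conflict subcode might be sketched as follows. The function names and the (AFI, SAFI) tuple representation are assumptions made for this sketch. Because the capability code is still TBD, it is passed in as a parameter and not hard-coded.

```python
from typing import Iterable, Tuple

AfiSafi = Tuple[int, int]  # (AFI, SAFI); representation assumed for this sketch

G_BIT = 0x80  # Grouping Support bit: most significant bit of the flags octet


def encode_multisession_capability(code: int, grouping: bool) -> bytes:
    """Build the capability: code octet, length octet (always 1), flags octet.

    `code` is supplied by the caller because the draft leaves it TBD.
    Reserved bits are sent as zero.
    """
    flags = G_BIT if grouping else 0x00
    return bytes([code, 1, flags])


def grouping_supported(capability: bytes) -> bool:
    """Return True if the G bit is set; reserved bits are ignored on receipt."""
    if len(capability) != 3 or capability[1] != 1:
        raise ValueError("malformed Multisession capability")
    return bool(capability[2] & G_BIT)


def groups_conflict(a: Iterable[AfiSafi], b: Iterable[AfiSafi]) -> bool:
    """Section 2 definition: the groups intersect, but neither contains the other."""
    set_a, set_b = frozenset(a), frozenset(b)
    return bool(set_a & set_b) and not set_a <= set_b and not set_b <= set_a


# Example: {IPv4 unicast, IPv4 multicast} against {IPv4 multicast, IPv6 unicast}.
# The groups share IPv4 multicast, yet neither is a subset of the other.
local_group = {(1, 1), (1, 2)}
peer_group = {(1, 2), (2, 1)}
print(groups_conflict(local_group, peer_group))
```

Note that under this definition a proposed group that is a strict subset of a configured group does not conflict with it, and neither do two disjoint groups.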
(While it is true that it might be possible to communicate much the same information using the Unsupported Capability NOTIFICATION message, this more explicit method is felt to be more transparent.)

The use of these subcodes is further elaborated below.

5. Overview of Operation

Until a BGP speaker has initiated or accepted one connection from a given peer, it is unknown whether the peer supports this specification or not. Two strategies can be considered for making this initial determination: either the BGP speaker can initially assume that the peer does not support this specification and switch modes if it is discovered that it does, or vice versa. Either approach is acceptable.

The "Using Multisession" sections below discuss the BGP speaker's behavior when the peer does support this specification or is assumed to. The "Backward Compatibility" section discusses the BGP speaker's behavior when the peer does not support this specification, or is assumed not to. Both sections discuss how to switch to the other mode.

A BGP speaker which supports this specification SHOULD always advertise the Multisession capability, regardless of its peer's known or presumed capability set.

5.1. Using Multisession:

The following subsections discuss a BGP speaker's behavior towards a peer which is known or assumed to support this specification.

Note that if a BGP speaker only wishes to support a single AFI/SAFI in its communications with a given peer, only one session is needed in any case, and so the "multisession" feature is moot. In such a case the behavior required would be indistinguishable from that given in the "backward compatibility" section below. In the following sections, it is generally assumed that a BGP speaker does wish to support multiple AFI/SAFI in its communications with a given peer.

5.1.1. Initiating Connections:

When a BGP speaker attempts BGP communication with its peer, it initiates one connection per group of AFI/SAFI it wishes to support. (This implies that a new local TCP port will be allocated for each new connection.) The OPEN sent on each connection MUST include the Multisession capability and one or more MP-BGP capabilities indicating the AFI/SAFI to be supported on that session. If a non-trivial group of AFI/SAFI (i.e., a group of two or more) is proposed, the BGP speaker MUST also set the G bit of the Multisession capability. Even if a trivial group of AFI/SAFI is proposed, the G bit SHOULD be set if grouping is supported.

Note that any "group of AFI/SAFI" may be a singleton group, i.e., the speaker may wish to use a separate BGP connection for each AFI/SAFI.

If the peer also supports this specification and also wishes to support the AFI/SAFI in question, it will respond with an OPEN which includes the Multisession capability and the AFI/SAFI included in the active speaker's OPEN. If the active speaker's OPEN included a non-trivial group of AFI/SAFI which the peer supports, then the peer's Multisession capability will have the G bit set.

If the peer also supports this specification and wishes to support some but not all of the AFI/SAFI in question, it will respond with an OPEN which includes the Multisession capability and a subset of the AFI/SAFI included in the active speaker's OPEN. The reason for listing only a subset may be that some of the AFI/SAFI are simply not supported, or that the peer does not wish to support the AFI/SAFI as a group (i.e., it may be configured to use a smaller group). In this case, the BGP speaker MAY consider the set of AFI/SAFI which were not included in the peer's OPEN to form a new group, and MAY try to initiate a new session using that group.
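The handling of a partial acceptance described above can be sketched roughly as below. This is only an illustration: the function name and the (AFI, SAFI) tuple representation are assumptions of the sketch, and a real implementation would also need to act on the G bit and on the NOTIFICATION cases covered elsewhere in this section.

```python
from typing import FrozenSet, Iterable, Tuple

AfiSafi = Tuple[int, int]  # (AFI, SAFI); representation assumed for this sketch


def split_after_partial_accept(
    proposed: Iterable[AfiSafi], accepted: Iterable[AfiSafi]
) -> Tuple[FrozenSet[AfiSafi], FrozenSet[AfiSafi]]:
    """Split the proposed group according to the peer's OPEN.

    Returns (negotiated, leftover):
      negotiated - AFI/SAFI carried on the session that was just opened
      leftover   - AFI/SAFI the peer omitted; the initiator MAY treat these
                   as a new group and try a further session for them
    """
    proposed_set = frozenset(proposed)
    negotiated = proposed_set & frozenset(accepted)
    leftover = proposed_set - negotiated
    return negotiated, leftover


# Example: the initiator proposes IPv4 unicast, IPv4 multicast and IPv6
# unicast as one group; the peer's OPEN lists only the two IPv4 families.
proposed = {(1, 1), (1, 2), (2, 1)}
peer_open = {(1, 1), (1, 2)}
negotiated, leftover = split_after_partial_accept(proposed, peer_open)
print(sorted(negotiated), sorted(leftover))
```

If `leftover` comes back empty, the peer accepted the whole group and no further session is needed for it.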
If the peer also supports this specification but does not support grouping, and a non-trivial group of AFI/SAFI has been proposed, then it will respond as given in the previous paragraph but with the additional proviso that the G bit will be clear. In this case, the BGP speaker MAY accept the connection as given in the previous paragraph, or it MAY reply with a NOTIFICATION message with Error Code OPEN Message Error and Error Subcode Grouping Required, and the connection will be closed.

If the peer does not wish to support the AFI/SAFI in question, it will reply with a NOTIFICATION message with Error Code OPEN Message Error and Error Subcode No Supported AFI/SAFI, and the connection will be closed.

A BGP speaker SHOULD NOT attempt to initiate connections for any AFI/SAFI for which a connection already exists.

If the peer does not support this specification, it will respond with an OPEN which does not include the Multisession capability. In this case the connection SHOULD be terminated, and future connections to the peer should be attempted in the "backward compatibility" mode discussed below.

5.1.2. Accepting Connections:

When processing a connection attempt, the BGP speaker MUST wait until the peer's OPEN message has been received before proceeding. This is at variance with the behavior specified in the finite state machine (FSM) of [BGP-DRAFT], but is interoperable with that FSM. The FSM changes are specified in a later section.

Once the peer's OPEN message has been received, if it includes the Multisession capability and one or more MP-BGP capabilities indicating a group of AFI/SAFI which the BGP speaker wishes to support, then the BGP speaker responds with an OPEN message which includes the Multisession capability and one or more MP-BGP capabilities indicating the same AFI/SAFI.
If the OPEN includes the Multisession capability and one or more MP-BGP capabilities indicating a group of AFI/SAFI which conflicts with an AFI/SAFI grouping that has been configured on the BGP speaker, then the BGP speaker MAY reply with an OPEN listing a set of AFI/SAFI which intersect with those proposed by the peer (in effect overriding the locally configured set) or it MAY close the connection with a NOTIFICATION message with Error Code OPEN Message Error and Error Subcode Grouping Conflict. The former behavior is suggested as the default if grouping is supported.

If the BGP speaker does not support AFI/SAFI grouping, it MAY reply with an OPEN listing one of the AFI/SAFI out of those proposed by the peer. It SHOULD also set the G bit in the Multisession capability to zero.

If the received OPEN message does not include any MP-BGP capability indicating an AFI/SAFI the BGP speaker wishes to support, it should close the connection with a NOTIFICATION message with Error Code OPEN Message Error and Error Subcode No Supported AFI/SAFI.

If the received OPEN message does not include the Multisession capability, then the peer does not support this specification. The connection MAY be continued in the "backward compatibility" mode discussed below, or it MAY be terminated and future connections to the peer attempted in the "backward compatibility" mode.

5.1.3. Collision Detection, Graceful Restart:

[BGP, BGP-DRAFT] Section 6.8 (BGP connection collision detection) considers a pair of connections to have collided if the source and destination IP addresses of both connections match. With respect to peers which support this specification, the AFI/SAFI groups associated with the connections must also intersect for them to be considered to have collided.
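A rough sketch of the modified collision test follows. It is illustrative only. The `Connection` record and its field names are hypothetical, and connections are compared by local and remote address so that an initiated and an accepted connection between the same pair of endpoints compare equal; that is an assumption about how an implementation might normalize the "source and destination" wording.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

AfiSafi = Tuple[int, int]  # (AFI, SAFI); representation assumed for this sketch


@dataclass(frozen=True)
class Connection:
    """Hypothetical per-connection record used only for this sketch."""
    local_ip: str
    remote_ip: str
    families: FrozenSet[AfiSafi]  # AFI/SAFI group associated with the connection
    multisession: bool            # True if the peer advertised Multisession


def collided(a: Connection, b: Connection) -> bool:
    """Return True if connections a and b should be treated as colliding."""
    if (a.local_ip, a.remote_ip) != (b.local_ip, b.remote_ip):
        return False  # different IP endpoints never collide
    if a.multisession and b.multisession:
        # Multisession peers: the AFI/SAFI groups must also intersect.
        return bool(a.families & b.families)
    # Otherwise the unmodified base BGP rule applies: addresses alone decide.
    return True


# Example: two connections between the same endpoints, one for IPv4
# unicast and one for IPv6 unicast, do not collide under this rule.
v4 = Connection("192.0.2.1", "192.0.2.2", frozenset({(1, 1)}), True)
v6 = Connection("192.0.2.1", "192.0.2.2", frozenset({(2, 1)}), True)
print(collided(v4, v6))
```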
This consideration also applies to Section 6.2 of [BGP-GR], when determining whether a new connection should be considered equivalent to a reset of a previous TCP session.

5.2. Backward Compatibility:

This subsection discusses a BGP speaker's behavior towards a peer which is known or assumed not to support this specification.

In short, the BGP speaker's behavior towards such a peer should be as otherwise defined for the BGP protocol, according to [BGP, BGP-DRAFT] and any other extension supported by the BGP speaker.

As previously mentioned, the BGP speaker SHOULD always advertise the Multisession capability in its OPEN message, even towards "backward compatibility" peers.

If, in opening a BGP connection with such a peer, an OPEN which includes the Multisession capability is received from the peer, then the peer SHOULD be changed to "multisession" mode. How this is done depends on whether the BGP speaker has already sent an OPEN or not:

If the BGP speaker has not yet sent an OPEN to the peer, then the connection MAY be continued in the "multisession" mode discussed above, or it MAY be terminated and future connections to the peer attempted in "multisession" mode.

If the BGP speaker has sent an OPEN to the peer, then the current session SHOULD be terminated and future connections to the peer attempted in "multisession" mode.

Use of techniques such as [BGP-DYN-CAP] for on-the-fly switching of session modes is beyond the scope of this document.

6. State Machine

As mentioned under "accepting connections" above, this specification modifies the BGP finite state machine, albeit in a backward-compatible fashion.

In addition, note that one state machine is considered to exist for each of the connections which may exist to a given peer. This implies that, for example, any session flap dampening that may exist is performed per AFI/SAFI.
The specific state machine modifications to [BGP-DRAFT] Section 8.2.2 are as follows.

6.1. Modifications to Connect State and Active State

In the actions in response to the events Open Delay timer expires [Event 12] and TCP connection succeeds [Event 16 or Event 17], an OPEN is not sent and the state changes to WaitForOpen and not to OpenSent.

6.2. Addition of WaitForOpen State, Deletion of OpenSent State

The WaitForOpen state is identical in all respects to OpenSent, except for the action in response to reception of a valid OPEN message [Event 19]. In that event, the local system sends an OPEN message prior to sending a KEEPALIVE message.

The OpenSent state is deleted. All references to OpenSent are replaced by references to WaitForOpen.

7. Discussion

Note that many BGP implementations already permit multiple sessions to be used between a given pair of routers, typically by configuring multiple IP addresses on each router and configuring each session to be bound to a different IP address. The principal contribution of this specification is to allow multiple sessions to be created automatically, without additional configuration overhead or address consumption.

In addition to the simple mode of supporting one AFI/SAFI per connection, the procedures described here also permit arbitrary grouping of AFI/SAFI onto BGP connections. For such grouping to function as intended, both peers participating in a connection need to agree on what AFI/SAFI groupings will be used. If conflicting groupings are configured, the connections may not establish, or more connections may be established than were expected (in the degenerate case, one connection per AFI/SAFI could be established despite configured groupings). We observe that the potential for misbehavior in the presence of conflicting configuration is not unusual in BGP, and that support for, and configuration of, grouping is purely optional.

8. Acknowledgements

To be supplied.

9. References

[BGP] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)," RFC 1771, March 1995.

[BGP-DRAFT] Rekhter, Y., T. Li and S. Hares, "A Border Gateway Protocol 4 (BGP-4)," Work in Progress (draft-ietf-idr-bgp4-20), April 2003.

[MP-BGP] Bates, T., R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol Extensions for BGP-4," Work in Progress (draft-ietf-idr-rfc2858bis-03), July 2003.

[BGP-GR] Sangli, S., Y. Rekhter, R. Fernando, J. Scudder, E. Chen, "Graceful Restart Mechanism for BGP," Work in Progress (draft-ietf-idr-restart-06), January 2003.

[BGP-CAP] Chandra, R., J. Scudder, "Capabilities Advertisement with BGP-4," RFC 2842, May 2000.

[BGP-DYN-CAP] Chen, E. and S. Sangli, "Dynamic Capability for BGP-4," Work in Progress (draft-ietf-idr-dynamic-cap-03), December 2002.

10. Security Considerations

This document introduces no new security vulnerabilities to BGP or other specifications referenced in this document.

11. IANA Considerations

TBD

12. Authors' Addresses

   John G. Scudder
   Cisco Systems, Inc.
   100 S. Main Suite 200
   Ann Arbor, MI 48104
   Email: jgs@cisco.com

   Chandra Appanna
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134
   Email: achandra@cisco.com

13. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.