IDR Working Group                                          I. Varlashkin
Internet-Draft                                   Easynet Global Services
Updates: 4271, 4760                                        April 9, 2010
(if approved)
Intended status: Standards Track
Expires: October 11, 2010

           Multisession BGP extensions without new TCP ports
             draft-varlashkin-idr-multisession-same-port-00

Abstract

   This document proposes minor backward-compatible changes to BGP that
   enable multiple sessions to be established between a pair of BGP
   speakers for different AFI/SAFI or groups thereof.  It also
   describes a mechanism for handling each session in a separate
   process (for information only).

   This memo updates the BGP and multiprotocol extensions for BGP-4
   specifications.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 11, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Varlashkin              Expires October 11, 2010                [Page 1]

Internet-Draft          multisession-same-ports               April 2010

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Interoperability with legacy implementations
   4.  Conformance requirement
   5.  Overview of operations
   6.  OPEN message handling
   7.  BGP connection collision handling modification
   8.  AFI/SAFI grouping
   9.  Discussion
   10. Security considerations
   11. IANA Considerations
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Example of session handover
   Author's Address

1.  Introduction

   Both network operators and vendors wish to separate the exchange of
   information for different AFI/SAFI into multiple sessions between the
   same pair of BGP speakers, particularly to allow various AFI/SAFI or
   groups thereof to be handled in different system processes.
   Two methods have already been proposed to achieve the desired
   functionality, as described in [I-D.ietf-idr-bgp-multisession] and
   [I-D.raszuk-ti-bgp].  Both of these methods rely on the introduction
   of new TCP ports in addition to the existing well-known BGP port,
   which is neither desirable from an operational perspective nor
   necessary from an implementation perspective (at least no sufficient
   evidence has been made available to justify the need).

   This memo describes an alternative approach that achieves the desired
   functionality without requiring additional TCP ports, and solicits
   discussion from the community regarding the proposed changes.

2.  Terminology

   In addition to commonly used keywords, this memo uses the following
   terminology:

   1.  Legacy implementation or legacy BGP speaker/peer refers to an
       implementation and speaker, respectively, that implement the
       original BGP and multiprotocol extensions specifications, but do
       not implement the changes specified in this memo.

   2.  New implementation or new BGP speaker refers to an
       implementation/speaker that implements the changes specified in
       this memo and is not running in compatibility mode.

   3.  Compatibility mode refers to a BGP implementation/speaker that
       behaves as if it were a legacy implementation, i.e. it is able to
       support the changes specified in this memo but does not use them
       because it either has been administratively configured to behave
       that way or has detected (through the mechanism described later
       in this memo) that it cannot use the new features for a
       particular legacy peer.

   4.  If not explicitly mentioned otherwise, BGP speaker refers to a
       new BGP speaker from this point on.

3.  Interoperability with legacy implementations

   The mechanisms described in this memo rely on a backward-compatible
   modification of the Multiprotocol Extensions capability and a
   backward-compatible modification of BGP connection collision
   detection.
   When a BGP speaker detects (through information in the OPEN message)
   a legacy peer, it simply behaves as a legacy BGP speaker towards that
   peer.

4.  Conformance requirement

   Implementations conforming to this specification MUST NOT enable the
   changes described here by default.  They MUST provide a configuration
   option to explicitly enable the new functionality; the option SHOULD
   be available on a per-peer basis.

   Conforming implementations MUST implement both the new format of the
   multiprotocol extensions capability code and the new procedure for
   connection collision detection.

5.  Overview of operations

   A BGP speaker initiates a session using the modified multiprotocol
   extensions capability code in the OPEN message.  If the session is
   the only session between a particular pair of BGP speakers (whether
   new or legacy), both sides continue according to the legacy
   specifications.

   If a new session comes up when there is already an existing session,
   the BGP speaker attempts to detect whether the peer is a new or a
   legacy peer (by looking at the OPEN message from the peer).  If the
   peer is a new implementation, the BGP speaker continues behaving as a
   new implementation; otherwise it reverts to legacy behaviour.

   After connection collision detection has been performed and the BGP
   session established, the BGP speaker proceeds to exchange BGP
   messages with the peer in the same way as a legacy implementation.

   Once a session is established, the BGP speaker is free to handle it
   within the same process as the one that accepted the connection,
   transfer it to another already running process, or create a new
   process and transfer the session to it.  The exact mechanism of such
   a handover is up to the implementation and is not part of this
   specification.  However, if session handover is used, then the
   implementation MUST (or SHOULD?) ensure aliveness of the process
   handling a particular session and restart the session (or process or
   both) if a problem is detected.
   Note that there is no restriction on whether the initial session
   setup is performed by a single multiplexing process or by a process
   dedicated to a particular AFI/SAFI or a group thereof.

   Appendix A of this memo contains information on how session handover
   could be implemented, but an actual implementation MAY choose a
   different approach.

6.  OPEN message handling

   Optional parameter related to multisession:

   The Capability Value field is defined by [RFC4760] as:

                     0       7      15      23      31
                     +-------+-------+-------+-------+
                     |      AFI      | Res.  | SAFI  |
                     +-------+-------+-------+-------+

                    Figure 1: Capability Value field

   Currently section 8 of RFC4760 defines the Reserved field as:

      Res. - Reserved (8 bit) field.  SHOULD be set to 0 by the sender
      and ignored by the receiver.  Note that not setting the field
      value to 0 may create issues for a receiver not ignoring the
      field.  In addition, this definition is problematic if it is ever
      attempted to redefine the field.

   This document proposes the following modification:

      The highest order bit of the Reserved field MUST be set to 1 if
      the implementation supports multisession.  An old implementation
      is expected to ignore the value of this field and will set it to
      zero when sending its own OPEN message.

      If a multisession-capable router detects that the peer has an old
      implementation (Reserved field set to zero), then the
      multisession-capable router MUST revert to the old behaviour
      (i.e. single session only) by using the connection collision
      detection procedure described in section 6.8 of [RFC4271].  Note
      that reverting to the old behaviour means multisession is not
      possible between the given pair of BGP speakers.

7.  BGP connection collision handling modification

   The modifications described in this section apply only when a router
   operates as a new implementation towards a particular peer; otherwise
   the original RFC4271 section 6.8 applies.
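
   As an informal illustration of the Reserved-bit signalling in
   Section 6, the following sketch packs and unpacks the 4-octet
   Capability Value and decides whether a peer should be treated as new
   or legacy.  The function names and the constant MULTISESSION_BIT are
   hypothetical, and the rule that every advertised capability must
   carry the bit is an assumption; this memo does not say how a mix of
   set and unset bits should be handled.

```python
import struct

# Highest-order bit of the 8-bit Reserved field (hypothetical constant name).
MULTISESSION_BIT = 0x80


def encode_mp_capability(afi, safi, multisession):
    """Pack the Capability Value: AFI (16 bits), Reserved (8), SAFI (8)."""
    reserved = MULTISESSION_BIT if multisession else 0
    return struct.pack("!HBB", afi, reserved, safi)


def decode_mp_capability(value):
    """Return (afi, safi, multisession_flag) from a 4-octet Capability Value."""
    afi, reserved, safi = struct.unpack("!HBB", value)
    return afi, safi, bool(reserved & MULTISESSION_BIT)


def peer_is_new(capability_values):
    """Assumption: the peer is 'new' only if it sent at least one
    multiprotocol capability and all of them have the bit set."""
    flags = [decode_mp_capability(v)[2] for v in capability_values]
    return bool(flags) and all(flags)
```

   A legacy speaker sends Reserved = 0 and ignores the field on receipt,
   so peer_is_new() returns False for its OPEN and the receiver falls
   back to the unmodified procedure of section 6.8 of [RFC4271].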
   If the AFI/SAFI lists of the OPEN messages do not overlap, the new
   session proceeds as if no other session existed, i.e. the new session
   is established.

   If the AFI/SAFI lists of the OPEN messages match exactly, the
   original procedure from section 6.8 of [RFC4271] MUST be used.

   If the AFI/SAFI list of a new incoming OPEN message is a subset of
   the list of an existing session, even one in established state (i.e.
   it overlaps but the new list contains fewer AFI/SAFI), then the
   behaviour is as follows:

   Alternative 1: The receiving side drops the incoming session and
   initiates a new session for those AFI/SAFIs which are found in the
   just dropped session but not in the old one.  The receiver of this
   session (i.e. the originator of the just collided session) MAY accept
   this proposal or drop it.  The receiver SHOULD NOT attempt to send an
   OPEN message with the list that caused the original collision (this
   would cause unnecessary stress).  As a special case this scenario may
   result in keeping the old session and blocking the new one until
   operator's intervention.

   Alternative 2: The receiving side drops the incoming session and
   sends an OPEN for the combined AFI/SAFI of both the existing and the
   just dropped session.  If the originator of the just collided session
   accepts this proposal, the existing session is dropped but already
   exchanged prefixes are retained (possibly via graceful-restart).

   Alternative 3: The receiving side accepts the proposal and gracefully
   transfers the overlapping AFI/SAFI from the old session to the new
   one (using the graceful restart mechanism).

   While alternatives 2 and 3 may sound outlandish, they may in fact
   become handy when an operator needs to regroup AFI/SAFI, as they
   avoid an outage during the migration.

   If the AFI/SAFI list of a new incoming OPEN message is a superset of
   the list of an existing session, even one in established state (i.e.
   it overlaps but the new list contains more AFI/SAFI) and the
   receiving BGP speaker is configured to support all involved AFI/SAFI,
   then the behaviour is as follows:

   The receiving side proceeds with the new session, then gracefully
   transfers routing information from the old session to the new one and
   closes the old connection.

   If the AFI/SAFI list of a new incoming OPEN message partially
   overlaps with that of an existing session, even one in established
   state, but is neither a subset nor a superset, then the behaviour is
   as follows:

   Alternative 1: The receiver accepts the proposal and gracefully
   transfers the overlapping AFI/SAFI from the old session to the new
   one (possibly via graceful-restart).

   Alternative 2: The receiver drops the incoming session and instead
   proposes three separate sessions - the first with the AFI/SAFI unique
   to the existing session, the second with the AFI/SAFI overlapping
   between the sessions, and the third with the AFI/SAFI unique to the
   just dropped session.

   Alternative 3: The receiver drops the incoming session.  The
   originator of the just dropped session SHOULD NOT make further
   attempts to establish a session with the same AFI/SAFI list until
   operator's intervention.

   DISCUSSION: The choice of alternatives is open for discussion,
   particularly whether a single method should be chosen or a mechanism
   should be introduced to signal which method is used.

8.  AFI/SAFI grouping

   This specification does not explicitly define a grouping capability.
   However, none is required for a conforming implementation to achieve
   grouping, as long as the (S)AFI combinations in groups are not
   hardcoded.

   The grouping capability is implied by virtue of specifying more than
   one AFI/SAFI: if a BGP speaker supports and is configured to group a
   particular set of AFI/SAFI, then it simply advertises only the given
   AFI/SAFI in the OPEN message, and each session advertises a different
   set of AFI/SAFI.
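
   Because each session advertises its own set of AFI/SAFI, the case
   analysis of Section 7 reduces to how the set in a new OPEN relates to
   the set of an existing session.  The sketch below is illustrative
   only: the names Relation and classify are hypothetical, and it
   deliberately stops at naming the case, since the choice among the
   alternatives listed in Section 7 is still open.

```python
from enum import Enum


class Relation(Enum):
    DISJOINT = "no overlap: establish the new session"
    EQUAL = "exact match: RFC 4271 section 6.8 procedure"
    SUBSET = "new list is a subset of the existing one"
    SUPERSET = "new list is a superset of the existing one"
    PARTIAL = "partial overlap, neither subset nor superset"


def classify(new_afi_safi, existing_afi_safi):
    """Relate the AFI/SAFI pairs of a new OPEN to an existing session."""
    new = frozenset(new_afi_safi)
    old = frozenset(existing_afi_safi)
    if new.isdisjoint(old):
        return Relation.DISJOINT
    if new == old:
        return Relation.EQUAL
    if new < old:          # proper subset
        return Relation.SUBSET
    if new > old:          # proper superset
        return Relation.SUPERSET
    return Relation.PARTIAL
```

   For example, with an existing session carrying {(1, 1), (2, 1)}, a
   new OPEN listing only (1, 1) falls into the subset case, while one
   listing {(1, 1), (1, 128)} is a partial overlap.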
   In order to ensure interoperability, a BGP implementation SHOULD NOT
   impose a particular (S)AFI grouping at coding time; if it does, it
   SHOULD provide at least a one-(S)AFI-per-session alternative.

9.  Discussion

   This specification allows implementing per-AFI/SAFI or per-group BGP
   sessions using different host system processes without the
   introduction of new TCP ports.

   Some parts of this specification describe several behaviours.  The
   community is requested to discuss which of the alternatives should be
   adopted, and whether or not this specification should contain a
   formal definition of the BGP FSM changes in the format used by
   [RFC4271].

   This specification relies on the availability of a mechanism to pass
   a TCP session from one process to another.  Many modern operating
   systems provide such functionality.  The mechanism may differ
   significantly from system to system, but from a protocol perspective
   they all allow achieving the desired result.  The author of this memo
   does not rule out the possibility that a BGP implementation may exist
   that uses an operating system without such facilities.  In such cases
   it is believed that the BGP implementer should work together with the
   operating system implementers to organise the required functionality,
   because such an approach avoids penalising other protocol
   implementers and users.

   Some network operators or BGP implementers may wish to restrict which
   AFI/SAFI are handled by a specific process or how AFI/SAFI are
   grouped in multiple sessions.  This specification allows achieving
   this without imposing a restriction on the protocol itself, while
   permitting interoperability between implementations that impose
   restrictions and an implementation without restrictions.  However, if
   two implementations insist on non-matching grouping of AFI/SAFI, then
   the BGP session cannot be established.  Therefore it is recommended
   that AFI/SAFI grouping is left for configuration time.
   The community is requested to review whether this poses a practical
   problem and how alternative multisession implementations would handle
   such a situation.

10.  Security considerations

   The changes proposed in this document do not introduce any new
   security concerns to BGP itself.  If a BGP implementation involves
   handover of BGP session(s) between processes, the interprocess
   communication is subject to the security model and threats of the
   host operating system.

11.  IANA Considerations

   The changes to BGP proposed by this memo do not require any new
   allocations from IANA.

12.  Acknowledgments

   The author would like to thank Robert Raszuk for valuable comments
   and for help with translation of the document to XML format.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

13.2.  Informative References

   [I-D.ietf-idr-bgp-multisession]
              Scudder, J. and C. Appanna, "Multisession BGP",
              draft-ietf-idr-bgp-multisession-05 (work in progress),
              March 2010.

   [I-D.raszuk-ti-bgp]
              Raszuk, R. and K. Patel, "Transport Instance BGP",
              draft-raszuk-ti-bgp-01 (work in progress), March 2010.

Appendix A.  Example of session handover

   When a BGP implementation runs on a host operating system that is
   similar to BSD, the session handover can use the socket passing
   technique described by W. Richard Stevens.  The basic idea is to
   establish a Unix-socket session between the process that initially
   accepts or creates the TCP connection and the process which will
   actually handle the connection.  For a BGP implementation the
   following approach could be used (though others may exist):

   1.  a master process is responsible for obtaining a TCP connection
       with a peer (either actively attempting to establish the
       connection or listening for an incoming connection on the
       well-known BGP port)

   2.  when the TCP connection is established, this master process
       exchanges OPEN messages with the peer and optionally handles
       connection collision detection

   3.  after determining which AFI/SAFI the peer would like to exchange
       over this session, the master process sends a specially formed
       message on a Unix socket to the target process to initiate the
       socket handover

   4.  when the recipient process has acknowledged its readiness to take
       over, the master process stops handling this BGP session

   5.  (optionally, but highly desirable) the master and recipient
       processes keep exchanging keep-alive messages over the Unix
       socket to ensure that the BGP session is being handled.  If the
       recipient process dies or malfunctions, the master process can
       take action to revive the affected BGP session.

Author's Address

   Ilya Varlashkin
   Easynet Global Services
   Harburger Schlossstr. 1
   Hamburg, 21079
   Germany

   Email: ilya.varlashkin@de.easynet.net
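
Illustrative sketch for Appendix A (informational)

   The descriptor-passing step of the Appendix A handover (steps 3 and
   4) could look roughly as follows on a Unix-like system, using
   SCM_RIGHTS ancillary data over a Unix-domain socket.  This is a
   sketch under stated assumptions, not a definitive implementation:
   the helper names send_fd, recv_fd and demo are hypothetical, both
   ends run in a single process for brevity, a socket pair stands in
   for the TCP connection to the BGP peer, and the payload is a
   placeholder rather than a real BGP message.

```python
import array
import socket


def send_fd(channel, fd, note=b"handover"):
    """Master side: pass an open descriptor over a Unix-domain socket."""
    fds = array.array("i", [fd])
    channel.sendmsg([note], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(channel):
    """Recipient side: receive the note and one passed descriptor."""
    fds = array.array("i")
    note, ancdata, _flags, _addr = channel.recvmsg(
        64, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no descriptor received")
    return note, fds[0]


def demo():
    # Stand-in for the TCP connection between the master and the BGP peer.
    peer, conn = socket.socketpair()
    # Unix-socket channel between the master and the recipient process.
    master_end, recipient_end = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_fd(master_end, conn.fileno())
        conn.close()  # the master stops handling the session
        _note, fd = recv_fd(recipient_end)
        handed_over = socket.socket(
            socket.AF_UNIX, socket.SOCK_STREAM, fileno=fd)
        handed_over.sendall(b"KEEPALIVE")  # placeholder payload
        data = peer.recv(16)
        handed_over.close()
        return data
    finally:
        peer.close()
        master_end.close()
        recipient_end.close()
```

   In a real deployment the two ends of the Unix socket would live in
   different processes, and the recipient would wrap the received
   descriptor as a TCP socket instead of a Unix-domain one.  The
   optional liveness exchange of step 5 can reuse the same channel.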