Network Working Group                                Vishnu Pavan Beeram
Internet Draft                                          Juniper Networks
Intended status: Informational                                 Ina Minei
                                                             Google, Inc
                                                           Yakov Rekhter
                                                        Juniper Networks
                                                             Ebben Aries
                                                                Facebook
                                                           Dante Pacella
                                                                 Verizon
Expires: September 07, 2015                               March 07, 2015

RSVP-TE Scalability - Recommendations
draft-beeram-mpls-rsvp-te-scaling-00

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on September 07, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Beeram, et al            Expires September 07, 2015                [Page 1]

Internet-Draft         Network Assigned Upstream Label              March 2015
Abstract

RSVP-TE [RFC3209] describes the use of standard RSVP [RFC2205] to establish Label Switched Paths (LSPs). As a result, RSVP-TE inherited some properties of RSVP that adversely affect its control plane scalability. Specifically, these properties are (a) reliance on periodic refreshes for state synchronization between RSVP neighbors and for recovery from lost RSVP messages, (b) reliance on refresh timeout for stale state cleanup, and (c) the lack of any mechanism by which a receiver of RSVP messages can apply back pressure to the sender(s) of these messages. Subsequent to [RFC2205] and [RFC3209], further enhancements to RSVP and RSVP-TE have been developed. In this document we describe how an implementation of RSVP-TE can use these enhancements to address the above-mentioned properties and thereby improve RSVP-TE control plane scalability.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
      1.1. Reliance on refreshes and refresh timeouts
      1.2. Lack of back pressure
   2. Recommendations
      2.1. Eliminating reliance on refreshes and refresh timeouts
      2.2. Providing the ability to apply back pressure
      2.3. Making Acknowledgements mandatory
      2.4. Clarifications on reaching Rapid Retry Limit (Rl)
      2.5. Avoiding use of Router Alert IP Option
      2.6. Checking Data Plane readiness
   3. Security Considerations
   4. IANA Considerations
   5. Normative References
   6. Acknowledgments

1. Introduction

RSVP-TE [RFC3209] describes the use of standard RSVP [RFC2205] to establish Label Switched Paths (LSPs). As a result, RSVP-TE inherited some properties of RSVP that adversely affect its control plane scalability. Specifically, these properties are (a) reliance on periodic refreshes for state synchronization between RSVP neighbors and for recovery from lost RSVP messages, (b) reliance on refresh timeout for stale state cleanup, and (c) the lack of any mechanism by which a receiver of RSVP messages can apply back pressure to the sender(s) of these messages. The following subsections elaborate on these properties.

1.1. Reliance on refreshes and refresh timeouts

Standard RSVP [RFC2205] maintains state through the generation of RSVP Path/Resv refresh messages. Refresh messages are used both to synchronize state between RSVP neighbors and to recover from lost RSVP messages. Relying on refresh messages to cover many possible failures has resulted in two operational problems. The first relates to scaling; the second relates to the reliability and latency of RSVP signaling.

The scaling problem is linked to the control plane resource requirements of running RSVP-TE. These requirements increase proportionally with the number of LSPs established by RSVP-TE, because each such LSP requires the generation, transmission, reception, and processing of RSVP Path and Resv messages in every refresh period. Supporting a large number of LSPs, and the corresponding volume of refresh messages, therefore presents a scaling problem for the RSVP-TE control plane.

The reliability and latency problem occurs when a triggered (non-refresh) RSVP message, such as Path, Resv, or PathTear, is lost in transmission. Standard RSVP [RFC2205] recovers from a lost message via subsequent RSVP refresh messages.
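The scaling problem described above comes down to simple arithmetic. The following Python sketch is ours and not part of [RFC2205]. It assumes a transit LSR that, for every LSP, sends one Path refresh downstream and one Resv refresh upstream per refresh period R. It also ignores the randomization that RFC 2205 applies to individual refresh intervals. The LSP count is a hypothetical figure chosen only for illustration.

```python
def refresh_messages_per_second(num_lsps: int, refresh_period_s: float) -> float:
    """Approximate refresh messages a transit LSR sends per second.

    Assumption (illustrative): one Path refresh (downstream) and one Resv
    refresh (upstream) per LSP per refresh period R. The LSR receives a
    comparable number of refreshes from its neighbors, so total processing
    load is roughly double this figure.
    """
    return 2 * num_lsps / refresh_period_s


if __name__ == "__main__":
    lsps = 50_000  # hypothetical number of LSPs transiting this LSR
    # 30 s is the RFC 2205 default R; 1200 s is the 20-minute value from Section 2.1.
    for r in (30.0, 300.0, 1200.0):
        rate = refresh_messages_per_second(lsps, r)
        print(f"R = {r:6.0f} s -> {rate:8.1f} refresh msgs/s sent")
```

The rate scales as 1/R. Moving from R = 30 s to the 20-minute default recommended in Section 2.1 therefore cuts refresh volume by a factor of 40. As the following paragraphs explain, the cost is slower state synchronization unless some other mechanism provides reliable delivery.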
In the face of transmission loss of RSVP messages, the end-to-end latency of RSVP signaling, and thus the end-to-end latency of establishing RSVP-TE signaled LSPs, is tied to the refresh interval of the Label Switching Router(s) experiencing the loss. When end-to-end signaling is limited by the refresh interval, the delay incurred in establishing or changing an RSVP-TE signaled LSP may exceed what is acceptable in practice. This is because RSVP-TE ultimately controls the establishment of the forwarding state required to realize RSVP-TE signaled LSPs. Thus, any delay in establishing or changing such LSPs delays data plane convergence, which in turn adversely impacts the services that rely on the data plane.

One way to address the scaling problem caused by the refresh volume is to increase the refresh period "R", as defined in Section 3.7 of [RFC2205]. Increasing the value of R yields a linear reduction in RSVP-TE signaling overhead, but at the cost of increasing the time it takes to synchronize state. For the reasons given in the previous paragraph, increasing the time to synchronize state is not an acceptable option for RSVP-TE signaled LSPs.

One way to address the reliability and latency of RSVP signaling is to decrease the refresh period R. Decreasing R increases the probability that state will be installed in the face of message loss, but at the cost of a higher refresh message rate and associated processing requirements, which in turn adversely affect RSVP-TE control plane scalability.

An additional problem is the time needed to clean up stale state after a tear message is lost. RSVP does not retransmit ResvTear or PathTear messages. If the sole tear message transmitted is lost, the stale state will only be cleaned up once the refresh timeout has expired.
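How long stale state can linger after a lost tear message follows from the state lifetime L that a node derives from R. The sketch below is illustrative and not taken from [RFC2205]. It assumes the relationship given in Section 3.7 of [RFC2205], L >= (K + 0.5) * 1.5 * R, with the suggested K = 3, and it uses the lower bound. The function and constant names are hypothetical.

```python
K_DEFAULT = 3  # consecutive lost refreshes tolerated; the value suggested in RFC 2205


def state_lifetime_s(refresh_period_s: float, k: int = K_DEFAULT) -> float:
    """Lower bound on the state lifetime L (RFC 2205, Section 3.7):
    L >= (K + 0.5) * 1.5 * R.

    If the only tear message for a piece of state is lost, that state is
    removed only when this lifetime expires after the last refresh.
    """
    return (k + 0.5) * 1.5 * refresh_period_s


if __name__ == "__main__":
    for r in (30.0, 1200.0):
        lifetime = state_lifetime_s(r)
        print(f"R = {r:6.0f} s -> stale state may persist about {lifetime:7.1f} s "
              f"({lifetime / 60:.1f} min)")
```

With the 20-minute refresh interval recommended in Section 2.1, a lost PathTear could leave state in place for well over an hour and a half. This is why Section 2.1 pairs the longer interval with reliable delivery of Tear/Err messages and with adjacency-based cleanup.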
Until the refresh timeout expires, resources associated with the stale state may remain allocated for an unnecessarily long period of time. Note that adjusting the refresh period does not remove the need to wait for the refresh timeout, since tear messages are not retransmitted. Shortening the refresh timeout by decreasing the refresh interval speeds up stale state cleanup, but at the cost of a higher refresh message rate, which in turn adversely affects RSVP-TE control plane scalability.

1.2. Lack of back pressure

In standard RSVP, an RSVP speaker sends RSVP messages to a peer with no regard for whether the peer's RSVP control plane is busy. There is no control plane mechanism by which an RSVP speaker can apply back pressure to a peer by asking the peer to reduce the rate of RSVP messages that it sends to the speaker. RSVP-TE inherited this limitation from standard RSVP. The lack of such a mechanism can result in RSVP-TE control plane congestion.

The RSVP-TE control plane is especially susceptible to congestion during link/node failures, as such failures produce bursts of RSVP-TE messages. These bursts include Path/Resv messages for re-routing the LSPs affected by the failures and Path/Resv messages for setting up new backup LSPs (as required by RSVP-TE Fast Reroute [RFC4090]). They also include Tear/Error messages for the affected LSPs. Note that the load these bursts place on the RSVP-TE control plane is in addition to the load caused by the periodic Path/Resv refreshes for the LSPs not affected by the failures.

RSVP-TE control plane congestion may result in the loss of RSVP messages, which in turn has detrimental effects on overall system behavior. Path/Resv refreshes lost by a peer's busy control plane will cause some or all of the existing RSVP-TE state on that peer to time out. This inadvertently deletes existing LSPs and disrupts the traffic carried over them.
Triggered Path/Resv messages lost by a peer's busy control plane may prevent the establishment of new backup LSPs used by RSVP-TE Fast Reroute [RFC4090] before the state for the corresponding protected primary LSPs times out. This defeats the whole purpose of RSVP-TE Fast Reroute.

2. Recommendations

Subsequent to the publication of [RFC2205] and [RFC3209], further enhancements to RSVP and RSVP-TE have been developed. In this section we describe how these enhancements can be used to address the problems listed in Section 1.

2.1. Eliminating reliance on refreshes and refresh timeouts

This document aims to eliminate reliance on refreshes both for state synchronization between RSVP neighbors and for recovery from lost RSVP messages. It also aims to address both the refresh volume and the reliability issues through mechanisms other than adjusting the refresh rate. To these ends, this document RECOMMENDS the following:

- Implement reliable delivery of Path/Resv messages using the procedures specified in [RFC2961].

- Indicate support for RSVP Refresh Overhead Reduction Extensions (as specified in Section 2 of [RFC2961]) by default, with the ability to override the default via configuration.

- Make the value of the refresh interval configurable, with a default value of 20 minutes.

To eliminate reliance on refresh timeouts, in addition to the above, this document RECOMMENDS the following:

- Implement reliable delivery of Tear/Err messages using the procedures specified in [RFC2961].

- Couple the state of individual LSPs with the state of the corresponding RSVP-TE signaling adjacency. When an RSVP-TE speaker detects an RSVP-TE signaling adjacency failure, the speaker MUST clean up the LSP state for all LSPs affected by the failed adjacency. The LSP state is the combination of the "path state" maintained as a Path State Block and the "reservation state" maintained as a Reservation State Block (see Section 2.1 of [RFC2205]).

- Use Node-ID based Hello sessions ([RFC3209], [RFC4558]) to detect RSVP-TE signaling adjacency failures. Make the value of the node hello_interval [RFC3209] configurable, and increase its default value from 5 ms (as specified in Section 5.3 of [RFC3209]) to 9 seconds.

- Implement the procedures specified in [draft-chandra-mpls-enhanced-frr-bypass], which describes methods to facilitate FRR that works independently of the refresh interval.

2.2. Providing the ability to apply back pressure

To give an RSVP speaker the ability to apply back pressure to its peer(s), and thereby reduce or eliminate RSVP-TE control plane congestion, in addition to the above, this document RECOMMENDS the following:

- Use the lack of ACKs from a peer as an indication of congestion in that peer's RSVP-TE control plane. In that case, the local system SHOULD throttle the RSVP-TE messages it sends to the affected peer. This has to be done on a per-peer basis.

- Retransmit all RSVP-TE messages using exponential backoff, as specified in Section 6 of [RFC2961].

- Increase the Rapid Retry Limit (Rl), as defined in Section 6.2 of [RFC2961], from 3 to 7.

- Prioritize Tear/Error messages over triggered Path/Resv messages sent to a peer when the local system detects RSVP-TE control plane congestion in that peer.

2.3. Making Acknowledgements mandatory

The reliable message delivery mechanism specified in [RFC2961] states that "Nodes receiving a non-out of order message containing a MESSAGE_ID object with the ACK_Desired flag set, SHOULD respond with a MESSAGE_ID_ACK object."
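The value of a guaranteed acknowledgement is easiest to see from the sender's side. The Python sketch below is ours and not part of [RFC2961]. It computes when a sender retransmits a message that carries a MESSAGE_ID object with ACK_Desired set. It uses the values suggested in [RFC2961] (Rf = 500 ms, Delta = 1), the Rl of 7 recommended in Section 2.2, and the 30-second periodic retransmission described in Section 2.4. It rests on two assumptions: that Rl counts rapid retransmissions (excluding the initial transmission), and that the periodic phase begins 30 seconds after the last rapid retransmission. The function name and the observation horizon are hypothetical.

```python
# Hypothetical sketch (not from RFC 2961): retransmission timeline for one
# message that carries a MESSAGE_ID object with the ACK_Desired flag set.
RF_SECONDS = 0.5               # RFC 2961 suggested initial retransmission interval, Rf
DELTA = 1.0                    # RFC 2961 suggested Delta: next interval = (1 + Delta) * current
RL_RECOMMENDED = 7             # Rapid Retry Limit recommended in Section 2.2 (RFC 2961 suggests 3)
PERIODIC_RETRY_SECONDS = 30.0  # Periodic retransmission interval from Section 2.4

TEAR_OR_ERR = {"PathTear", "ResvTear", "PathErr", "ResvErr"}


def retransmission_times(msg_type, ack_at=None, rl=RL_RECOMMENDED, horizon=120.0):
    """Return the times (seconds after the initial transmission) at which the
    sender retransmits the message. Retransmission stops when a MESSAGE_ID_ACK
    arrives (at time `ack_at`) or once `horizon` is passed; ack_at=None models
    a peer that never acknowledges."""

    def acked_by(t):
        return ack_at is not None and ack_at <= t

    times = []
    t, interval = 0.0, RF_SECONDS
    # Rapid phase with exponential backoff. Assumption: Rl counts rapid
    # retransmissions, not including the initial transmission.
    for _ in range(rl):
        t += interval
        if acked_by(t) or t > horizon:
            return times
        times.append(t)
        interval *= 1 + DELTA

    # Rl reached: no further action for Tear/Err messages (Section 2.4).
    if msg_type in TEAR_OR_ERR:
        return times

    # Path/Resv: retransmit every 30 s until acknowledged (Section 2.4).
    # Assumption: the first periodic retransmission comes 30 s after the last rapid one.
    while True:
        t += PERIODIC_RETRY_SECONDS
        if acked_by(t) or t > horizon:
            return times
        times.append(t)


print(retransmission_times("Path", ack_at=0.1))  # []
print(retransmission_times("Path"))      # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5, 63.5, 93.5]
print(retransmission_times("PathTear"))  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5, 63.5]
```

Under these assumptions, a peer that acknowledges promptly triggers no retransmissions at all. A peer that silently omits MESSAGE_ID_ACK, as a SHOULD-level requirement permits, keeps a Path message being retransmitted indefinitely. Under Section 2.2 it would also be read as a congested peer. Making acknowledgements mandatory removes this ambiguity.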
To improve the predictability of the system in terms of reliable message delivery, this document RECOMMENDS that nodes receiving a non-out of order message containing a MESSAGE_ID object with the ACK_Desired flag set MUST respond with a MESSAGE_ID_ACK object.

2.4. Clarifications on reaching Rapid Retry Limit (Rl)

According to Section 6 of [RFC2961], "The staged retransmission will continue until either an appropriate MESSAGE_ID_ACK object is received, or the rapid retry limit, Rl, has been reached." The following clarifies what actions, if any, a router should take once Rl has been reached.

If Rl has been reached while retransmitting Tear/Err messages, the router need not take any further action.

If Rl has been reached while retransmitting Path/Resv messages, the router starts periodic retransmission of these messages every 30 seconds. The retransmitted messages MUST carry a MESSAGE_ID object with the ACK_Desired flag set. This periodic retransmission SHOULD continue until an appropriate MESSAGE_ID_ACK object is received acknowledging the (retransmitted) Path/Resv message.

2.5. Avoiding use of Router Alert IP Option

In RSVP-TE, the Path message is carried in an IP packet that is addressed to the tail end of the LSP being signaled by this message. To make all intermediate/transit LSRs process this message, the IP packet carrying the message includes the Router Alert IP option. The same applies to the PathTear message.

An alternative to relying on the Router Alert IP option is to carry the Path or PathTear message as a sub-message of a Bundle message [RFC2961]. Bundle messages are "addressed directly to RSVP neighbors" and "SHOULD NOT be sent with the Router Alert IP option in their IP headers" [RFC2961]. Note that since a Bundle message may contain just a single sub-message, this approach can be used to send even a single Path or PathTear message.
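The framing involved is small. The Python sketch below is ours and not taken from [RFC2961]. It wraps an already-encoded Path message in a Bundle message whose only sub-message is that Path message, and contrasts the resulting IP-level treatment with the classic case. It assumes three things from the specifications: the RSVP common header layout of Section 3.1.1 of [RFC2205], the Bundle message type 12, and the Refresh-Reduction-Capable flag (0x01) from [RFC2961]. Per our reading of [RFC2961], Bundle messages go only to neighbors that have indicated refresh reduction support, so the sketch models that indication as a boolean. The helper names, the placeholder object bytes, the Send_TTL value, and the addresses (taken from documentation ranges) are hypothetical.

```python
import struct
from dataclasses import dataclass

RSVP_VERSION = 1
MSG_PATH = 1
MSG_BUNDLE = 12                   # Bundle message type (RFC 2961)
REFRESH_REDUCTION_CAPABLE = 0x01  # common-header flag (RFC 2961, Section 2)
COMMON_HEADER_LEN = 8


def rsvp_message(msg_type, body, send_ttl, flags=REFRESH_REDUCTION_CAPABLE):
    """Prefix `body` with an RSVP common header (RFC 2205, Section 3.1.1).

    The checksum is left as zero, which RFC 2205 defines as "no checksum
    transmitted"; a real implementation would normally compute it."""
    header = struct.pack(
        "!BBHBBH",
        (RSVP_VERSION << 4) | flags,     # Vers (4 bits) | Flags (4 bits)
        msg_type,                        # Msg Type
        0,                               # RSVP Checksum (not computed here)
        send_ttl,                        # Send_TTL (illustrative value)
        0,                               # Reserved
        COMMON_HEADER_LEN + len(body),   # RSVP Length: header plus body
    )
    return header + body


@dataclass
class Outgoing:
    dst: str            # IP destination address
    router_alert: bool  # whether the IP Router Alert option is included
    payload: bytes


def send_path(path_msg, tail_end, neighbor, neighbor_rr_capable, send_ttl):
    """Hypothetical helper: decide how to transmit an encoded Path message."""
    if neighbor_rr_capable:
        # Bundle with a single sub-message, addressed directly to the
        # neighbor and sent without the Router Alert option.
        return Outgoing(neighbor, False, rsvp_message(MSG_BUNDLE, path_msg, send_ttl))
    # Fallback: classic Path addressed to the LSP tail end, with Router Alert.
    return Outgoing(tail_end, True, path_msg)


# Placeholder body: 16 zero bytes stand in for the encoded Path objects.
path = rsvp_message(MSG_PATH, bytes(16), send_ttl=64)
out = send_path(path, tail_end="192.0.2.10", neighbor="198.51.100.1",
                neighbor_rr_capable=True, send_ttl=64)
print(out.dst, out.router_alert, len(out.payload))  # 198.51.100.1 False 32
```

The same wrapping applies unchanged to a PathTear message. Only the sub-message's Msg Type differs.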
This document RECOMMENDS implementing support for Bundle messages [RFC2961] and carrying Path and PathTear message(s) as sub-message(s) of a Bundle message.

2.6. Checking Data Plane readiness

In certain scenarios, such as Make-Before-Break (MBB), a router needs to move traffic from an existing LSP to a new LSP in the least disruptive fashion. To accomplish this, the data plane of the new LSP must be operational before the router moves the traffic. A possible mechanism by which the router can determine whether the data plane of the new LSP is operational is specified in [draft-bonica-mpls-self-ping]. This document RECOMMENDS implementing this mechanism and using it whenever the ingress of an LSP needs to check whether the data plane of the LSP is operational.

3. Security Considerations

This document does not introduce new security issues. The security considerations pertaining to the original RSVP protocol [RFC2205] and RSVP-TE [RFC3209] remain relevant.

4. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

5. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2205] Braden, R., "Resource Reservation Protocol (RSVP)", RFC 2205, September 1997.

[RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions", RFC 2961, April 2001.

[RFC3209] Awduche, D., "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[RFC4090] Pan, P., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005.

[RFC4558] Ali, Z., "Node-ID Based Resource Reservation (RSVP) Hello: A Clarification Statement", RFC 4558, June 2006.
[draft-bonica-mpls-self-ping] Bonica, R., et al., "LSP Self-Ping", draft-bonica-mpls-self-ping (work in progress).

[draft-chandra-mpls-enhanced-frr-bypass] Ramachandran, C., et al., "Refresh Interval Independent FRR Facility Protection", draft-chandra-mpls-enhanced-frr-bypass (work in progress).

6. Acknowledgments

Most of the text in Section 1.1 has been taken almost verbatim from [RFC2961].

Authors' Addresses

Vishnu Pavan Beeram
Juniper Networks
Email: vbeeram@juniper.net

Ina Minei
Google, Inc
Email: inaminei@google.com

Yakov Rekhter
Juniper Networks
Email: yakov@juniper.net

Ebben Aries
Facebook
Email: exa@fb.com

Dante Pacella
Verizon
Email: dante.j.pacella@verizon.com

Markus Jork
Juniper Networks
Email: mjork@juniper.net