MPLS, CCAMP, RSVP WGs                                        E. Tempest
Internet Draft                                              S. Ballarte
Document: draft-tempest-mpls-ccamp-rsvp-resync-00       Nortel Networks
                                                              July 2001


        RSVP-TE Failure Recovery (Resynchronization after Failure)


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This draft describes a mechanism for recovery after control channel
   failure in RSVP-TE. The draft introduces a procedure for state
   re-synchronization between two neighboring LSRs.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [2].

3. Introduction

   Once an RSVP Hello adjacency has been (re-)established between two
   neighbouring LSRs, the after-effects of a re-synchronisation are
   largely dependent on the following factors:

   - Underlying cause for the Hello adjacency loss.

   - The behaviour of LSRs on reset with respect to LSPs that existed
     at the time of the reset.

Tempest                  Expires January 2002                         1
draft-tempest-mpls-ccamp-rsvp-resync-00                       July 2001

   - The duration for which the Hello adjacency has been down coupled
     with the LSR's need to maintain previously created LSPs over this
     period.

   When an LSR is reset, one of the following may have occurred:

   1. The device has retained all knowledge of LSPs that previously
      existed.

   2. The device may have purged all knowledge of LSPs that previously
      existed - the device no longer has need of these LSPs and
      requires neighbouring LSRs to instigate the tearing down of any
      such LSPs on its behalf.

   3. The device may have purged all knowledge of LSPs that previously
      existed, but requires that the LSPs be retained. The LSR must
      bootstrap its knowledge of LSPs from neighbouring LSRs.

   Case (3) above is not supported by the re-synchronisation procedure
   specified in this document. The reason is that all LSRs are assumed
   to require provisioning of some sort, that is, information that must
   be retained across resets. As such, LSRs possess the means to retain
   LSP information across resets if so required. An LSR that does not
   do so implies that this information is not important, and that the
   LSPs should be taken down following a reset.

   For the purposes of this document, LSR A and LSR B are neighbouring
   LSRs with one or more logical links between them, where a logical
   link is capable of supporting multiple LSPs.

   This document proposes the Srefresh message as the mechanism to
   re-synchronise LSP information between neighbouring LSRs whenever
   the Hello adjacency between them is (re-)established. As such, LSRs
   are required to implement draft-ietf-mpls-rsvp-lsp-tunnel-08. There
   are some limitations on this mechanism, though, as illustrated by
   the following observations:

   - If LSR B has Path state for an LSP previously initiated by LSR A,
     but LSR A no longer has knowledge of this LSP, LSR A cannot
     bootstrap the required soft state information pertaining to this
     LSP from LSR B without additional RSVP messages being defined.

   - If LSR B has both Path & Resv soft state for an LSP, and LSR A is
     the destination of this LSP but has no knowledge of it, LSR A,
     although it can subsequently re-acquire the Path soft state
     information, will treat this as a new LSP set-up. As such, LSR A
     may select a totally different upstream label (in the case of a
     bi-directional LSP), and may specify a RESV_CONFIRM object in the
     associated Resv message. The logic within LSR B associated with
     Resv message handling would have to be modified to take this case
     into account.

   - To cater for highly channelised and/or large numbers of logical
     links between LSR A and LSR B, the resynchronisation procedure
     must allow for the case where multiple Srefresh messages are
     required to effect the resynchronisation.

4. Resynchronization States

   The state of re-synchronisation between two neighbouring LSRs will
   fall into one of the following categories:

   - Administratively down: the Hello adjacency between the LSRs has
     been taken down administratively, either following operator
     intervention, or automatically as a result of repeated failed
     attempts to re-synchronise following re-establishment of the
     Hello adjacency. Operator intervention is required to bring the
     Hello adjacency back up. If utilising RSVP Hellos as the Hello
     adjacency mechanism, RSVP Hellos may continue to be generated,
     but received RSVP Hellos must not be processed.

   - Down: the Hello adjacency is down due to a failure in the
     signaling network, and/or a failure in either LSR, and/or the
     restarting of either LSR. Whilst in this state, all RSVP messages
     other than RSVP Hello messages will be discarded - should they
     require acknowledgement, no ACK will be sent.

   - Up & re-synchronising: the Hello adjacency is up, and
     re-synchronisation between the LSRs is in progress.
     Whilst in this state, all RSVP messages other than RSVP Hello,
     Srefresh, and ACK/NACK messages will be discarded - should they
     require acknowledgement, no ACK will be sent.

   - Up & re-synchronised: the Hello adjacency is up, and
     re-synchronisation between the LSRs is complete.

   When an LSR restarts, the default re-synchronisation state to all
   neighbouring LSRs will be one of Administratively down (provisioned
   information must be retained across restarts) or Down.

   Figure 1 shows the various resynchronization states and their
   relationships.

      +-----------+          +-----------+
      |   Admin   |--------->|   Down    |
      |   Down    |<---------|           |
      +-----------+          +-----------+
          ^       ^          ^     |   ^
          |        \        /      |   |
          |         \      /       |   |
          |          \    /        |   |
          |           \  /         |   |
          |            \/          |   |
          |            /\          |   |
          |           /  \         |   |
          |          /    \        |   |
          |         /      \       |   |
          |        /        \      V   |
      +-----------+          +-----------+
      |  Up and   |<---------|  Up and   |
      | resynched |          |resynching |
      +-----------+          +-----------+

              Figure 1: Resynchronization States

5. Hello Adjacency (RSVP Hello Messages)

   The purpose of the RSVP Hello messages is to determine whether an
   operational bi-directional channel exists through the signaling
   network between neighbouring LSRs. The successful exchange of RSVP
   Hello messages between the parties establishes the Hello adjacency.
   So long as the Hello adjacency exists, it is reasonable to assume
   that bi-directional transfer of other RSVP messages between the two
   parties is possible.

   An LSR will periodically generate RSVP Hello messages for each of
   its provisioned neighbouring LSRs, regardless of whether the LSRs
   are physically present in the network, or otherwise reachable.

   The following conditions are indicative of a Hello Adjacency
   failure, and will cause the re-synchronisation state to be set to
   Down:

   - An RSVP Hello message is received that conveys a Src_Instance
     value that is different than that received in the previous RSVP
     Hello message from the same neighbouring LSR.
     This indicates that the sending party has reset, or has otherwise
     detected a loss in Hello adjacency.

   - An RSVP Hello message is received that conveys a Dst_Instance
     value of 0. This indicates that the sending party has reset, or
     has otherwise detected a loss in Hello adjacency.

   - A timeout period has elapsed in which no acceptable RSVP Hello
     messages have been received.

   The periodicity of RSVP Hello messages, and the maximum number of
   consecutive RSVP Hello messages that can be dropped within the
   network before the Hello adjacency is lost, will default to R/3 and
   K respectively, where R and K are the default values for the
   standard RSVP refresh mechanism. (The lifetime of the Hello
   adjacency is required to be substantially shorter, say by a factor
   of 3, than the lifetime associated with "soft" state information.)

   When the Hello adjacency lifetime expires, an RSVP Hello message
   must be sent with a Src_Instance value different than that specified
   in the previous RSVP Hello message sent to the same neighbouring
   entity (note that 0 is an invalid Src_Instance value).

5.1 Hello Adjacency Down Behavior

   When the Hello adjacency goes down, LSRs A & B alike will use this
   as a trigger to:

   - Freeze "soft" state information (i.e. freeze all related timers,
     both lifetime and refresh) associated with LSPs that traverse the
     two LSRs.

   - Tear down any LSPs that are still in the process of being
     established and that traverse the two LSRs.

   The re-synchronisation state will be set to Down in both LSRs.

   In the case where the reason for the Hello adjacency loss is an LSR
   reset, the LSR may not have had the opportunity to perform either of
   the above.

5.2 Hello Adjacency Up Behavior

   When the Hello adjacency comes up, LSRs A & B will use this as a
   trigger to commence the re-synchronisation phase, and the
   re-synchronisation state will be set to Up & re-synchronising
   accordingly on both LSRs.

6.
Re-Synchronization Procedure

   LSRs A & B must participate in the re-synchronisation procedure via
   the generation of a linked sequence of one or more Srefresh
   messages. LSRs will only generate re-synchronisation related
   Srefresh messages for Path state. During the re-synchronisation
   procedure, any requests to set up and/or tear down LSPs traversing
   the two LSRs will be ignored - if these requests require
   acknowledgement, no acknowledgement will be sent.

   The re-synchronisation procedure between LSRs A & B is deemed to be
   complete once all of the following conditions have been met:

   - LSR A has received an ACK for the Srefresh message that it sent to
     LSR B with the linked sequence flag not set.

   - LSR B has received an ACK for the Srefresh message that it sent to
     LSR A with the linked sequence flag not set.

   All re-synchronization related Srefresh messages must be
   acknowledged. Because NACK messages are subject to loss, NACK
   messages generated as a result of processing an Srefresh message
   must also be acknowledged. The ACK for the Srefresh message must not
   be generated until all the NACK messages generated as a result of
   processing that Srefresh message have been sent and acknowledged.

   If an LSR does not receive an acknowledgement to an Srefresh message
   within a configurable timeframe, the Srefresh message must be
   resent. If after a configurable number of such retries no
   acknowledgement is forthcoming, the re-synchronisation procedure
   should be aborted, an appropriate alarm generated requiring operator
   intervention, and the re-synchronisation state set to
   Administratively down.

   To support the concept of linked Srefresh messages, a couple of
   minor deviations from draft-ietf-rsvp-refresh-reduct-05.txt will be
   required as follows:

   - Add a flag called linked sequence to the MESSAGE_ID_LIST object.
     Within a sequence of two or more re-synchronization related
     Srefresh messages generated by an LSR, all Srefresh messages in
     the sequence except the last one must have the linked sequence
     flag set, and the last one must have this flag not set.

   - The MESSAGE_ID_LIST may contain zero entries. This is necessary to
     model the case where the LSR sending the re-synchronization
     related Srefresh message has no LSP knowledge.

   LSR A, on receipt of a re-synchronisation related Srefresh message
   from LSR B, will:

   - Generate a NACK for each message id that it does not recognise.
     The NACK requires acknowledgement. The maximum number of NACKs
     that may be outstanding at any given time awaiting acknowledgement
     must be configurable, with a default value of 5.

   - Once all the NACKs have been sent and acknowledged, generate an
     ACK for the Srefresh message itself.

   - On receipt of, and completed processing of, the acknowledgement
     for the re-synchronisation related Srefresh message with the
     linked sequence flag not set previously sent by LSR A, locally
     purge all LSPs for which LSR A has knowledge, but LSR B does not.
     Generate the necessary PathTear messages downstream, and ResvTear
     or PathErr (with PATH_STATE_REMOVED flag set) messages upstream
     (provided that the state of Hello adjacency with
     upstream/downstream is Up & resynchronizing).

   - On receipt of, and completed processing of, the re-synchronisation
     related Srefresh message with the linked sequence flag not set
     sent by LSR B, locally purge all LSPs for which LSR B has
     knowledge, but LSR A does not. Generate the necessary PathTear
     messages downstream, and ResvTear or PathErr (with
     PATH_STATE_REMOVED flag set) messages upstream (provided that the
     state of Hello adjacency with upstream/downstream is Up &
     resynchronized).

   - On having sent an acknowledgement to the re-synchronisation
     related Srefresh message with the linked sequence flag not set
     previously sent by LSR B, and having received an acknowledgement
     for the re-synchronisation related Srefresh message with the
     linked sequence flag not set previously sent by LSR A:

     o Unfreeze all "soft" state timers.

     o Set the re-synchronisation state to Up & re-synchronised.

6.1 Message Sequences

         LSR A      |                     |      LSR B
                    |                     |
                    |                     |
                    |                     |<--- Hello Adjacency
                    |<--------------------|     Down
               <----| RSVP Hello message  |     resynch state Down
                    | Src_Instance is     |
   Hello Adjacency  | different to that   |
   Down             | specified in the    |
                    | last Hello          |
                    |                     |

           Figure 2: Notification of Hello Adjacency Loss

         LSR A      |                     |      LSR B
               <----|                     |---->
   Hello adjacency  |   Srefresh (LS=1)   | Hello adjacency
   Up -> re-synch   |-------------------->| Up -> re-synch state
   state Up and     |                     | Up and resynching
   resynching       |<====NACKs===========|
                    |                     |
                    |=====ACKs(to NACK)==>|
                    |                     |
                    |<--------------------|
                    |       Ack for       |
                    |      Srefresh       |
                    |                     |
                    |   Srefresh (LS=0)   |
                    |-------------------->|
                    |                     |
                    |<====NACKs===========|
                    |                     |
                    |=====ACKs(to NACK)==>|
                    |                     |
                    |<--------------------|
                    |       Ack for       |
                    |      Srefresh       |
                    |                     |
                    |                     |
                    |<---Srefresh (LS=1)--|
                    |                     |
                    |====NACKs===========>|
                    |                     |
                    |<====ACKs(to NACK)===|
                    |                     |
                    |-------------------->|---> LSPs known by LSR A
               <----|       Ack for       |     not LSR B deleted
   LSPs known by    |      Srefresh       |     locally. Generate
   LSR A, but not B |                     |     PathTear downstream,
   detected locally.|                     |     and ResvTear or
   Generate         |                     |     PathErr upstream
   PathTear         |                     |
   downstream and   |                     |
   ResvTear or      |                     |
   PathErr upstream |                     |
                    |                     |
                    |                     |
                    |<---Srefresh (LS=0)--|
                    |                     |
                    |====NACKs===========>|
                    |                     |
                    |<====ACKs(to NACK)===|
                    |                     |
                    |-------------------->|---> LSPs known by LSR B
               <----|       Ack for       |     not LSR A deleted
   LSPs known by    |      Srefresh       |     locally.
   LSR B, but not A |                     |     Generate PathTear
   detected locally.|                     |     downstream, and
   Generate         |                     |     ResvTear or PathErr
   PathTear         |                     |     upstream. Unfreeze
   downstream and   |                     |     soft state timers
   ResvTear or      |                     |     and set state to
   PathErr upstream.|                     |     resynched
   Unfreeze soft    |                     |
   state timers and |                     |
   set state to Up  |                     |
   and resynched    |                     |

         Figure 3: Successful Re-Synchronization Scenarios

         LSR A      |                     |      LSR B
               <----|                     |---->
   Hello adjacency  |   Srefresh (LS=1)   | Hello adjacency
   Up -> re-synch   |-------------------->| Up -> re-synch state
   state Up and     |                     | Up and resynching
   resynching       |<====NACKs===========|
                    |                     |
                    |=====ACKs(to NACK)==>|
                    |                     |
                    |           X<--------|
                    |       Ack for       |
                    |      Srefresh       |
                    |                     |
       Time Out ---->|                     |
                    |----Srefresh (LS=1)->|
                    |                     |
                    |<====NACKs===========|
                    |                     |
                    |=====ACKs(to NACK)==>|
                    |                     |
                    |           X<--------|
                    |       Ack for       |
                    |      Srefresh       |
                    |                     |
       Time Out ---->|                     |
                    |                     |

      Figure 4: Repeated Re-Synchronization Failure Scenario

7. Security Considerations

   This draft doesn't introduce any new security issues.

8. Acknowledgments

   The authors would like to acknowledge Fong Liaw and Dimitrios
   Pendarakis for helpful discussions.

9. Authors' Addresses

   Ewart Tempest
   Nortel Networks
   P.O. Box 3511, Station C
   Ottawa, Ontario, Canada K1Y-4H7
   Phone: 613-768-0610
   Email: ewart@nortelnetworks.com

   Sandra Ballarte
   Nortel Networks
   P.O. Box 3511, Station C
   Ottawa, Ontario, Canada K1Y-4H7
   Phone: 613-763-9510
   Email: ballarte@nortelnetworks.com

Full Copyright Statement

   "Copyright (C) The Internet Society (date). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into

   [1] Bradner, S., "The Internet Standards Process -- Revision 3",
       BCP 9, RFC 2026, October 1996.

   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.
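
   As an informal illustration of the linked Srefresh sequence
   described in Section 6, the following Python sketch shows one
   direction of the exchange over a lossless channel. It is not part of
   the proposed procedure. The names Srefresh, build_srefresh_sequence,
   nacks_for and reconcile, and the per-message capacity max_ids, are
   hypothetical and chosen only for this example. Retransmission, the
   acknowledgement of NACKs and the limit on outstanding NACKs are
   omitted. The receiver-side purge of state that the sender never
   advertised is this sketch's reading of the procedure and of
   Figure 3, and is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Srefresh:
    """Simplified stand-in for a re-synchronisation related Srefresh."""
    message_ids: List[int]  # MESSAGE_ID_LIST entries; may be empty
    linked: bool            # the proposed "linked sequence" flag (LS)


def build_srefresh_sequence(path_ids: Iterable[int],
                            max_ids: int) -> List[Srefresh]:
    """Split the sender's Path state message ids into a linked sequence.

    Every message except the last has the flag set. A sender with no LSP
    knowledge still emits one message with an empty list and the flag
    clear. max_ids (must be > 0) is a hypothetical capacity, not a value
    taken from the draft.
    """
    ids = sorted(path_ids)
    chunks = [ids[i:i + max_ids] for i in range(0, len(ids), max_ids)] or [[]]
    last = len(chunks) - 1
    return [Srefresh(chunk, linked=(i < last)) for i, chunk in enumerate(chunks)]


def nacks_for(msg: Srefresh, known_ids: Set[int]) -> List[int]:
    """Receiver side: one NACK per message id that is not recognised."""
    return [mid for mid in msg.message_ids if mid not in known_ids]


def reconcile(sender_ids: Set[int], receiver_ids: Set[int],
              max_ids: int = 2) -> Tuple[Set[int], Set[int]]:
    """Run one direction of the exchange.

    Returns (ids the sender purges, ids the receiver purges). The sender
    purges what was NACKed; the receiver purges state that the sender
    never advertised (an assumption of this sketch).
    """
    nacked: Set[int] = set()
    advertised: Set[int] = set()
    for msg in build_srefresh_sequence(sender_ids, max_ids):
        nacked.update(nacks_for(msg, receiver_ids))
        advertised.update(msg.message_ids)
    return nacked, receiver_ids - advertised


if __name__ == "__main__":
    # LSR B advertises ids 1..5; LSR A holds ids 2, 3 and a stale id 9.
    purge_at_b, purge_at_a = reconcile({1, 2, 3, 4, 5}, {2, 3, 9})
    print(sorted(purge_at_b), sorted(purge_at_a))  # [1, 4, 5] [9]
```

   In this example LSR B's five ids are carried in three Srefresh
   messages, with the linked sequence flag set on the first two only.
   Running the same function with the roles of the two LSRs swapped
   models the other direction of the procedure.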