 TEAS Working Group                                           Ravi Singh 
 Internet Draft                                         Juniper Networks 
 Intended status: Best Current Practice                       Rob Shakir 
                                                         British Telecom 
                                                     Vishnu Pavan Beeram 
                                                        Juniper Networks 
                                                              Tarek Saad 
                                                           Cisco Systems 
  
 Expires: January 2, 2016                                   July 2, 2015 
                                     
  
                           RSVP Setup Retry - BCP 
                  draft-ravisingh-teas-rsvp-setup-retry-01 


 Status of this Memo 

    This Internet-Draft is submitted in full conformance with the 
    provisions of BCP 78 and BCP 79. 
     
    Internet-Drafts are working documents of the Internet Engineering 
    Task Force (IETF), its areas, and its working groups.  Note that 
    other groups may also distribute working documents as Internet-
    Drafts. 
     
    Internet-Drafts are draft documents valid for a maximum of six 
    months and may be updated, replaced, or obsoleted by other documents 
    at any time.  It is inappropriate to use Internet-Drafts as 
    reference material or to cite them other than as "work in progress." 
     
    The list of current Internet-Drafts can be accessed at 
    http://www.ietf.org/ietf/1id-abstracts.txt 
     
    The list of Internet-Draft Shadow Directories can be accessed at 
    http://www.ietf.org/shadow.html 
     
    This Internet-Draft will expire on January 2, 2016. 
     
 Copyright Notice 

    Copyright (c) 2015 IETF Trust and the persons identified as the 
    document authors. All rights reserved.  
     
    This document is subject to BCP 78 and the IETF Trust's Legal 
    Provisions Relating to IETF Documents  
    (http://trustee.ietf.org/license-info) in effect on the date of 
    publication of this document. Please review these documents 
    carefully, as they describe your rights and restrictions with
    respect to this document.  Code Components extracted from this
    document must include Simplified BSD License text as described in 
    Section 4.e of the Trust Legal Provisions and are provided without 
    warranty as described in the Simplified BSD License. 
      
 Abstract 

    This document discusses best current practices for implementing the
    RSVP setup-retry timer.

 Conventions used in this document 

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
    document are to be interpreted as described in RFC-2119 [RFC2119]. 
     

 Table of Contents 

    1. Introduction
    2. Setup-Retry Timer
    3. Possible ill-effects due to implementation choices
    4. Causes of the above ill-effects
    5. Solution to the implementation issues
    6. Security Considerations
    7. IANA Considerations
    8. Normative References
    9. Acknowledgments
    10. Authors' Addresses
    Contributors
     
 1. Introduction 

    In an RSVP-TE network with a very large number of LSPs, link/node
    failure(s) may produce a noticeable increase in RSVP-TE control
    traffic. As a result, RSVP-TE messages might be delayed in a queue
    that is overwhelmed with messages waiting to be sent, or they might
    be lost altogether. For example, a Path message that a transit
    router intends to send might be stuck in the output queue toward
    the next hop. Alternatively, it might have been dropped on the
    receive side due to queue overflow. The same could happen to a Resv
    message in the reverse direction. Also, in the absence of reliable
    delivery of PathErr messages [RFC2961], an error generated at a
    transit or egress router for an LSP that is in the process of being
    set up may never reach the ingress.

    Lost/delayed RSVP-TE messages cause the following problems for an
    ingress router:
    - In the absence of an error indication, how is an ingress to know
      whether an LSP for which signaling was (re-)initiated, and for
      which a Resv has not yet been received, is ever going to come up?
    - In the absence of any indication, what action should the ingress
      take to support low-latency LSP setup?

    The above problems essentially boil down to one question: how long
    should the ingress wait before giving up on its attempt to bring up
    the LSP and taking some alternative course of action (e.g., trying
    to bring up the LSP on an alternate path)? To mitigate this
    problem, some implementations use a setup-retry timer mechanism.
    This document discusses the issues associated with a particular
    implementation of this timer and makes specific recommendations to
    avoid these issues.
     
 2. Setup-Retry Timer 

    The setup-retry timer is usually a configurable timer that, in the
    absence of an error indication, fires when an LSP with a given
    LSPID has not received a Resv in response to its Path within a
    pre-configured duration after its first Path was sent.
     
    Use of the setup-retry timer is based on the presumption that if 
    signaling for a given LSP has not been completed within an 
    "expected" duration, it is not going to be completed at all. The 
    intent in the use of this timer is to expeditiously take some 
    alternative course of action when an LSP has not yet completed its 
    signaling within an "expected" duration of time. 
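
    As a rough illustration of the timer described in this section, the
    following Python sketch models when an ingress would consider the
    setup-retry timer expired for one LSP instance. The names
    LspSetupState, setup_retry_expired, and retry_interval are
    hypothetical and are not taken from any implementation. Time is
    passed in explicitly so that the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class LspSetupState:
    """Ingress-side view of one LSP instance being signaled (hypothetical model)."""
    tunnel_id: int
    lsp_id: int
    first_path_sent_at: float      # time the first Path was sent, in seconds
    resv_received: bool = False    # a Resv for this LSPID reached the ingress
    error_received: bool = False   # an error indication (e.g. PathErr) arrived


def setup_retry_expired(state: LspSetupState, now: float,
                        retry_interval: float) -> bool:
    """Return True when the setup-retry timer should fire for this instance."""
    if state.resv_received or state.error_received:
        # Signaling either completed or failed explicitly, so the timer
        # has nothing left to detect.
        return False
    return now - state.first_path_sent_at >= retry_interval
```

    For example, with retry_interval set to 30 seconds, an instance
    whose first Path was sent at t=100 is reported as expired at t=130
    unless a Resv or an error indication has arrived in the meantime.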
     
 3. Possible ill-effects due to implementation choices 

    As mentioned in the previous section, the intent in the use of this
    timer is to take some alternative course of action when an LSP has
    not yet completed its signaling within an "expected" duration of
    time. One such course of action is for the ingress router to
    initiate tear-down of the path that is still being signaled by
    sending a PathTear, run CSPF, and use the outcome of this CSPF to
    signal a brand-new path for this tunnel with a different LSP-ID
    (typically the previous LSP-ID incremented by 1). This section
    describes the problems caused by such a course of action.
  
    As mentioned in Section 1, in a network with a very large number of
    RSVP-TE LSPs, link/node failure(s) may produce a noticeable increase
    in the volume of RSVP-TE control traffic, which in turn might cause
    a router either to drop RSVP-TE messages or to send them
    excessively late.
     
    As a result, the following problems can occur: 
    - LSP setup latency might be excessively high. 
    - Error messages that indicate failure in LSP setup might not make 
      it to the ingress router. 
     
    A mix of the above problems can cause the setup-retry timer for a
    given LSP (at the ingress router) to fire repeatedly over a period
    of time. The ingress then gets stuck, for some or many LSPs, in the
    cycle illustrated below:
     
    -------------------------------------------------------------------- 
    Ingress Timeline        | [Ingress]---[]---[]...[Transit]...[]---[]- 
    ------------------------| 
    1. Trigger LSP setup    | Path 
              :             |   TNL-ID=X 
              :             |   LSP-ID=Y 
              :             | --------> 
       <No Resv (X, Y)>     |          ------------> Path (X, Y) 
              :             |                        -------> --------->   
              :             |                  :          
              :             |                  :       
    2. Setup-Retry Timer    |                  : 
       fires; Recompute     |                  : 
       path;                |                  : 
    3. Trigger Teardown     | PathTear 
                            |   TNL-ID=X 
                            |   LSP-ID=Y 
                            | --------> 
                            |          ------------> PathTear (X, Y) 
                            |                        -------> --------->    
    4. Trigger setup for new| Path 
       instance of the LSP  |   TNL-ID=X 
       (same ERO)           |   LSP-ID=Y+1 
              :             | --------> 
              :             |          ------------> Path (X, Y+1) 
              :             |                        -------> ---------> 
              :             |                                 Resv 
       <No Resv (X, Y+1)>   |                                   TNL-ID=X 
              :             |                                   LSP-ID=Y 
              :             |                                 <--------- 
              :             |                                 ResvError 
              :             |                                   No Path
              :             |                                 --------->
    5. Repeat loop through  |                  :               
       2-4                  |                  : 
    -------------------------------------------------------------------- 

    In the above illustration, notice how the transit router never gets 
    to completely process the "current" LSP-ID (see [RShakir] for more). 
    The implementation recommendations made in this document will help 
    avoid this snowball effect. 
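
    The cycle illustrated above can be reproduced with a very small
    model. The following Python sketch is only an illustration under
    stated assumptions: the Resv for any newly signaled LSP-ID needs a
    constant resv_delay to reach the ingress, and the ingress tears down
    and re-signals with the LSP-ID incremented by 1 on every timer
    expiry. The function name and its parameters are hypothetical.

```python
def tear_and_resignal(retry_interval: float, resv_delay: float,
                      max_retries: int) -> dict:
    """Model an ingress that sends PathTear and re-signals on each expiry.

    resv_delay is the (assumed constant) time a Resv needs to come back
    for any freshly signaled LSP-ID, for example because one transit
    router is slow. Only messages originated by the ingress are counted.
    """
    lsp_id = 1                            # stands in for LSP-ID "Y"
    sent = {"Path": 1, "PathTear": 0}     # the initial Path for LSP-ID Y
    retries = 0
    while resv_delay > retry_interval and retries < max_retries:
        # The timer fires before the Resv for the current LSP-ID arrives:
        # tear down instance Y and signal instance Y+1 on the same ERO.
        sent["PathTear"] += 1
        lsp_id += 1
        sent["Path"] += 1
        retries += 1
    # The LSP only comes up if a Resv can arrive within one timer interval.
    return {"up": resv_delay <= retry_interval, "lsp_id": lsp_id, "sent": sent}
```

    In this model, whenever resv_delay exceeds retry_interval the LSP
    is never reported as up, however many retries are allowed, because
    every retry restarts signaling under a new LSP-ID.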

 4. Causes of the above ill-effects 

    The implementation choices described in Section 3 increase the
    control plane load on a network whose control plane is already
    under stress. This is caused by unnecessarily doing the following
    even when there is no change in the computed path:

    - Sending PathTears causes excessive and unjustifiable work on those
      downstream routers on the "previous ERO path" that had managed to
      bring the LSP up. In other words, the slowness of a given transit
      router should not penalize all other transit routers downstream
      of it, as doing so just increases the overall network stress.

    - Sending a Path for LSPID=Y+1 causes unnecessary work for all
      routers on the ERO path, including those that were already
      running slow and were the real cause of the Resv for LSPID=Y not
      having been received in time by the ingress.
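
    To make the extra load concrete, the sketch below counts how many
    additional message-processing events a single timer expiry can
    cause along an unchanged ERO. It is a simplified estimate, not a
    measurement. It assumes that the PathTear is processed by each
    downstream router that had set up state for LSPID=Y and that the
    new Path for LSPID=Y+1 is processed by every router on the ERO. The
    function and parameter names are hypothetical.

```python
def extra_events_per_expiry(ero_hops: int, hops_with_state: int) -> int:
    """Estimate extra processing events caused by one tear-and-resignal.

    ero_hops:        routers on the ERO downstream of the ingress
    hops_with_state: how many of those had set up state for LSPID=Y
    """
    if not 0 <= hops_with_state <= ero_hops:
        raise ValueError("hops_with_state must be between 0 and ero_hops")
    path_tear_events = hops_with_state   # PathTear for LSPID=Y
    new_path_events = ero_hops           # Path for LSPID=Y+1
    return path_tear_events + new_path_events
```

    Under these assumptions, an ERO of 5 hops on which 3 routers had
    set up state yields 8 extra events per expiry for one LSP. Simply
    refreshing the existing Path state for LSPID=Y adds none of these.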
  
 5. Solution to the implementation issues 

    To eliminate the causes listed in the previous section, and thus
    the ill-effects themselves, this document makes the following
    recommendations.

    When the setup-retry timer fires: 

    If there is no change in the computed path (no error indication for 
    that LSP has been received via a PathErr or a TE update indicating a 
    failure), 
    - Do not send PathTear for LSPID=Y 
    - Just let the Path State get refreshed for LSPID=Y.  
       
    The recommended default behavior is to keep retrying until the path
    changes or the user intervenes. Implementations MAY choose to
    provide the user with an option to override this default behavior
    and specify a policy to determine when to stop retrying.

    Implementations SHOULD use the recommendations listed in this
    section to avoid getting stuck in an LSP signaling hysteresis.
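
    The recommendations in this section can be sketched as a small
    decision function that runs when the setup-retry timer fires. This
    is a minimal illustration, not a definitive implementation. The
    names Action and on_setup_retry_fired are hypothetical, the
    max_retries parameter is just one possible form of the optional
    user policy, and re-signaling when the computed path has changed is
    an assumption, since this section only specifies the unchanged-path
    case.

```python
from enum import Enum
from typing import Optional, Sequence


class Action(Enum):
    KEEP_REFRESHING = "keep refreshing Path state for the current LSPID"
    RESIGNAL = "signal a new instance on the newly computed path"
    STOP = "stop retrying (user-configured policy)"


def on_setup_retry_fired(current_ero: Sequence[str],
                         recomputed_ero: Sequence[str],
                         retries_so_far: int,
                         max_retries: Optional[int] = None) -> Action:
    """Decide what the ingress does when the setup-retry timer fires."""
    # Optional user policy (hypothetical): give up after max_retries.
    if max_retries is not None and retries_so_far >= max_retries:
        return Action.STOP
    # No change in the computed path: do not send PathTear for LSPID=Y,
    # just let the existing Path state be refreshed.
    if list(recomputed_ero) == list(current_ero):
        return Action.KEEP_REFRESHING
    # Assumption: a changed path justifies signaling a new instance.
    return Action.RESIGNAL
```

    With no max_retries configured, the function keeps returning
    KEEP_REFRESHING for as long as the recomputed ERO equals the
    current one, which corresponds to the recommended default of
    retrying until the path changes or the user intervenes.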
  
 6. Security Considerations 

    This document does not introduce any new security concerns. 

 7. IANA Considerations 

    None. 

 8. Normative References 

    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

    [RShakir] Shakir, R., "The next spring forward", March 2014,
              http://rob.sh/files/the-next-spring-forward_rjs120314.pdf

    [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
              RFC 2961, April 2001.

 9. Acknowledgments 
     
    The authors would like to thank Yakov Rekhter and Raveendra Torvi 
    for their inputs. 
     
 10. Authors' Addresses 

    Ravi Singh 
    Juniper Networks 
    Email: ravis@juniper.net 
     
    Rob Shakir 
    British Telecom 
    Email: rob.shakir@bt.com 
     
    Tarek Saad 
    Cisco Systems 
    Email: tsaad@cisco.com 
     
    Vishnu Pavan Beeram
    Juniper Networks
    Email: vbeeram@juniper.net
     
 Contributors 

    Markus Jork 
    Juniper Networks 
    Email: mjork@juniper.net 
     
    Aman Kapoor 
    Juniper Networks 
    Email: amanka@juniper.net