Network Working Group                                         Vipin Jain
Internet-Draft                                       Riverstone Networks
Category: Standards Track                                         Editor
Expires April 2006                                          October 2005


                Fail Over extensions for L2TP "failover"
                   draft-ietf-l2tpext-failover-06.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice
   Copyright (C) The Internet Society (2005).  All Rights Reserved.

Abstract

   L2TP is a connection-oriented protocol that has shared state between
   active endpoints. Some of this shared state is vital for operation
   but may be rather volatile in nature, such as packet sequence numbers
   used on the L2TP Control Connection. When failure of one side of a
   control connection occurs, a new control connection is created and
   associated with the old connection by exchanging information about
   the old connection. Such a mechanism is not intended as a replacement
   for an active fail over with some mirrored connection states, but as
   an aid just for those parameters that are particularly difficult to
   have immediately available. Protocol extensions to L2TP defined in
   this document are intended to facilitate state recovery, providing
   additional resiliency in an L2TP network and improving a remote
   system's layer 2 connectivity.


Jain, et al.                 Standards Track                    [Page 1]

INTERNET DRAFT                  FAILOVER                      April 2006


   Table of Contents

   Status of this Memo..........................................    1
   1.0 Introduction.............................................    3
   2.0 Protocol Operation.......................................    4
      2.1 Pre Failover Operation................................    4
      2.2 Failover Recovery Procedure...........................    5
         2.2.1 Recovery tunnel establishment....................    5
         2.2.2 Control and/or Data Channel Reset................    8
      2.3 Session State Synchronization.........................    9
   3.0 IANA Considerations......................................   11
   4.0 Security Considerations..................................   12
   5.0 Acknowledgements.........................................   12
   6.0 Author Information.......................................   12
   7.0 References...............................................   13
   8.0 Intellectual Property Statement..........................   13
   9.0 Disclaimer of Validity...................................   13
   10.0 Copyright Statement.....................................   13
   Appendix A...................................................   14
   Appendix B...................................................   15
   Appendix C...................................................   17
   Appendix D...................................................   18


Contributors

   Following is the list of contributors to this document.

   Paul Howard            Juniper Networks
   Vipin Jain             Riverstone Networks
   Sam Henderson          Cisco Systems
   Keyur Parikh           Harris Communications


Terminology

   Endpoint: L2TP control connection endpoint, i.e. either LAC or LNS.
   Also known as LCCE in [L2TPv3].

   Active Endpoint: An endpoint that is currently providing service.

   Backup Endpoint: A redundant endpoint standing by for the active
   endpoint.

   Failover: The action of a Backup Endpoint taking over the service of
   an active endpoint. This could be due to administrative action or
   failure of the active endpoint.


   Old Tunnel: A control connection that existed before failure and is
   subjected to recovery upon failover.

   Recovery Tunnel: A new control connection established only to recover
   an old tunnel.

   Recovered Tunnel: An Old Tunnel whose control connection and sessions
   have been restored using the mechanism described in this document.

1.0 Introduction

   The goal of this draft is to aid the overall resiliency of an L2TP
   endpoint by introducing extensions to RFC 2661 [L2TPv2] and RFC 3931
   [L2TPv3] that will minimize the recovery time of the L2TP layer after
   a failover, while minimizing the impact on its performance. Therefore
   it is assumed that the endpoint's overall architecture is also
   supportive in the resiliency effort.

   To ensure proper operation of an L2TP endpoint after a failover, the
   information associated with the control connection and its sessions
   must be correct and consistent on both endpoints. This includes both
   configured and dynamic information. The configured information is
   assumed to be correct and consistent after a failover; otherwise the
   tunnels and sessions would not have been set up in the first place.
   The dynamic information, also referred to as stateful information,
   changes as the tunnel's control and data packets are processed.
   Currently, the only such information that is essential to the
   tunnel's operation is its sequence numbers. For the tunnel control
   channel, inconsistent sequence numbers can result in termination of
   the entire tunnel. For tunnel sessions, inconsistent sequence
   numbers, when used, can cause significant data loss, which the end
   user perceives as "service loss".

   Thus, a resilient architecture that aims to minimize "service loss"
   after a failover must make provision for the tunnel's essential
   stateful information, i.e. its sequence numbers. Currently, there
   are two options. The first is to keep the backup endpoint
   completely synchronized with the active endpoint with respect to the
   control and data session sequence numbers. The second is to
   re-establish all the tunnels and their sessions after a failover.
   The drawback of the first option is that it adds significant
   performance overhead and complexity to the endpoint's architecture,
   especially as tunnel and session aggregation increases. The drawback
   of the second option is that it increases the "service loss" time,
   especially as the architecture scales.


   To alleviate these drawbacks, this draft introduces a mechanism to
   bring the dynamic stateful information of a tunnel to a correct and
   consistent state after a failure. The proposed mechanism defines the
   recovery of tunnels and sessions that were in established state
   prior to the failure.


2.0 Protocol Operation

   The failover protocol consists of three phases - pre failover,
   failover recovery, and session state synchronization.

   Pre failover operation allows an endpoint to specify its failover
   capabilities and timer values, attributes that are used when failover
   occurs.

   Failover recovery is started by the failed endpoint, which initiates
   a new L2TP control connection (called a recovery tunnel) for every
   old tunnel that needs recovery. The recovery tunnel serves three
   purposes: 1) It provides a means of authentication and a three-way
   handshake to ensure both ends agree on the failover for a given
   tunnel.  2) It identifies the old tunnel that needs recovery.  3) It
   exchanges the Ns and Nr values to be used in the recovered tunnel on
   both ends. Upon establishing the recovery tunnel, the two endpoints
   reset their control and/or data channel, after which the recovery
   tunnel can be torn down. The sessions that were in established state
   resume traffic.  Data channel recovery is a process of resetting
   sequence numbers when applicable; hence no recovery tunnel is
   established if there is no control channel failure.

   The session state synchronization process allows the two endpoints
   to agree on the state of the sessions in the tunnel after failover.
   Inconsistency can arise from a failure on one of the endpoints.  To
   synchronize, the two endpoints first silently clear the sessions
   that were not in established state. At this point they can allow new
   sessions to be established on the recovered tunnel.  Then, they use
   two new messages, Failover Session Query (FSQ) and Failover Session
   Response (FSR), over the recovered tunnel or over the control channel
   (for data-channel-only failure) to obtain the state of sessions on
   the peer.

   2.1 Pre Failover Operation

      An endpoint that supports the failover protocol defined in this
      document MUST include Failover Capability AVP in SCCRQ or SCCRP
      during control connection establishment.


      Failover Capability AVP
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type 76     |         Reserved          |D|C|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |              Recovery Time (in milliseconds)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      The AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is not
      mandatory (the M-bit MUST be set to 0).

      The C bit, when set, indicates an endpoint's capability to
      initiate failover and its ability to respond to a failure on the
      other endpoint by implementing the protocol described in this
      document.

      The D bit, when set, indicates that an endpoint is capable of
      resetting its Nr value based on the Ns value(s) received in one or
      more 'out of order but in sequence' packets from the peer.  This
      bit is applicable only to sessions that use sequence numbers on
      the data channel; i.e. after a data channel failure, a system that
      does not exhibit the D bit capability can still recover sessions
      that do not use sequence numbers. Section 2.2.2 contains more
      details on data channel reset.

      The Failover Capability AVP MUST set at least one of the two
      capability bits, i.e. the C bit and/or the D bit.

      Recovery Time, applicable only when the C bit is set, is the time
      in milliseconds an endpoint asks its peer to wait before assuming
      the recovery process has failed. This timer starts when an
      endpoint's control channel timeout ([L2TPv2] section 5.8, [L2TPv3]
      section 4.2) is started, and is not terminated (before expiry)
      until an endpoint successfully authenticates its peer during
      recovery. A value of zero does not mean that no failover will
      occur; it means no additional time is requested from the peer.
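
      As an illustration only, the Failover Capability AVP layout shown
      above could be packed and parsed as in the following Python
      sketch. The function and constant names are hypothetical, the AVP
      is assumed to be unhidden (H-bit 0), and sending a zero Recovery
      Time when the C bit is clear is an assumption of the sketch rather
      than a requirement of this document.

```python
import struct

ATTR_FAILOVER_CAPABILITY = 76   # attribute type defined in this document
C_BIT = 0x0001                  # last bit of the 16-bit flags field
D_BIT = 0x0002                  # next-to-last bit of the flags field
AVP_LEN = 12                    # 6-byte AVP header + 2 flag bytes + 4-byte time


def encode_failover_capability(c_bit, d_bit, recovery_time_ms=0):
    """Build the AVP with M=0, H=0 and Vendor Id 0 (IETF)."""
    if not (c_bit or d_bit):
        raise ValueError("at least one of the C and D bits must be set")
    flags = (C_BIT if c_bit else 0) | (D_BIT if d_bit else 0)
    if not c_bit:
        # Recovery Time only applies with the C bit (sketch assumption: send 0).
        recovery_time_ms = 0
    return struct.pack("!HHHHI", AVP_LEN, 0, ATTR_FAILOVER_CAPABILITY,
                       flags, recovery_time_ms)


def decode_failover_capability(avp):
    """Return (c_bit, d_bit, recovery_time_ms) from an unhidden AVP."""
    head, vendor, attr, flags, recovery = struct.unpack("!HHHHI", avp)
    if (head & 0x03FF) != AVP_LEN or vendor != 0 \
            or attr != ATTR_FAILOVER_CAPABILITY:
        raise ValueError("not a Failover Capability AVP")
    return bool(flags & C_BIT), bool(flags & D_BIT), recovery
```

      For example, encode_failover_capability(True, True, 5000) yields a
      12-octet AVP that decode_failover_capability() maps back to
      (True, True, 5000).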


   2.2 Failover Recovery Procedure

      The failover recovery procedure consists of two steps: 1) recovery
      tunnel establishment, and 2) control and/or data channel reset.

      2.2.1 Recovery tunnel establishment

      For control channel failure, the failed endpoint establishes a new
      control connection called recovery tunnel for every old tunnel it
      wishes to recover. The purpose of the recovery tunnel is solely to
      recover the corresponding old tunnel. An endpoint SHOULD NOT send
      any control message on this tunnel, other than those required to
      manage the life of the recovery tunnel. A recovery tunnel MUST NOT
      indicate that it is failover capable, i.e. it MUST NOT include the
      Failover Capability AVP in SCCRQ or SCCRP messages. A recovery
      tunnel MUST use the same L2TP version and establishment procedures
      that were used for the control connection being recovered. It MUST
      follow the procedures described in [L2TPv2] or [L2TPv3] to
      establish the recovery tunnel. To identify the old control
      connection, the SCCRQ message for the recovery tunnel MUST include
      the Tunnel Recovery AVP.

      Tunnel Recovery AVP for L2TPv3 tunnels:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type 77     |           Reserved            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Recover Tunnel Id                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     Recover Remote Tunnel Id                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      Tunnel Recovery AVP for L2TPv2 tunnels:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type 77     |           Reserved            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Reserved              |     Recover Tunnel Id         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Reserved              |   Recover Remote Tunnel Id    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      This AVP MUST NOT be hidden (the H-bit is set to 0). The AVP is
      mandatory (the M-bit is set to 1).

      Recover Tunnel Id encodes the sender's local tunnel id for the old
      tunnel that it wants recovered.  Similarly, Recover Remote Tunnel
      Id encodes the remote tunnel id corresponding to the old tunnel.
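
      The two Tunnel Recovery AVP layouts above differ only in the width
      of the id fields. A minimal Python sketch of an encoder, with
      hypothetical names and an explicit version argument that is an
      assumption of the sketch, could look like this:

```python
import struct

ATTR_TUNNEL_RECOVERY = 77   # attribute type defined in this document
M_BIT = 0x8000              # the AVP is mandatory; the H-bit stays 0
AVP_LEN = 16                # 6-byte header + 2 reserved + two 4-byte id fields


def encode_tunnel_recovery(recover_tunnel_id, recover_remote_tunnel_id,
                           version):
    """Encode the AVP for an L2TPv2 (version=2) or L2TPv3 (version=3) tunnel."""
    if version not in (2, 3):
        raise ValueError("version must be 2 or 3")
    limit = 0xFFFF if version == 2 else 0xFFFFFFFF
    for tunnel_id in (recover_tunnel_id, recover_remote_tunnel_id):
        if not 0 <= tunnel_id <= limit:
            raise ValueError("tunnel id out of range for this L2TP version")
    # A 16-bit L2TPv2 id packed into a 32-bit field leaves the upper 16 bits
    # zero, which matches the Reserved halves of the L2TPv2 layout.
    return struct.pack("!HHHHII", M_BIT | AVP_LEN, 0, ATTR_TUNNEL_RECOVERY, 0,
                       recover_tunnel_id, recover_remote_tunnel_id)
```

      Packing both layouts through the same 32-bit fields keeps the
      encoder small; a real implementation may prefer separate code
      paths per L2TP version.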

      Upon receiving an SCCRQ with a Tunnel Recovery AVP, the peer
      endpoint validates Recover Tunnel Id and Recover Remote Tunnel Id
      and responds with an SCCRP. It MUST terminate the recovery tunnel
      if:
      - Recover Tunnel Id or Recover Remote Tunnel Id is unknown.
      - The failed or the non-failed endpoint did not indicate that it
      was failover capable.
      - The L2TP version of the recovery tunnel is different from the
      version used in the old tunnel.
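
      The three termination conditions above can be sketched as a single
      check on the receiving endpoint. The OldTunnel record, the lookup
      table keyed by local tunnel id, and the reason strings are all
      hypothetical. The sketch assumes that, seen from the receiver,
      Recover Remote Tunnel Id is its own id for the old tunnel and
      Recover Tunnel Id is the sender's id.

```python
from dataclasses import dataclass


@dataclass
class OldTunnel:
    local_id: int         # this endpoint's tunnel id for the old tunnel
    remote_id: int        # the peer's tunnel id for the old tunnel
    version: int          # 2 for L2TPv2, 3 for L2TPv3
    local_capable: bool   # this endpoint indicated failover capability
    peer_capable: bool    # the peer indicated failover capability


def check_recovery_sccrq(old_tunnels, recover_id, recover_remote_id, version):
    """Return None to accept the SCCRQ, or a reason string to terminate.

    old_tunnels maps this endpoint's local tunnel id to an OldTunnel.
    """
    old = old_tunnels.get(recover_remote_id)
    if old is None or old.remote_id != recover_id:
        return "unknown tunnel id"
    if not (old.local_capable and old.peer_capable):
        return "failover capability not indicated"
    if old.version != version:
        return "L2TP version mismatch"
    return None
```

      A None result means the endpoint can go on to answer with an
      SCCRP; any other result means the recovery tunnel is terminated.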

      If the non-failed endpoint accepts the SCCRQ, it MAY include the
      Suggested Control Sequence AVP in the SCCRP.

      Suggested Control Sequence AVP

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type 78     |            Reserved           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        Suggested Ns           |         Suggested Nr          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      This AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is not
      mandatory (the M-bit is set to 0).

      This is an optional AVP, suggesting Ns and Nr values to be used by
      the failed endpoint. If this AVP is present in an SCCRP message,
      the failed endpoint MUST set the Ns and Nr values of the recovered
      tunnel to the respective suggested values. When this AVP is not
      sent in SCCRP or not present in an incoming SCCRP, the Ns and Nr
      values for the recovered tunnel are set to zero. It is recommended
      that the non-failed endpoint suggest Ns and Nr values so that old
      control packets do not interfere with the recovered tunnel's
      control channel.
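
      The rule for the failed endpoint (use the suggested values when
      the AVP is present, zero otherwise) can be sketched in Python as
      follows. The function names are hypothetical and the AVP is
      assumed to arrive unhidden.

```python
import struct

ATTR_SUGGESTED_CONTROL_SEQUENCE = 78   # attribute type defined in this document
AVP_LEN = 12                           # 6-byte header + 2 reserved + Ns + Nr


def encode_suggested_sequence(ns, nr):
    """Build the AVP with M=0, H=0, as sent by the non-failed endpoint."""
    return struct.pack("!HHHHHH", AVP_LEN, 0,
                       ATTR_SUGGESTED_CONTROL_SEQUENCE, 0, ns, nr)


def recovered_ns_nr(avp=None):
    """Return the (Ns, Nr) the failed endpoint uses on the recovered tunnel.

    Pass None when the SCCRP carried no Suggested Control Sequence AVP.
    """
    if avp is None:
        return 0, 0
    _head, _vendor, attr, _reserved, ns, nr = struct.unpack("!HHHHHH", avp)
    if attr != ATTR_SUGGESTED_CONTROL_SEQUENCE:
        raise ValueError("not a Suggested Control Sequence AVP")
    return ns, nr
```

      With an AVP built by encode_suggested_sequence(300, 17), the
      failed endpoint would continue with Ns 300 and Nr 17; with no AVP
      it would restart both at zero.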

      In the case of L2TPv3, the recovery tunnel MUST use Control
      Message authentication (i.e. exchange the nonce values) as
      described in [L2TPv3] section 4.3, if the old tunnel was
      configured to do Control Message authentication. An L2TPv3
      recovered tunnel MUST reset its nonce values (local and remote) to
      the nonce values exchanged in the recovery tunnel.

      To authenticate an endpoint during recovery, an endpoint MUST
      follow the procedure described in either [L2TPv2] section 5.1.1 or
      [L2TPv3] section 4.3. It SHOULD use the same secret that was used
      to authenticate the old tunnel. Failure to authenticate could be
      a reason to terminate the recovery tunnel. If, for any reason, the
      failed endpoint cannot establish the recovery tunnel, then it MUST
      silently clear the old tunnel and the sessions within it, assuming
      the recovery process has failed.

      Any control packet received on the recovered tunnel, before
      control channel reset, MUST be silently discarded.

      An endpoint MUST use Tie Breaker AVP (section 4.4.3 [L2TPv2]) or
      Control Connection Tie Breaker AVP (section 5.4.3 [L2TPv3]) in the
      setup of the recovery tunnel to ensure that only a single recovery
      tunnel (when both endpoints failover) is established for each
      tunnel to be recovered. The scope of tie breaker AVP's action,
      when used in a recovery tunnel, is restricted to the recovery
      tunnel(s) for a single tunnel to be recovered as opposed to the
      non-recovery usage where the scope is the LAC-LNS pair. Thus an
      implementation MUST apply the tiebreaker only to those tunnels
      that are a) recovery tunnels, and b) associated with the same
      tunnel to be recovered. It must not impact the operation of
      non-recovery tunnels or of recovery tunnels associated with
      different tunnels to be recovered. The tunnel that wins the tie is
      used to decide the suggested Ns, Nr values on the recovered
      tunnel. Therefore, the endpoint that loses the tie should reset
      the Ns and Nr values as if it were a non-failed endpoint (section
      2.2.2). Appendix C illustrates a double failover scenario.
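
      The restricted scope of the tie breaker can be sketched as a
      lookup keyed by the old tunnel, so that only recovery attempts for
      the same old tunnel are ever compared. The names below are
      hypothetical. The sketch also assumes the comparison rule of
      [L2TPv2] section 4.4.3, where the lower value wins and equal
      values cause both sides to discard their tunnels, and it treats
      the tie breaker as an integer.

```python
def resolve_recovery_tie(local_attempts, old_tunnel_id, peer_tie_breaker):
    """Decide the fate of our own recovery tunnel for one old tunnel.

    local_attempts maps an old tunnel id to the tie breaker value this
    endpoint sent in its own recovery SCCRQ for that tunnel. Attempts
    for other old tunnels, and non-recovery tunnels, never take part.
    """
    ours = local_attempts.get(old_tunnel_id)
    if ours is None:
        return "no-tie"        # we are not recovering this tunnel ourselves
    if ours < peer_tie_breaker:
        return "local-wins"    # our recovery tunnel is kept
    if ours > peer_tie_breaker:
        return "peer-wins"     # we drop ours and act as the non-failed side
    return "both-discard"      # equal values (assumed [L2TPv2] behavior)
```

      Keying the table by old tunnel id is one way to guarantee that a
      tie on one recovery does not disturb any other tunnel.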

      2.2.2 Control and/or Data Channel Reset

      The control channel reset procedure SHOULD flush the transmit and
      receive windows, and reset the control channel sequence numbers
      (i.e. Ns and Nr values) on the recovered tunnel. The control
      channel on the failed endpoint is reset upon receiving a valid
      SCCRP, whereas the control channel on the non-failed endpoint is
      reset upon receiving a valid SCCCN. If the failed endpoint does
      not receive the Suggested Control Sequence AVP in the SCCRP, then
      it MUST reset its Ns and Nr values to zero. Similarly, if the
      non-failed endpoint opts not to send the Suggested Control
      Sequence AVP, then it MUST reset its Ns and Nr values to zero.
      Either endpoint can tear down the recovery tunnel after control
      channel reset.
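
      A control channel reset along these lines might be modelled as in
      the Python sketch below. The class and its fields are
      hypothetical. The mirrored assignment on the non-failed endpoint
      (its Ns is the Suggested Nr, its Nr is the Suggested Ns) is not
      stated in this document; it is inferred from the usual meaning of
      Ns and Nr, where one side's Nr is the other side's next Ns.

```python
from collections import deque


class ControlChannel:
    """Sequence state of one control channel (illustrative model only)."""

    def __init__(self):
        self.ns = 0                 # next sequence number to send
        self.nr = 0                 # next sequence number expected
        self.tx_window = deque()    # sent but unacknowledged messages
        self.rx_window = deque()    # received but undelivered messages

    def reset(self, suggested=None, failed_endpoint=True):
        """Flush both windows and restart numbering.

        suggested is the (Suggested Ns, Suggested Nr) pair, or None when
        no Suggested Control Sequence AVP was sent or received.
        """
        self.tx_window.clear()
        self.rx_window.clear()
        if suggested is None:
            self.ns, self.nr = 0, 0
        elif failed_endpoint:
            self.ns, self.nr = suggested
        else:
            # Inferred mirror image on the endpoint that made the suggestion.
            self.nr, self.ns = suggested
```

      The failed endpoint would call reset() when it receives a valid
      SCCRP, and the non-failed endpoint when it receives a valid SCCCN.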

      For control channel failure, an endpoint MUST prevent
      establishment of new sessions until it has cleared (or marked for
      clearance) the sessions that were not in established state, i.e.
      until Step 1 of section 2.3 is complete.

      Data channel is reset only for the sessions using sequence
      numbers.  For L2TPv3 data channel, terms Nr and Ns are used to
      mean 'expected sequence number' and 'sequence number'
      respectively. Data channel reset requires the failed endpoint to
      set the Ns value to zero, whereas the non-failed endpoint
      continues to use the Ns values it was using previously. To reset
      Nr values during failover, if an endpoint receives 'n' out of
      order but in sequence packets, then it MUST set the Nr value based
      on the Ns value of the incoming packets, as suggested in Appendix
      C [L2TPv3]. The value of 'n' SHOULD be configurable.
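
      One possible reading of the Nr reset rule is sketched below in
      Python. The class is hypothetical, and three details are
      assumptions of the sketch: the modulus defaults to 2^16, the
      packet that completes the run of 'n' is itself accepted, and a
      packet that breaks the run starts a new run of length one.

```python
class DataChannelReceiver:
    """Resynchronizes Nr after 'n' out of order but in sequence packets."""

    def __init__(self, n, modulus=1 << 16):
        self.n = n                # configurable run length
        self.modulus = modulus    # sequence number space (assumed 16 bits)
        self.nr = 0               # next expected sequence number
        self._run = 0             # length of the current out-of-order run
        self._last = None         # Ns of the previous out-of-order packet

    def receive(self, ns):
        """Return True if the packet is accepted, False if it is dropped."""
        if ns == self.nr:
            self._accept(ns)
            return True
        if self._last is not None and ns == (self._last + 1) % self.modulus:
            self._run += 1        # continues the consecutive run
        else:
            self._run = 1         # starts a new run
        self._last = ns
        if self._run >= self.n:
            self._accept(ns)      # enough evidence: adopt the peer's numbering
            return True
        return False

    def _accept(self, ns):
        self.nr = (ns + 1) % self.modulus
        self._run = 0
        self._last = None
```

      With n set to 3 and Nr at zero, packets with Ns 100 and 101 are
      dropped, the packet with Ns 102 is accepted, and Nr becomes 103.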

      For sessions requiring data channel reset, if one of the endpoints
      doesn't exhibit the capability (indicated by the 'D' bit in the
      Failover Capability AVP) to reset the Nr value, then a data
      channel using sequence numbers can't be recovered. Such sessions
      SHOULD be torn down by the failed endpoint by sending a CDN. For
      data-channel-only failure, the two endpoints MAY use FSQ/FSR
      messages on the control channel to synchronize the state of
      sessions as described in section 2.3 below.

   2.3 Session State Synchronization

      If control channel failover happens while a session is being
      established or being torn down, it is possible for an endpoint to
      consider a session in established state, when its peer considers
      the same session non existent. Two such situations occur when an
      endpoint fails after sending:
      - A CDN message that never made it to the peer.
      - An ICCN message that never made it to the peer.

      On the other hand, a data channel failure could leave sessions in
      an unrecoverable state.

      The following mechanism MUST be used to identify and clear the
      sessions that exist on an endpoint but not on its peer:

      Step 1: For control channel failure, after the recovery tunnel is
      established, the sessions that were not in established state MUST
      be silently cleared (i.e. without sending a CDN message) by each
      endpoint.
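
      This first step amounts to a local filter over the session table.
      A minimal Python sketch, assuming a hypothetical table that maps a
      local session id to a state string, is:

```python
def silently_clear_unestablished(sessions):
    """Remove every session that is not established; return the cleared ids.

    sessions maps a local session id to a state string. Entries are simply
    deleted: no CDN or any other message is generated for them.
    """
    cleared = [sid for sid, state in sessions.items()
               if state != "established"]
    for sid in cleared:
        del sessions[sid]
    return cleared
```

      New sessions would only be admitted on the recovered tunnel once
      this function has returned.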

      Step 2: Both endpoints MAY identify the sessions that might be in
      an inconsistent state, perhaps based on data channel inactivity.
      FSQ and FSR messages have been introduced to synchronize session
      state at any point during the life of a session between two
      endpoints.  These messages are used when one endpoint determines
      or suspects, in an implementation-specific manner, that the state
      of a session is inconsistent between it and its peer.


      Step 3: An endpoint sends a Failover Session Query (FSQ) message,
      message type 21, to query the state of stale sessions on its peer.
      An FSQ message MUST include at least one Failover Session State
      (FSS) AVP.  An endpoint MAY send another FSQ message before
      receiving a response to its previous FSQs.

      Failover Session State AVP is described as follows:

      Failover Session State AVP for L2TPv3 sessions (FSQ, FSR):

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |      Vendor Id [IETF]         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Attribute Type 79        |         Reserved              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Session Id                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Remote Session Id                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      Failover Session State AVP for L2TPv2 sessions (FSQ, FSR):

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |      Vendor Id [IETF]         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Attribute Type 79        |         Reserved              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Reserved           |        Session Id             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Reserved           |      Remote Session Id        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      This AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is
      mandatory (the M-bit is set to 1).

      Session Id is the local session id assigned by the sender, for
      which it would like to query the state on its peer.  Remote
      Session Id is the remote session id for the same session.

      Before all sessions are synchronized using FSQ/FSR mechanism, if
      an endpoint receives an ICRQ for a session it believes is already
      in established state, it MUST respond to such ICRQ with a CDN,
      setting Assigned/Local Session ID AVP ([L2TPv2] section 4.4.4,
      [L2TPv3] section 5.4.4) to its local session id, and clear the
      session that it considered established. An endpoint could assign
      least recently used session ids to avoid this situation.

      When an endpoint receives an FSQ message, it MUST ensure that for
      each FSS AVP in the FSQ message it includes an FSS AVP in a
      Failover Session Response (FSR) message, message type 22. There is
      no one-to-one correspondence between FSQ and FSR messages.
      Therefore an endpoint could respond to multiple FSQs using one FSR
      message, or it could respond to one FSQ with multiple FSRs.  For
      each FSS AVP received in an FSQ, an endpoint MUST validate the
      Remote Session Id and determine whether it is paired with the
      Session Id specified in the message. If the FSS AVP is not valid
      (i.e. the session does not exist or is paired with a different
      remote session id), then the Session Id field in the FSS AVP in
      the response MUST be set to zero. When a session is found to be
      paired with a mismatching session id, the local session MUST NOT
      be cleared, but rather marked stale, to be queried later using
      another FSQ message.  An example dialogue in Appendix D elaborates
      on the endpoints' behavior on mismatching session ids.

      Also, when responding to FSQ with an FSR message, Remote Session
      Id in FSS AVP is always set to the received value of Session ID in
      FSS AVP in FSQ message.
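
      The responder side of this exchange can be sketched in Python as
      shown below. The function name and data structures are
      hypothetical: FSS AVPs are reduced to (Session Id, Remote Session
      Id) pairs, and the local table maps this endpoint's session id to
      the peer session id it is paired with.

```python
def answer_fsq(local_sessions, stale, queried):
    """Build the FSS pairs of an FSR from the FSS pairs of an FSQ.

    local_sessions maps our session id to the paired peer session id.
    stale is a set that collects our sessions to be re-queried later.
    queried holds (Session Id, Remote Session Id) pairs as received, so
    the first element is the peer's id and the second is our id.
    """
    reply = []
    for peer_id, our_id in queried:
        paired_with = local_sessions.get(our_id)
        if paired_with == peer_id:
            # Valid pairing: answer with our id as Session Id.
            reply.append((our_id, peer_id))
        else:
            if paired_with is not None:
                # Exists but paired differently: keep it, mark it stale.
                stale.add(our_id)
            # Invalid: Session Id is zero; Remote Session Id echoes the query.
            reply.append((0, peer_id))
    return reply
```

      The returned pairs can be spread over one or several FSR messages,
      since the two message types need not correspond one to one.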

      When an endpoint receives an FSR message, it MUST use the Remote
      Session Id field to identify the local session and silently
      (without sending a CDN) clear the session if Session Id in the AVP
      was zero.  Otherwise it can consider the session to be in
      established state and recovered.
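
      On the querying side, FSR processing reduces to a lookup by Remote
      Session Id. In the following Python sketch the function name, the
      state strings, and the choice to ignore entries for unknown
      sessions are assumptions made for illustration.

```python
def apply_fsr(local_sessions, entries):
    """Apply the FSS pairs of a received FSR to the local session table.

    local_sessions maps our session id to a state string. entries holds
    (Session Id, Remote Session Id) pairs; Remote Session Id names our
    own session because the responder echoed it from our FSQ.
    """
    for session_id, remote_session_id in entries:
        if remote_session_id not in local_sessions:
            continue                                  # unknown here: ignored
        if session_id == 0:
            del local_sessions[remote_session_id]     # silent clear, no CDN
        else:
            local_sessions[remote_session_id] = "established"
```

      A session answered with a zero Session Id disappears from the
      table without any CDN being sent; every other answered session is
      treated as established and recovered.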

      FSQ and FSR messages MUST include 'Message Type AVP' and 'FSS
      AVP'.  They MAY include 'Random Vector AVP' and for L2TPv3
      'Message digest AVP'. Other AVPs MUST NOT be sent and SHOULD be
      ignored on receipt.

      FSS AVP MUST NOT be used in any message other than FSQ and FSR
      messages.

3.0 IANA Considerations

   This document defines the following values assigned by IANA:

         - Two new Message Type (Attribute Type 0) Values:
            Failover Session Query      : 21
            Failover Session Response   : 22


         - Four new control message Attribute Value Pairs:
            Failover Capability         : 76
            Tunnel Recovery             : 77
            Suggested Control Sequence  : 78
            Failover Session State      : 79

4.0 Security Considerations

   The failover mechanism described here leaves room (1 in 2^16 for
   L2TPv2 and 1 in 2^32 for L2TPv3) for an intruder to discover the old
   tunnel id, which could be misused to fake a failover and thereby
   shut down an existing tunnel. To avoid this, the control channel
   authentication described in section 2.2.1 should be used. L2TPv3
   control connections could also use the 'Message Digest AVP' for
   added protection. Protecting L2TP with IPsec would also help secure
   the control connections in failover situations.

5.0 Acknowledgements

   Leo Huber provided suggestions to help define the failover concept.
   Mark Townsley reviewed the document and provided valuable
   suggestions.

6.0 Author Information

      Vipin Jain
      Riverstone Networks
      5200 Great America Parkway
      Santa Clara, CA 95054
      Email: vipinietf@yahoo.com

      Paul W. Howard
      Juniper Networks
      10 Technology Park Drive
      Westford, MA 01886
      Email: phoward@juniper.net

      Sam Henderson
      Cisco Systems
      7025 Kit Creek Rd.
      PO Box 14987
      Research Triangle Park, NC 27709
      Email: samh@cisco.com

      Keyur Parikh
      Harris Broadcast Communication
      4393 Digitalway
      Mason, OH 45040
      Email: kparikh@harris.com


7.0 References

   [L2TPv2] Townsley, et al., "Layer Two Tunneling Protocol 'L2TP'",
            RFC 2661

   [L2TPv3] Lau, Townsley, Goyret,
            "Layer Two Tunneling Protocol - version 3 'L2TPv3'", RFC3931

8.0 Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.

9.0 Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10.0 Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.


Appendix A

This section describes some design considerations that came up during
discussions when developing the proposal:

   A.1  Backward compatibility and extensibility

      - The mechanism should be backward compatible; i.e., it should not
      redefine existing behavior of [L2TPv2] and [L2TPv3] compliant
      systems.

      - The protocol should allow an endpoint to detect its peer's
      failover capabilities in advance, so that it can fall back to
      other failover mechanisms if the peer does not support the
      proposed failover protocol.

      - The protocol should make it easy to extend the failover
      mechanism in the future.


   A.2  Minimal failover recovery time

   The mechanism should recover from a failover in the least possible
   time (the target is 3-5 seconds for 30k tunnels). Specifically, it
   should take the following into consideration:

      - Faster recovery: exchange as few messages as possible to
      recover from a failover.

      - CPU intensiveness: the less CPU intensive a proposal is, the
      better the chances of faster recovery.

      - Parallel establishment of tunnels: keep the reestablishment of
      different tunnels independent of one another.

   A.3  Minimal payload data loss

   The mechanism should have the least possible impact on data flows
   for sessions with sequencing enabled.

   A.4  Minimum interference with pre-failure control traffic

   The mechanism should define a way of clearly distinguishing
   messages that were sent before the failover from those sent after
   it.  Specifically, it should define a mechanism that avoids
   confusion between the sequence numbers used before and after the
   failover if the same Tunnel Id is used.

   A.5  Simplicity

   The simpler the protocol is, the better its chances of being adopted
   by everybody. The following would help achieve this:

      - Use of existing AVPs, messages and packet formats.

      - Avoid introducing special considerations and mechanisms that a
      new implementation would have to deal with.

      - A simpler post-failover synchronization mechanism.


   A.6  Security

   The mechanism should provide a way to authenticate peers when
   resynchronization takes place after a failover.


   A.7 Scalability

   It is very important for a proposed protocol to work well in a
   large-scale deployment. This includes addressing all the design
   considerations discussed above for deployments with thousands of
   tunnels or sessions, or a mix of the two.

   A target of 30,000 tunnels carrying 150,000 to 200,000 sessions from
   300 peers was considered during the design.


Appendix B

   The description below outlines the failover protocol operation for
   an example tunnel. The failover protocol does not preclude an
   endpoint from recovering multiple tunnels in parallel. It also allows
   an endpoint to send multiple FSQs, each including multiple FSS AVPs,
   to recover quickly.

   Pre Failover Exchange (section 2.1):

   Endpoint                                             Peer
                (assigned tid = x, failover capable)
   SCCRQ       -------------------------------------->  validate SCCRQ

                (assigned tid = y, failover capable)
   validate    <--------------------------------------  send SCCRP
   SCCRP, etc.

   .... <after tunnel gets created, sessions are established> ....


   < Endpoint fails >

   The failed endpoint initiates establishment of a recovery tunnel
   for the old tunnel 'x' (section 2.2.1):

   Failed Endpoint                                      Peer

             (assigned tid = z, Recovery AVP)
   SCCRQ     ----------------------------------->  Detects failover
           (recover tid = x, recover remote tid = y)  validate SCCRQ


           (Suggested Control Sequence AVP, Suggested Ns/Nr = 3/100)
   validate <-----------------------------------   send SCCRP
   SCCRP    (recover tid = y, recover remote tid = x)
   reset Ns = 3, Nr = 100
   on the recovered tunnel

   SCCCN     ----------------------------------->  validate and reset
                                                   Ns = 100, Nr = 3 on
                                                   the recovered tunnel


   Terminate the recovery tunnel

   tid = 'z'
   StopCCN  --------------------------------------> Cleanup 'w'


   Session states are synchronized: both endpoints may send FSQs and
   clean up stale sessions (section 2.3).

              (FSS AVP for sessions s1, s2, s3..)
   send FSQ  -------------------------------------> compute the state
                                                    of sessions in FSQ

              (FSS AVP for sessions s1, s2, s3...)
   deletes  <-------------------------------------- send FSR
   stale sessions, if any


              (FSS AVP for sessions s7, s8, s9...)
   compute  <-------------------------------------- send FSQ
   the state of
   sessions in FSQ


              (FSS AVP for sessions s7, s8, s9...)
   send FSR --------------------------------------> delete stale
                                                    sessions, if any
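
   The sequence number reset shown in the recovery exchange above can be
   sketched as follows. This is a minimal illustration in Python and
   not part of the protocol definition. The names ControlState and
   apply_suggested_sequence are hypothetical, and the sketch assumes,
   as the example values suggest, that the Suggested Ns/Nr values are
   stated from the failed endpoint's point of view, so the peer adopts
   them swapped.

```python
from dataclasses import dataclass


@dataclass
class ControlState:
    """Control connection sequence state for one recovered tunnel (hypothetical type)."""
    ns: int  # next sequence number this endpoint will send
    nr: int  # next sequence number this endpoint expects to receive


def apply_suggested_sequence(suggested_ns: int, suggested_nr: int):
    """Return (failed_endpoint_state, peer_state) for the recovered tunnel.

    Assumption: the suggested values are expressed from the failed
    endpoint's point of view, so the peer uses them swapped.
    """
    failed = ControlState(ns=suggested_ns, nr=suggested_nr)
    peer = ControlState(ns=suggested_nr, nr=suggested_ns)
    return failed, peer


# Values from the example above: Suggested Ns/Nr = 3/100.
failed, peer = apply_suggested_sequence(3, 100)
print(failed)  # ControlState(ns=3, nr=100)
print(peer)    # ControlState(ns=100, nr=3)
```

   After the reset, each endpoint's Ns equals the other's Nr, so control
   messages on the recovered tunnel continue without a sequence gap.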


Appendix C

   This section shows an example dialogue to illustrate double failure
   recovery. As described in section 2.2.1, the notable difference from
   the single failover scenario is the use of a tie breaker: the failed
   endpoint that loses the tie discards its own recovery tunnel and
   uses the one established by its peer (also a failed endpoint).

      Failed endpoint                      Failed endpoint

      (assume old tid = A)                 (assume old tid = B)

                  Recovery AVP = (A, B)
      SCCRQ     -----------------------+
      (with tie  (recovery tunnel 'C') |
       breaker                         |
       AVP)                            |
                 Recovery AVP = (B, A) |
   +- valid    <--------------------------- Send SCCRQ
   |  SCCRQ      (recovery tunnel 'D') |    (with tie breaker AVP)
   |  This endpoint                    |
   |  loses tie;                       |
   |  Discards tunnel 'C'              +--> Valid SCCRQ
   |                                        This endpoint wins tie;
   |                                        Discards SCCRQ
   |
   |              (may include SCS AVP)
   +->Send SCCRP -------------------------> Validate SCCRP
                                            Reset 'B';
                                            Set Ns, Nr values --+
                                                                |
                                                                |
                                                                |
      Validate SCCCN <--------------------- Send SCCCN   -------+
      Reset 'A';
      Set Ns, Nr values

   FSQs and FSRs for the old tunnel (A, B) are exchanged on
   the recovered tunnel by both endpoints.
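
   The tie resolution in the dialogue above can be sketched as follows.
   This is an illustrative Python fragment, not a normative procedure.
   The names TieResult, resolve_tie and on_peer_sccrq are hypothetical.
   The sketch assumes the base L2TP rule that the lower Tie Breaker
   value wins; the handling of equal values is not shown in the example
   above and is only reported here, not resolved.

```python
from enum import Enum


class TieResult(Enum):
    WIN = "win"
    LOSE = "lose"
    TIE = "tie"


def resolve_tie(local_value: int, peer_value: int) -> TieResult:
    """Compare Tie Breaker values.

    Assumption: the lower value wins, as in the base L2TP tie breaker rule.
    """
    if local_value < peer_value:
        return TieResult.WIN
    if local_value > peer_value:
        return TieResult.LOSE
    return TieResult.TIE


def on_peer_sccrq(local_value: int, peer_value: int) -> str:
    """Describe what a failed endpoint does when the peer's recovery SCCRQ
    crosses its own, following the dialogue above."""
    result = resolve_tie(local_value, peer_value)
    if result is TieResult.WIN:
        # Winner keeps its own recovery tunnel and ignores the peer's request.
        return "discard peer SCCRQ"
    if result is TieResult.LOSE:
        # Loser drops its own recovery tunnel and answers the peer's request.
        return "discard own recovery tunnel; send SCCRP"
    return "equal values; not covered by this example"


print(on_peer_sccrq(5, 9))  # discard peer SCCRQ
print(on_peer_sccrq(9, 5))  # discard own recovery tunnel; send SCCRP
```

   Both endpoints evaluate the same pair of values, so exactly one of
   them answers with SCCRP and a single recovery tunnel survives.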


Appendix D

   A session id mismatch could not be the result of a failure on one of
   the endpoints. However, the failover session recovery procedure
   could exacerbate the situation, resulting in a permanent mismatch in
   session ids between the two endpoints. The dialogue below outlines
   the behavior described in section 2.3 to handle such situations
   gracefully.

   Failed endpoint                      Non failed endpoint

   (assume a mismatch)                  (assume a mismatch)
   Sid = A, Remote Sid = B              Sid = B, Remote Sid = C
   Sid = C, Remote Sid = D


                  FSS AVP (A, B)
   send FSQ  -------------------------> No (B, A) pair exists;
                                        rather (B, C) exists.
                                        If it clears B then peer doesn't
                                        know if C is stale on other end.

                                        Instead if it marks B stale
                                        and queries the session state
                                        via FSQ, C would be cleared on
                                        the other end.

                  FSS AVP (0, A)
   Clears A <-------------------------- send FSR

                                        ... some time later ...

                  FSS AVP (B, C)
   No (B,C) <-------------------------- send FSQ
   Mark C Stale

                  FSS AVP (B, 0)
   Send FSR --------------------------> Clears B
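
   The dialogue above can be replayed with a small model. The sketch
   below is illustrative only and is derived from this example rather
   than from the normative text of section 2.3. The Endpoint class and
   its method names are hypothetical, the session ids A, B, C and D are
   represented by the integers 1 to 4, and the reply is written as
   (responder's session id, querier's session id), which is an ordering
   chosen for this sketch.

```python
class Endpoint:
    """Toy model of one endpoint's session table for a single tunnel (hypothetical)."""

    def __init__(self, sessions):
        self.sessions = dict(sessions)  # local session id -> remote session id
        self.stale = set()              # local session ids marked stale

    def answer_fsq(self, querier_sid, querier_remote_sid):
        """Handle one FSS AVP from an FSQ.

        Returns (responder_sid, querier_sid); responder_sid is 0 when no
        matching session pair exists on this endpoint.
        """
        known_remote = self.sessions.get(querier_remote_sid)
        if known_remote == querier_sid:
            return (querier_remote_sid, querier_sid)  # the pair matches
        if known_remote is not None:
            # Mismatch: mark the local session stale instead of clearing
            # it, so it can be queried later via this endpoint's own FSQ.
            self.stale.add(querier_remote_sid)
        return (0, querier_sid)

    def process_fsr(self, responder_sid, querier_sid):
        """Clear the local session if the peer reported no matching session."""
        if responder_sid == 0:
            self.sessions.pop(querier_sid, None)
            self.stale.discard(querier_sid)


A, B, C, D = 1, 2, 3, 4
failed = Endpoint({A: B, C: D})
non_failed = Endpoint({B: C})

# Failed endpoint queries (A, B); the non-failed endpoint marks B stale.
failed.process_fsr(*non_failed.answer_fsq(A, B))
# Some time later the non-failed endpoint queries (B, C).
non_failed.process_fsr(*failed.answer_fsq(B, C))

print(failed.sessions, failed.stale)          # {3: 4} {3}
print(non_failed.sessions, non_failed.stale)  # {} set()
```

   At the end of the replay, A and B have been cleared and C is marked
   stale on the failed endpoint, which matches the dialogue above.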

