Network Working Group                                     Reinaldo Penno
Internet-Draft                                           Nortel Networks
Expires Feb 2004                                            Keyur Parikh
                                                         Megisto Systems
                                                                  Ly Loi
                                                          Tahoe Networks
                                                               Leo Huber
                                                        Extreme Networks
                                                              Vipin Jain
                                                     Riverstone Networks
                                                           Mark Townsley
                                                           Cisco Systems
                                                             August 2003


                Fail Over extensions for L2TP 'failover'
                    draft-ietf-l2tpext-failover-02.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026 except that the right to produce derivative
   works is not granted.

   This document is an Internet-Draft and is NOT offered in accordance
   with Section 10 of RFC2026, and the author does not provide the IETF
   with any rights other than to publish as an Internet-Draft


   The distribution of this memo is unlimited. Please send comments to
   the L2TP mailing list (l2tpext@ietf.org).

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.


Abstract

   L2TP is a connection-oriented protocol that has shared state between
   active endpoints. Some of this shared state is vital for operation
   but may be rather volatile in nature, such as packet sequence numbers
   used on the L2TP Control Connection. When failure of one side of a
   control connection occurs, a new control connection is created and
   associated with the old connection by exchanging information about
   the old connection. Such a mechanism is not intended as a replacement


Jain, et al.                expires Feb 2004                    [Page 1]

INTERNET DRAFT                  FAILOVER                     August 2003


   for an active fail over with some mirrored connection states, but as
   an aid just for those parameters that are particularly difficult to
   have immediately available. Protocol extensions to L2TP defined in
   this document are intended to facilitate state recovery, providing
   additional resiliency in an L2TP network and improving a remote
   system's layer 2 connectivity.

   Terminology

      Endpoint: An L2TP control connection endpoint, either LAC or LNS.

      Active Endpoint: An endpoint that is currently providing service.

      Backup Endpoint: A redundant endpoint standing by for the active
      endpoint.

      Failover: The action of a Backup Endpoint taking over the service
      of an Active Endpoint. This could be due to administrative action
      or failure of the Active Endpoint.


1. Introduction

   The goal of this draft is to aid the overall resiliency of an L2TP
   endpoint by introducing extensions to RFC 2661 [L2TP] that minimize
   the recovery time of the L2TP layer after a failover, while
   minimizing the impact on its performance. It is assumed that the
   endpoint's overall architecture also supports this resiliency
   effort.

   To ensure proper operation of an L2TP endpoint after a failover, the
   information associated with the tunnels and sessions between the
   endpoints must be correct and consistent. This includes both the
   configured and the dynamic information. The configured information
   is assumed to be correct and consistent after a failover; otherwise
   the tunnels and sessions would not have been set up in the first
   place. The dynamic information, also referred to as stateful
   information, changes as the tunnel's control and data packets are
   processed. Currently, the only such information that is essential to
   the tunnel's operation is its sequence numbers. For the tunnel
   control channel, inconsistent sequence numbers can result in the
   termination of the entire tunnel. For tunnel sessions, inconsistent
   sequence numbers, when sequencing is used, can cause significant
   data loss, giving the end user the perception of "service loss".

   Thus, an optimal resilient architecture that aims to minimize
   "service loss" after a failover must make provision for the tunnel's
   essential stateful information - i.e. its sequence numbers.


   Currently, there are two options available. The first option is to
   ensure that the backup endpoint is in complete sync with the active
   endpoint with respect to the sequence numbers of the control
   connection and the data sessions. The other option is to simply
   re-establish all the tunnels and their sessions after a failover.
   The drawback of the first option is that it adds significant
   performance overhead and complexity to the endpoint's architecture,
   especially as tunnel and session aggregation increases. The drawback
   of the second option is that it increases the "service loss" time,
   especially as the architecture scales.

   To alleviate the drawbacks of these options, this draft introduces a
   mechanism that brings the dynamic stateful information of a tunnel
   to a correct and consistent state after a failure. The proposed
   mechanism currently covers only the recovery of tunnels and sessions
   that were in the established state prior to the failure.

2.0 Protocol Operation

   The failover protocol allows an endpoint to specify its failover
   capabilities during tunnel establishment. Based on failover
   capabilities, two endpoints learn if a tunnel and its sessions
   support recovery. Upon failure, a new tunnel is initiated for every
   old tunnel that needs recovery. The new tunnel includes the Old
   Tunnel ID AVP, a new AVP defined in Section 4.3, which identifies the
   old tunnel. Upon getting this AVP, an endpoint learns that its peer
   has failed over and would like to recover the identified tunnel.
   After the new tunnel is established, all active sessions and tunnel
   characteristics of the previous tunnel are moved to the new tunnel.
   Normal tunnel activity then resumes.


   2.1 Tunnel Establishment

      Tunnel establishment procedures are the same as those defined by
      [L2TP], except that, when establishing a tunnel, the endpoints
      exchange their failover capabilities using the Failover Capability
      Initiate AVP and the Failover Capability Response AVP in the SCCRQ
      and SCCRP control messages.
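
      As an illustration only, the outcome of this exchange can be
      sketched as follows. The names FailoverCaps and can_recover are
      hypothetical, and the rule that recovery needs the initiate
      capability on the failing side and the response capability on its
      peer is an interpretation of this section, not normative text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverCaps:
    """Failover capabilities one endpoint advertised in its SCCRQ or SCCRP."""
    can_initiate: bool = False  # Failover Capability Initiate AVP was present
    can_respond: bool = False   # Failover Capability Response AVP was present


def can_recover(failing: FailoverCaps, peer: FailoverCaps) -> bool:
    """Return True if `failing` may recover the tunnel after its own failover.

    Assumption: the failing side must have advertised that it can initiate
    recovery, and its peer must have advertised that it can respond to it.
    """
    return failing.can_initiate and peer.can_respond


lac = FailoverCaps(can_initiate=True, can_respond=True)
lns = FailoverCaps(can_initiate=False, can_respond=True)
print(can_recover(lac, lns))  # the LAC fails over and the LNS responds
print(can_recover(lns, lac))  # the LNS never advertised the initiate capability
```

      Keeping the two capabilities separate lets an endpoint that cannot
      itself fail over still assist a peer that can.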

   2.2 Session Establishment

      The session establishment and termination procedures described in
      [L2TP] remain unchanged.

   2.3 Post Failure Operation

      This section describes the behavior of the failed endpoint and its
      peer during recovery. Endpoints MUST avoid sending any control
      packets over the old tunnel that is being recovered. Tunnels that
      successfully negotiated failover capabilities may use the failover
      protocol.

      The SCCRQ, SCCRP and SCCCN messages SHOULD carry the same AVPs,
      with the same values, that were used in the old tunnel, except for
      the Assigned Tunnel ID AVP, in order to keep the tunnel
      characteristics the same.

      2.3.1 Failed Endpoint's Behavior

        The failed endpoint establishes a new tunnel as specified in
        [L2TP], with the following considerations:

        - It MUST include the Old Tunnel Id AVP and the Old Local Tunnel
        Id AVP, defined in Sections 4.3 and 4.4 respectively, in the new
        SCCRQ.

        - If the new tunnel cannot be established for any reason, the
        endpoint MUST assume that recovery of that tunnel has failed,
        and it SHOULD clear the tunnel and the sessions within that
        tunnel on its end.

      2.3.2 Failed Endpoint's Peer's Behavior

        The peer accepts tunnel requests from the failed endpoint as
        specified in [L2TP], with the following considerations:

        - It MUST use the Old Tunnel Id AVP, defined in Section 4.3, to
        determine which tunnel the failed endpoint is trying to recover.
        If this AVP is not present, the peer can assume that the request
        is for a new tunnel.

        - It MUST validate the Old Tunnel Id and the Old Local Tunnel Id
        in an incoming SCCRQ. If it does not find a match, it MUST
        reject the SCCRQ.

        - It may reject the new tunnel request if it did not advertise
        failover capabilities on the corresponding old tunnel.
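
        The checks above can be sketched as a single classification
        step. This is a minimal sketch, not a definitive
        implementation: the tunnel table layout and all names are
        hypothetical. Following the figure in Appendix B, it assumes
        that the Old Tunnel Id in an SCCRQ is the ID the sender had
        assigned (the receiver's remote ID) and that the Old Local
        Tunnel Id is the ID the receiver had assigned. It also assumes
        that an SCCRQ carrying only one of the two AVPs is rejected, and
        it chooses to exercise the optional rejection in the last
        bullet.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Verdict(Enum):
    NEW_TUNNEL = "new tunnel"
    RECOVER = "recover"
    REJECT = "reject"


# Hypothetical tunnel table kept by the non-failed peer:
# (local tunnel id, remote tunnel id) -> were failover capabilities negotiated?
TunnelTable = Dict[Tuple[int, int], bool]


def classify_sccrq(tunnels: TunnelTable,
                   old_tid: Optional[int],
                   old_local_tid: Optional[int]) -> Verdict:
    """Classify an incoming SCCRQ by its Old Tunnel Id / Old Local Tunnel Id."""
    if old_tid is None:
        # No Old Tunnel Id AVP: treat the request as an ordinary new tunnel.
        return Verdict.NEW_TUNNEL
    if old_local_tid is None:
        # Assumption: both AVPs are needed to validate a recovery request.
        return Verdict.REJECT
    key = (old_local_tid, old_tid)  # our local id, our remote id
    if key not in tunnels:
        return Verdict.REJECT       # no matching old tunnel
    if not tunnels[key]:
        return Verdict.REJECT       # optional in the text; chosen here
    return Verdict.RECOVER


# Old tunnel: the failed node had assigned 10, this peer had assigned 20.
table = {(20, 10): True, (21, 11): False}
print(classify_sccrq(table, 10, 20).value)
print(classify_sccrq(table, 10, 99).value)
```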

      2.3.3 Preserving Sessions

        Upon establishment of the new tunnel, both endpoints

        - MUST treat the sessions that were in the established state in
        the old tunnel as now belonging to the new tunnel.

        - MUST continue to process control messages, for example a new
        session establishment request via ICRQ, as defined in [L2TP].
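
        A minimal sketch of the first rule follows, assuming a
        hypothetical per-tunnel map from session ID to a state name.
        Established sessions are carried over to the new tunnel; all
        others are reported so that the caller can bring them down
        locally, as Section 2.3.4 requires. The state strings are
        illustrative only.

```python
from typing import Dict, List, Tuple

ESTABLISHED = "established"


def split_sessions(old_tunnel_sessions: Dict[int, str]
                   ) -> Tuple[Dict[int, str], List[int]]:
    """Return (sessions to move to the new tunnel, session ids to drop)."""
    kept = {sid: state for sid, state in old_tunnel_sessions.items()
            if state == ESTABLISHED}
    dropped = sorted(sid for sid in old_tunnel_sessions if sid not in kept)
    return kept, dropped


old = {1: "established", 2: "wait-connect", 3: "established", 4: "wait-reply"}
kept, dropped = split_sessions(old)
print(sorted(kept))  # session ids carried over to the new tunnel
print(dropped)       # session ids brought down locally, without a CDN
```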


      2.3.4 Session State Inconsistency Between Peers

        The failover mechanism allows the two ends of a tunnel to
        preserve the sessions that were in the established state.
        However, the two endpoints must agree on which sessions they are
        going to preserve in a tunnel. If failover happens while a
        session is being established or torn down, one endpoint may
        consider the session to be in the established state while its
        peer considers the same session to be down or nonexistent. This
        happens, for example, when an endpoint fails after sending a CDN
        or ICCN message that never reached the peer. To facilitate
        synchronization of the sessions under such circumstances, the
        following mechanism is proposed:

        - After the new tunnel is established, the sessions that were
        not in the established state are brought down locally without
        sending a CDN message to the peer.

        - An endpoint may explicitly query its peer about sessions that
        it suspects (perhaps based on inactivity, etc.) no longer exist
        on the peer. It does so by sending a Failover Session Query
        (FSQ) Message that includes one Failover Session State (FSS) AVP
        for each session that might have ceased to exist on the peer.

        - Upon receipt of an FSQ Message, the peer responds with a
        Failover Session Response (FSR) Message that contains one FSS
        AVP for each FSS AVP received in the FSQ Message, stating
        whether the peer considers that session to be in the established
        state.

        - Upon receipt of the FSR Message, the endpoint brings down each
        session that the peer does not consider to be in the established
        state, without sending a CDN.

        - For security purposes, an FSQ Message SHOULD be accepted only
        within a configured period after the failover. FSQ and FSR
        Messages can also include the Challenge and Challenge Response
        AVPs defined by [L2TP] to validate the peer's identity.

        The Session State Inconsistency mechanism SHOULD be carried out
        only for sessions that are not receiving any traffic upon
        failover and are therefore considered to have possibly
        inconsistent state.
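
        The query/response exchange described in this section can be
        sketched as three small steps. All names are hypothetical, and
        an FSS AVP is modeled as a (session id, state) pair. The sketch
        assumes that the state field is sent as 0 in an FSQ, which this
        draft does not specify, and that only an explicit state of 0 in
        the FSR marks a session as stale.

```python
from typing import Iterable, List, Set, Tuple

FssAvp = Tuple[int, int]  # (session id, state): 1 = established, 0 = otherwise


def build_fsq(suspect_remote_session_ids: Iterable[int]) -> List[FssAvp]:
    """Querier: one FSS AVP per session that may no longer exist on the peer."""
    return [(sid, 0) for sid in suspect_remote_session_ids]


def answer_fsq(fsq: List[FssAvp], established_local_ids: Set[int]) -> List[FssAvp]:
    """Responder: one FSS AVP in the FSR for every FSS AVP in the FSQ."""
    return [(sid, 1 if sid in established_local_ids else 0) for sid, _ in fsq]


def stale_sessions(fsr: List[FssAvp]) -> List[int]:
    """Querier: sessions to bring down locally, without sending a CDN."""
    return [sid for sid, state in fsr if state == 0]


fsq = build_fsq([5, 6, 7])                   # sessions with no recent traffic
fsr = answer_fsq(fsq, established_local_ids={5, 7, 9})
print(fsr)
print(stale_sessions(fsr))
```

        The session IDs in the FSQ are the ones the responder assigned,
        so the responder can look them up directly in its own session
        table.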

      2.3.5 Data Plane Behavior

        If sequencing was used on data sessions, then, upon detecting
        the peer's failure, the non-failed endpoint MUST set the next
        expected Ns based on the incoming Ns value. It must also flush
        its re-ordering buffers, if applicable.
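
        One possible reading of this rule is sketched below. The class
        and method names are hypothetical. The sketch assumes that
        "based on the incoming Ns value" means adopting the first Ns
        seen after the failure and expecting its successor, with
        wrap-around at 16 bits (the width of Ns in [L2TP]), and that
        flushing means discarding the buffered packets.

```python
from collections import deque


class SessionReceiver:
    """Receive-side sequencing state for one data session."""

    def __init__(self) -> None:
        self.next_expected_ns = 0
        self.reorder_buffer: deque = deque()

    def resync_after_peer_failover(self, incoming_ns: int) -> None:
        """Adopt the failed peer's new sequence numbering."""
        # Packets buffered under the old numbering can no longer be ordered.
        self.reorder_buffer.clear()
        # Ns is a 16-bit counter, so the successor wraps at 65536.
        self.next_expected_ns = (incoming_ns + 1) % 65536


rx = SessionReceiver()
rx.next_expected_ns = 100
rx.reorder_buffer.extend([102, 103])
rx.resync_after_peer_failover(incoming_ns=65535)
print(rx.next_expected_ns, len(rx.reorder_buffer))
```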


3.0. Failover Messages

   This draft defines two new messages that help the endpoints bring
   their session state into agreement. These messages can be sent only
   on the new tunnel that was established for recovery.


   3.1. Failover Session Query Message (FSQ)

      The Failover Session Query Message (FSQ), Message Type TBD, is
      sent by an endpoint after failover to learn the state of sessions
      on the peer. The message contains one or more Failover Session
      State (FSS) AVPs, defined in Section 4.5. The message MAY include
      the Challenge AVP to validate the identity of the originator, but
      only if tunnel authentication was performed when the old tunnel
      was established.

   3.2 Failover Session Response Message (FSR)

      The Failover Session Response Message (FSR), Message Type TBD, is
      sent by an endpoint in response to an FSQ Message to inform the
      peer of the state of the queried sessions on that endpoint. For
      every FSS AVP in the FSQ Message there is one FSS AVP in the FSR
      Message. The FSR Message MUST include a Challenge Response AVP if
      the FSQ Message was received with a Challenge AVP.


4.0. Failover AVPs

   This draft defines the following new AVPs. The messages that may
   carry each AVP are listed in brackets after its name.

   4.1. Failover Initiate Capability AVP [SCCRQ, SCCRP]

      The Failover Capability Initiate AVP, Attribute Type [TBD],
      indicates whether an endpoint can initiate recovery on a given
      tunnel after a failure.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type [TBD]  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      The AVP is not mandatory (the M-bit MUST be set to 0). The AVP MAY
      be hidden (the H-bit set to 0 or 1).


   4.2. Failover Response Capability AVP [SCCRQ, SCCRP]

      The Failover Capability Response AVP, Attribute Type [TBD],
      indicates whether an endpoint is capable of responding to a
      failure on a given tunnel.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type [TBD]  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      The AVP is not mandatory (the M-bit MUST be set to 0). The AVP MAY
      be hidden (the H-bit set to 0 or 1).

   4.3. Old Tunnel ID AVP [SCCRQ, SCCRP]

      The Old Tunnel ID AVP, Attribute Type [TBD], indicates the Tunnel
      ID in SCCRQ and SCCRP messages that was assigned by the receiver
      before failure.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type [TBD]  |         Old Tunnel Id         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      This AVP is mandatory (the M-bit MUST be set to 1). The AVP may be
      hidden (the H-bit set to 0 or 1).
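
      As an illustration, the layout above can be serialized as shown
      below. This is a sketch under stated assumptions: the Attribute
      Type is still TBD, so ATTR_OLD_TUNNEL_ID is a made-up placeholder,
      the function names are hypothetical, and AVP hiding is not
      implemented (the H-bit is always 0). The header packing follows
      the general AVP format of [L2TP]: M and H bits, four reserved
      bits, a 10-bit Length that covers the whole AVP, a 16-bit Vendor
      ID and a 16-bit Attribute Type.

```python
import struct

ATTR_OLD_TUNNEL_ID = 0xFFF0  # hypothetical placeholder; the real value is TBD


def encode_avp(attr_type: int, value: bytes, mandatory: bool,
               vendor_id: int = 0) -> bytes:
    """Serialize one unhidden AVP: 6-octet header followed by the value."""
    length = 6 + len(value)
    if length > 0x3FF:
        raise ValueError("AVP does not fit the 10-bit Length field")
    flags_and_length = (0x8000 if mandatory else 0) | length  # H and rsvd = 0
    return struct.pack("!HHH", flags_and_length, vendor_id, attr_type) + value


def encode_old_tunnel_id(old_tunnel_id: int) -> bytes:
    """Old Tunnel ID AVP: M-bit set, 16-bit tunnel ID as the value."""
    return encode_avp(ATTR_OLD_TUNNEL_ID, struct.pack("!H", old_tunnel_id),
                      mandatory=True)


avp = encode_old_tunnel_id(0x1234)
print(avp.hex())  # 80080000fff01234
```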


   4.4. Old Local Tunnel ID AVP [SCCRQ, SCCRP]

      The Old Local Tunnel ID AVP, Attribute Type [TBD], indicates the
      Tunnel Id that was assigned by the sender in SCCRQ or SCCRP
      messages before failure.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Attribute Type [TBD]  |     Old Local Tunnel Id       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      The AVP is mandatory (the M-bit MUST be set to 1). The AVP may be
      hidden (the H-bit set to 0 or 1).


   4.5. Failover Session State AVP [FSQ, FSR]

      The Failover Session State (FSS) AVP, Attribute Type [TBD], serves
      different purposes in FSQ and FSR messages.

      In FSQ Message it indicates the Assigned Session Id it received
      from the peer when the session was originally established, i.e.
      the old remote session id. It SHOULD be used only for the sessions
      that an endpoint considers might not exist on the peer but would
      like to query the peer regarding the same.

      In FSR Message it indicates the state of the session that was
      queried in the FSQ message. An endpoint SHOULD include one FSS AVP
      in FSR message for every FSS AVP in FSQ message.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|H| rsvd  |      Length       |           Vendor Id [IETF]    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Attribute Type [TBD]     |       Assigned Session Id     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Assigned Session State    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      The Assigned Session State field in this AVP has the value 1 if
      the session is considered to be in the established state by the
      endpoint, and the value 0 otherwise. All other values are
      reserved; they should not be set when sending and should be
      ignored upon receipt.


      The AVP is mandatory (the M-bit MUST be set to 1). The AVP may be
      hidden (the H-bit set to 0 or 1).
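
      A minimal round-trip sketch for an unhidden FSS AVP follows. The
      Attribute Type is still TBD, so ATTR_FSS is a made-up
      placeholder, and the function names are hypothetical. Treating a
      reserved state value as "no information" (None) is one way to
      read "ignored upon receipt"; it is an assumption, not something
      this draft specifies.

```python
import struct
from typing import Optional, Tuple

ATTR_FSS = 0xFFF1  # hypothetical placeholder; the real value is TBD
FSS_LENGTH = 10    # 6-octet AVP header + session id (2) + session state (2)


def encode_fss(session_id: int, established: bool) -> bytes:
    """Build an FSS AVP with M=1, H=0 and IETF Vendor ID 0."""
    return struct.pack("!HHHHH", 0x8000 | FSS_LENGTH, 0, ATTR_FSS,
                       session_id, 1 if established else 0)


def decode_fss(avp: bytes) -> Tuple[int, Optional[bool]]:
    """Return (session id, established?); None marks a reserved state value."""
    flags_len, vendor_id, attr_type, session_id, state = struct.unpack(
        "!HHHHH", avp)
    if (flags_len & 0x03FF) != FSS_LENGTH or vendor_id != 0 \
            or attr_type != ATTR_FSS:
        raise ValueError("not an FSS AVP")
    if state == 1:
        return session_id, True
    if state == 0:
        return session_id, False
    return session_id, None  # reserved value: carries no information


print(encode_fss(7, True).hex())  # 800a0000fff100070001
print(decode_fss(encode_fss(7, False)))
```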

5.0 L2TP Over IPsec Considerations during failover

   During failover, the tunnel initiator and responder of the new tunnel
   MUST use the same source IP address and port as were used for the
   failed tunnel. This prevents mismatches with the existing IPsec
   filter/policy database at both ends. If the address and port cannot
   be preserved, then the procedure laid out in [L2TP IPsec] MUST be
   used to cover all cases of dynamic assignment of IP addresses and
   ports.

6. IANA Considerations

   This document requires five new "AVP Attributes" and two new
   "Message Type AVP" values to be assigned through IETF Consensus
   [RFC2434], as indicated in Sections 10.1 and 10.2 of [RFC2661].
   These are:

         Failover Initiate Capability AVP (section 4.1)

         Failover Response Capability AVP (section 4.2)

         Old Tunnel ID AVP (section 4.3)

         Old Local Tunnel ID AVP (section 4.4)

         Failover Session State AVP (section 4.5)

         Failover Session Query Message (FSQ) (section 3.1)

         Failover Session Response Message (FSR) (section 3.2)

7. Security Considerations

   The failover mechanism described here leaves a small chance (1 in
   2^32, since each of the two tunnel IDs is 16 bits long) for an
   intruder to guess the tunnel IDs of an existing tunnel by trying out
   various values in the Old Tunnel Id and Old Local Tunnel Id AVPs.

   It also introduces an opportunity for an intruder to spoof the FSQ
   message in order to learn which sessions are active on a node. This
   can be avoided by using the Challenge AVP and the Challenge Response
   AVP in FSQ and FSR messages. The time window during which FSQ
   messages are accepted (after failover) can be configured to a small
   value to reduce the vulnerability.
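
   The acceptance window can be sketched as a simple time check. The
   class name, its methods and the 30-second value are hypothetical.
   Timestamps are passed in explicitly, so a caller would supply a
   monotonic clock reading such as time.monotonic().

```python
from typing import Optional


class FsqWindow:
    """Accept FSQ messages only for a configured period after a failover."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.failover_at: Optional[float] = None

    def note_failover(self, now: float) -> None:
        """Record when the recovery tunnel was established."""
        self.failover_at = now

    def accepts_fsq(self, now: float) -> bool:
        """True only if a failover was seen and the window is still open."""
        if self.failover_at is None:
            return False
        return 0 <= now - self.failover_at <= self.window_seconds


window = FsqWindow(window_seconds=30.0)
window.note_failover(now=100.0)
print(window.accepts_fsq(now=110.0))  # inside the window
print(window.accepts_fsq(now=131.0))  # window has closed
```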


8. Authors' Addresses


      Reinaldo Penno
      Nortel Networks
      2305 Mission College Blvd
      Santa Clara, CA 95054
      Phone: +1 408.565.3023
      Email: rpenno@nortelnetworks.com

      Keyur Parikh
      Megisto Systems
      20251 Century Boulevard, Suite 120
      Germantown, MD 20876
      Phone: +1 301.444.1723
      Email: kparikh@megisto.com

      Leo Huber
      Extreme Networks
      3585 Monroe St.
      Santa Clara CA 95051
      Phone: +1 408.597.3037
      Email: lhuber@extremenetworks.com

      Ly Loi
      Tahoe Networks
      3052 Orchard Drive
      San Jose, CA 95134
      Phone: +1 408.944.8630
      Email: lll@tahoenetworks.com

      Vipin Jain
      Riverstone Networks
      5200 Great America Parkway
      Santa Clara, CA 95054
      Phone: +1 408.878.0464
      Email: vipinietf@yahoo.com

      W. Mark Townsley
      Cisco Systems
      7025 Kit Creek Road
      PO Box 14987
      Research Triangle Park, NC 27709
      EMail: townsley@cisco.com


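9. References


   [L2TP] Townsley, et al., "Layer Two Tunneling Protocol "L2TP"",
   RFC 2661

   [L2TP IPsec] Patel, et al., "Securing L2TP using IPsec", RFC 3193

   [RFC2434] Narten, Alvestrand, "Guidelines for Writing an IANA
   Considerations Section in RFCs", RFC 2434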


Appendix A

This section describes some design considerations that came up during
discussions when developing the proposal:

   A.1  Backward compatibility and extensibility

      - The mechanism should be backwards compatible; i.e. it should
      not redefine existing behavior of [L2TP] compliant systems.

      - The protocol should allow a peer to detect failover capabilities
      in advance, so that it can fall back to other failover mechanisms
      should the peer not support the proposed failover protocol.

      - The protocol should allow future extensions to the failover
      mechanism with ease.


   A.2  Short failover recovery time

   The mechanism should recover from failover in the least possible
   time (target of 3-5 seconds for 30k tunnels). Specifically, it
   should take the following into consideration:

      - Faster recovery: by exchanging fewer messages to recover from
      failover.

      - CPU intensiveness: the less CPU intensive a proposal is, the
      better the chances of faster recovery.

      - Parallel establishment of various tunnels: by keeping different
      tunnel reestablishments independent of one another.

   A.3  Less payload data loss

   The mechanism should have the least possible impact on data flows
   for sessions with sequencing enabled.


   A.4  Minimum interference with pre-failure control traffic

   The mechanism should define a way of clearly distinguishing the
   messages that were sent before failover from those that are sent
   after it. Specifically, it should define a mechanism that avoids
   confusion between the sequence numbers used before and after
   failover if the same Tunnel Id is used.

   A.5  Simplicity

   The simpler the protocol is, the better its chances of being widely
   adopted. The following would help achieve this:

      - Use of existing AVPs, messages and packet formats.

      - Avoiding special considerations and mechanisms that a new
      implementation would have to deal with.

      - A simpler post-failover synchronization mechanism.


   A.6  Security

   The mechanism should provide a way to authenticate peers when
   resynchronization takes place after a failover.


   A.7 Scalability

   It is very important for a proposed protocol to work well in a
   large-scale deployment. This includes addressing all of the design
   considerations discussed above for deployments with thousands of
   tunnels, thousands of sessions, or a mix of the two.

   A target of 30,000 tunnels carrying 150,000 to 200,000 sessions from
   300 peers was considered during the design.


Appendix B

   The figure below outlines the failover protocol operation for an
   example tunnel. The failover protocol does not preclude an endpoint
   from recovering multiple tunnels in parallel. It also does not
   preclude an endpoint from sending multiple FSQs to recover quickly.

      Before Failure:

      Endpoint                                             Peer
                   (assigned tid = x, failover capable)
      SCCRQ       -------------------------------------->  validates SCCRQ


                   (assigned tid = y, failover capable)
      validates   <--------------------------------------  send SCCRP
      SCCRP, etc.

      .... <after tunnel gets created, sessions are established> ....


      <This Node fails>
      After Failure:


      Failed Node                                          Peer

                 (old tid = x, old local tid = y)
      SCCRQ     ----------------------------------->  Detects recovery
                (assigned tid = z, failover capable)  for (remote tid = x)


                 (old tid = y, old local tid = x)
      SCCRP     <-----------------------------------  send SCCRP
                 (assigned tid = w, failover capable)


      .... <after new tunnel gets created, sessions from old tunnel are
            restored, both endpoints may send FSQs to clean up stale
            sessions> ....

                 (FSS AVP for sessions s1, s2, s3..)
      send FSQ  -------------------------------------> compute the state
                                                       of sessions in FSQ

                 (FSS AVP for sessions s1, s2, s3...)
      deletes  <-------------------------------------- send FSR
      stale sessions, if any


                 (FSS AVP for sessions s7, s8, s9...)
      compute  <-------------------------------------- send FSQ
      the state of
      sessions in FSQ


                 (FSS AVP for sessions s7, s8, s9...)
      send FSR --------------------------------------> delete stale sessions,
                                                       if any


      .... <tunnel resumes normal operation after this> ....

