Network Working Group                                     Reinaldo Penno
Internet-Draft                                           Nortel Networks
Expires Feb 2004                                            Keyur Parikh
                                                         Megisto Systems
                                                                  Ly Loi
                                                          Tahoe Networks
                                                               Leo Huber
                                                        Extreme Networks
                                                              Vipin Jain
                                                     Riverstone Networks
                                                           Mark Townsley
                                                           Cisco Systems
                                                             August 2003


                Fail Over extensions for L2TP 'failover'
                   draft-ietf-l2tpext-failover-02.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

   The distribution of this memo is unlimited. Please send comments to the L2TP mailing list (l2tpext@ietf.org).

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   L2TP is a connection-oriented protocol that has shared state between active endpoints. Some of this shared state is vital for operation but may be rather volatile in nature, such as the packet sequence numbers used on the L2TP Control Connection. When one side of a control connection fails, a new control connection is created and associated with the old connection by exchanging information about the old connection. Such a mechanism is not intended as a replacement for an active failover with some mirrored connection state, but as an aid for just those parameters that are particularly difficult to keep immediately available. The protocol extensions to L2TP defined in this document are intended to facilitate state recovery, providing additional resiliency in an L2TP network and improving a remote system's layer 2 connectivity.

Jain, et al.                 expires Feb 2004                   [Page 1]
INTERNET DRAFT                   FAILOVER                    August 2003

Terminology

   Endpoint: An L2TP control connection endpoint, either LAC or LNS.
   Active Endpoint: An endpoint that is currently providing service.
   Backup Endpoint: A redundant endpoint standing by for the active endpoint.
   Failover: The action of a Backup Endpoint taking over the service of an Active Endpoint. This could be due to administrative action or to failure of the Active Endpoint.

1. Introduction

   The goal of this draft is to improve the overall resiliency of an L2TP endpoint by introducing extensions to RFC 2661 [L2TP] that minimize the recovery time of the L2TP layer after a failover, while minimizing the impact on the endpoint's performance. It is assumed that the endpoint's overall architecture also supports this resiliency effort.

   To ensure proper operation of an L2TP endpoint after a failover, the information associated with the tunnels and sessions between the endpoints must be correct and consistent. This includes both configured and dynamic information. The configured information is assumed to be correct and consistent after a failover; otherwise the tunnels and sessions would not have been set up in the first place.

   The dynamic information, also referred to as stateful information, changes as the tunnel's control and data packets are processed. Currently, the only such information that is essential to the tunnel's operation is its sequence numbers. On the tunnel control channel, inconsistent sequence numbers can result in the termination of the entire tunnel. On tunnel sessions that use sequencing, inconsistent sequence numbers can cause significant data loss, which the end user perceives as "service loss". Thus, any resilient architecture that aims to minimize "service loss" after a failover must make provision for the tunnel's essential stateful information, i.e., its sequence numbers.

   Currently, there are two options available. The first option is to keep the backup endpoint in complete sync with the active endpoint with respect to the control and data session sequence numbers. The second option is to simply re-establish all the tunnels and their sessions after a failover. The drawback of the first option is that it adds significant performance and complexity impact to the endpoint's architecture, especially as tunnel and session aggregation increases. The drawback of the second option is that it increases the "service loss" time, especially as the architecture scales.

   To alleviate these drawbacks, this draft introduces a mechanism that brings the dynamic stateful information of a tunnel to a correct and consistent state after a failure. The proposed mechanism currently defines recovery only for tunnels and sessions that were in the established state prior to the failure.

2.0 Protocol Operation

   The failover protocol allows an endpoint to specify its failover capabilities during tunnel establishment. Based on these capabilities, the two endpoints learn whether a tunnel and its sessions support recovery.

   Upon failure, a new tunnel is initiated for every old tunnel that needs recovery. The SCCRQ for the new tunnel includes the Old Tunnel ID AVP, a new AVP defined in Section 4.3, which identifies the old tunnel. Upon receiving this AVP, an endpoint learns that its peer has failed over and would like to recover the identified tunnel. After the new tunnel is established, all active sessions and tunnel characteristics of the old tunnel are moved to the new tunnel, and normal tunnel activity resumes.
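   The recovery decision described in this section can be sketched in Python. This is a minimal, hypothetical model rather than part of the protocol: the TunnelState fields and the recoverable() and plan_recovery() helpers are illustrative names, and the rule that a tunnel is recoverable only when the failed endpoint advertised the Failover Initiate Capability AVP and its peer advertised the Failover Response Capability AVP is an assumption drawn from Sections 4.1 and 4.2. For each recoverable tunnel, the sketch produces the two values that Section 2.3.1 requires in the new SCCRQ, following the definitions in Sections 4.3 and 4.4.

```python
from dataclasses import dataclass


@dataclass
class TunnelState:
    """Hypothetical per-tunnel record assumed to survive the failover."""
    local_tid: int          # Tunnel ID this endpoint assigned on the old tunnel
    remote_tid: int         # Tunnel ID the peer assigned on the old tunnel
    we_can_initiate: bool   # we advertised Failover Initiate Capability
    peer_can_respond: bool  # peer advertised Failover Response Capability


def recoverable(t: TunnelState) -> bool:
    # Assumption: recovery needs initiate capability on this side and
    # response capability on the peer, as exchanged in SCCRQ/SCCRP.
    return t.we_can_initiate and t.peer_can_respond


def plan_recovery(tunnels):
    """Return the Old Tunnel ID / Old Local Tunnel ID pair for each new SCCRQ."""
    plans = []
    for t in tunnels:
        if recoverable(t):
            plans.append({
                # Section 4.3: the ID assigned by the receiver (the peer).
                "old_tunnel_id": t.remote_tid,
                # Section 4.4: the ID assigned by the sender (this endpoint).
                "old_local_tunnel_id": t.local_tid,
            })
    return plans
```

   For example, given [TunnelState(1, 100, True, True), TunnelState(2, 200, True, False)], only the first tunnel yields a plan, {"old_tunnel_id": 100, "old_local_tunnel_id": 1}, because the peer of the second tunnel did not advertise the response capability.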
2.1 Tunnel Establishment

   Regular tunnel establishment procedures are the same as defined in [L2TP], except that when establishing a tunnel, endpoints exchange their failover capabilities using the Failover Capability Initiate AVP and the Failover Capability Response AVP in the SCCRQ and SCCRP control messages.

2.2 Session Establishment

   The session establishment and termination procedures described in [L2TP] remain the same.

2.3 Post Failure Operation

   This section describes the behavior of the failed endpoint and its peer during recovery.

   Endpoints MUST avoid sending any control packets over the old tunnel that is being recovered. Tunnels that successfully negotiated failover capabilities may use the failover protocol. The SCCRQ, SCCRP and SCCCN messages of the new tunnel SHOULD carry the same set of AVPs, with the same values, as were used on the old tunnel, except for the Assigned Tunnel ID AVP, in order to keep the tunnel characteristics the same.

2.3.1 Failed Endpoint's Behavior

   The failed endpoint establishes a new tunnel as specified in [L2TP], with the following considerations:

   - It MUST include the Old Tunnel ID AVP and the Old Local Tunnel ID AVP, defined in Sections 4.3 and 4.4 respectively, in the new SCCRQ.

   - If, for any reason, the new tunnel cannot be established, the endpoint MUST assume that recovery of that tunnel has failed, and it SHOULD clear the tunnel and the sessions within that tunnel on its end.

2.3.2 Failed Endpoint's Peer's Behavior

   The peer accepts tunnel requests from the failed endpoint as specified in [L2TP], with the following considerations:

   - It MUST use the Old Tunnel ID AVP, defined in Section 4.3, to determine which tunnel the failed endpoint is trying to recover. If this AVP is not present, the endpoint can assume that the request is for a new tunnel.

   - It MUST validate the Old Tunnel ID and Old Local Tunnel ID in an incoming SCCRQ. If it does not find a match, it MUST reject the SCCRQ.

   - It may reject the new tunnel request if it did not advertise failover capabilities on the corresponding old tunnel.
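   The peer-side checks above can be sketched as follows. This is a hypothetical Python model: OldTunnel, RecoveryRejected and match_recovery_request() are illustrative names, old tunnels are assumed to be indexed by the Tunnel ID the peer itself assigned (which, per Section 4.3, is the value carried in the Old Tunnel ID AVP), and rejecting a tunnel on which failover was not advertised is shown as one possible local policy, since the draft leaves that rejection optional.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class RecoveryRejected(Exception):
    """Raised when an SCCRQ asks to recover a tunnel that cannot be matched."""


@dataclass
class OldTunnel:
    local_tid: int             # Tunnel ID this peer assigned
    remote_tid: int            # Tunnel ID the failed endpoint assigned
    failover_advertised: bool  # did this peer advertise failover capability?


def match_recovery_request(old_tunnels: Dict[int, OldTunnel],
                           old_tid: Optional[int],
                           old_local_tid: Optional[int]) -> Optional[OldTunnel]:
    """Return the old tunnel an SCCRQ wants to recover, or None for a new tunnel."""
    if old_tid is None:
        # No Old Tunnel ID AVP: treat the request as an ordinary new tunnel.
        return None
    tunnel = old_tunnels.get(old_tid)
    # Both IDs must match the same old tunnel, otherwise reject the SCCRQ.
    if tunnel is None or tunnel.remote_tid != old_local_tid:
        raise RecoveryRejected("no matching old tunnel")
    # Optional local policy: refuse if failover was never advertised.
    if not tunnel.failover_advertised:
        raise RecoveryRejected("failover not advertised on old tunnel")
    return tunnel
```

   Raising an exception keeps the "reject the SCCRQ" outcome distinct from the "treat as a new tunnel" outcome, which is signalled by returning None.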
2.3.3 Preserving Sessions

   Upon establishment of the new tunnel, both endpoints:

   - MUST consider the sessions that were in the established state in the old tunnel to now belong to the new tunnel.

   - MUST process control messages as defined in [L2TP], for example a new session establishment request via ICRQ.

2.3.4 Session State Inconsistency Between Peers

   The failover mechanism allows the two ends of a tunnel to preserve the sessions that were in the established state. However, it is very important for the two endpoints to agree upon the sessions that they are going to preserve in a tunnel. If failover happens while a session is being established or torn down, one endpoint may consider the session to be in the established state while its peer considers the same session to be down or nonexistent. This can happen, for example, when an endpoint fails after sending a CDN message that never reached the peer, or after sending an ICCN message that never reached the peer.

   To facilitate synchronization of the sessions under such circumstances, the following mechanism is proposed:

   - After the new tunnel is established, the sessions that were not in the established state are brought down locally without sending a CDN message to the peer.

   - An endpoint can explicitly query the peer about session(s) that it suspects (perhaps based on inactivity) might not exist on the peer. It does so by sending a Failover Session Query (FSQ) message that includes one Failover Session State (FSS) AVP for each such session.

   - Upon receipt of an FSQ message, the peer responds with a Failover Session Response (FSR) message that contains one FSS AVP for each FSS AVP received in the FSQ message, stating whether the peer considers that session to be in the established state.
   - Upon receiving an FSR message from the peer, an endpoint brings down, without sending a CDN, each session that the peer does not consider to be in the established state.

   - For security purposes, an FSQ message SHOULD be accepted only within a configured period after the failover. FSQ and FSR messages could also include the Challenge and Challenge Response AVPs defined in [L2TP] to validate the peer's identity.

   The session state inconsistency mechanism SHOULD be used only for sessions that are not receiving any traffic after the failover and are therefore considered to possibly have inconsistent state.

2.3.5 Data Plane Behavior

   If sequencing was used on data sessions, then upon detecting the peer's failure, the non-failed endpoint MUST set the next expected Ns based on the incoming Ns value. It must also flush its re-ordering buffers, if applicable.

3.0. Failover Messages

   This draft defines two new messages that help bring the sessions to a state on which both endpoints agree. These messages can be sent only on the new tunnel that was established for recovery.

3.1. Failover Session Query Message (FSQ)

   The Failover Session Query (FSQ) message, Message Type TBD, is sent by an endpoint after failover to learn the state of one or more sessions on the peer. It contains one or more Failover Session State (FSS) AVPs, defined in Section 4.5.

   This message MAY include the Challenge AVP to validate the identity of the originator, but only if tunnel authentication was performed when the old tunnel was established.

3.2. Failover Session Response Message (FSR)

   The Failover Session Response (FSR) message, Message Type TBD, is sent by an endpoint in response to an FSQ message to let the peer learn the state of the queried sessions on the endpoint. For every FSS AVP in the FSQ message, there is one FSS AVP in the FSR message.

   The FSR message MUST include a Challenge Response AVP if the FSQ was received with a Challenge AVP.

4.0. Failover AVPs

   The new AVPs defined by this document are as follows. The messages in which each AVP is used are listed in brackets after its name.

4.1. Failover Initiate Capability AVP [SCCRQ, SCCRP]

   The Failover Capability Initiate AVP, Attribute Type [TBD], indicates whether an endpoint can initiate recovery on a given tunnel after a failure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The AVP is not mandatory (the M-bit MUST be set to 0). The AVP MAY be hidden (the H-bit set to 0 or 1).

4.2. Failover Response Capability AVP [SCCRQ, SCCRP]

   The Failover Capability Response AVP, Attribute Type [TBD], indicates whether an endpoint is capable of responding to a failure on a given tunnel.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The AVP is not mandatory (the M-bit MUST be set to 0). The AVP MAY be hidden (the H-bit set to 0 or 1).

4.3. Old Tunnel ID AVP [SCCRQ, SCCRP]

   The Old Tunnel ID AVP, Attribute Type [TBD], carries, in SCCRQ and SCCRP messages, the Tunnel ID that was assigned by the receiver of the message before the failure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |         Old Tunnel Id         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This AVP is mandatory (the M-bit MUST be set to 1). The AVP may be hidden (the H-bit set to 0 or 1).
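   The layouts in Sections 4.1 through 4.3 follow the general [L2TP] AVP format: a 16-bit word holding the M bit, the H bit, four reserved bits and a 10-bit Length, followed by a 16-bit Vendor ID (0 for IETF) and a 16-bit Attribute Type, then the Attribute Value. A minimal Python sketch of encoding and decoding these AVPs is shown below. The Attribute Type numbers are hypothetical placeholders, since this draft leaves them TBD, and hidden AVPs (H-bit set), which require the [L2TP] hiding procedure, are not handled.

```python
import struct

IETF_VENDOR_ID = 0  # Vendor ID used for IETF-defined AVPs

# Hypothetical placeholders: the real Attribute Types are TBD in this draft.
ATTR_FAILOVER_INITIATE = 0xFF01
ATTR_FAILOVER_RESPONSE = 0xFF02
ATTR_OLD_TUNNEL_ID = 0xFF03

M_BIT = 0x8000
H_BIT = 0x4000
LENGTH_MASK = 0x03FF  # low 10 bits of the first 16-bit word


def encode_avp(attr_type: int, value: bytes = b"", mandatory: bool = False) -> bytes:
    """Encode an unhidden IETF AVP (H-bit and reserved bits left at 0)."""
    length = 6 + len(value)  # 6-octet header plus the Attribute Value
    if length > LENGTH_MASK:
        raise ValueError("AVP too long for the 10-bit Length field")
    first = (M_BIT if mandatory else 0) | length
    return struct.pack("!HHH", first, IETF_VENDOR_ID, attr_type) + value


def decode_avp(data: bytes) -> dict:
    """Split one AVP into its header flags, Vendor ID, type and value."""
    first, vendor, attr_type = struct.unpack("!HHH", data[:6])
    length = first & LENGTH_MASK
    return {
        "mandatory": bool(first & M_BIT),
        "hidden": bool(first & H_BIT),
        "vendor": vendor,
        "type": attr_type,
        "value": data[6:length],
    }


# Capability AVPs carry no value and MUST have the M-bit clear (6 octets).
initiate = encode_avp(ATTR_FAILOVER_INITIATE)
# Old Tunnel ID carries a 16-bit Tunnel ID and MUST have the M-bit set (8 octets).
old_tid = encode_avp(ATTR_OLD_TUNNEL_ID, struct.pack("!H", 42), mandatory=True)
```

   With the placeholder type 0xFF03, an Old Tunnel ID AVP carrying Tunnel ID 42 encodes to the octets 80 08 00 00 ff 03 00 2a: M-bit set with Length 8, Vendor ID 0, the Attribute Type, then the Tunnel ID in network byte order.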
4.4. Old Local Tunnel ID AVP [SCCRQ, SCCRP]

   The Old Local Tunnel ID AVP, Attribute Type [TBD], carries, in SCCRQ and SCCRP messages, the Tunnel ID that was assigned by the sender of the message before the failure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |      Old Local Tunnel Id      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The AVP is mandatory (the M-bit MUST be set to 1). The AVP may be hidden (the H-bit set to 0 or 1).

4.5. Failover Session State AVP [FSQ, FSR]

   The Failover Session State (FSS) AVP, Attribute Type [TBD], serves different purposes in FSQ and FSR messages.

   In an FSQ message, it indicates the Assigned Session ID that the sender received from the peer when the session was originally established, i.e., the old remote session ID. It SHOULD be used only for sessions that the endpoint suspects might not exist on the peer and about which it wants to query the peer.

   In an FSR message, it indicates the state of the session that was queried in the FSQ message. An endpoint SHOULD include one FSS AVP in the FSR message for every FSS AVP in the FSQ message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Attribute Type [TBD]      |      Assigned Session Id      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Assigned Session State     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Assigned Session State field in this AVP has the value 1 if the endpoint considers the session to be in the established state, and the value 0 otherwise.
   Other values are reserved; they should not be sent and should be ignored upon receipt.

   The AVP is mandatory (the M-bit MUST be set to 1). The AVP may be hidden (the H-bit set to 0 or 1).

5.0 L2TP Over IPsec Considerations During Failover

   During failover, the tunnel initiator and responder of the new tunnel MUST use the same source IP address and port as were used for the failed tunnel. This prevents mismatches with the existing IPsec filter/policy databases at both ends. If this cannot be preserved, then the procedure laid out in [L2TP IPsec] MUST be used to cover all cases of dynamic assignment of IP addresses and ports.

6. IANA Considerations

   This document requires five new "AVP Attribute" values and two new "Message Type" values to be assigned through IETF Consensus [RFC2434], as indicated in Sections 10.1 and 10.2 of [L2TP]. These are:

      Failover Initiate Capability AVP (Section 4.1)
      Failover Response Capability AVP (Section 4.2)
      Old Tunnel ID AVP (Section 4.3)
      Old Local Tunnel ID AVP (Section 4.4)
      Failover Session State AVP (Section 4.5)
      Failover Session Query Message (FSQ) (Section 3.1)
      Failover Session Response Message (FSR) (Section 3.2)

7. Security Considerations

   The failover mechanism described here leaves a small (1 in 2^32) chance for an intruder to discover the tunnel IDs of an existing tunnel by trying various values in the Old Tunnel ID and Old Local Tunnel ID AVPs. It also introduces an opportunity for an intruder to spoof FSQ messages in order to learn which sessions are active on a node. This can be prevented by using the Challenge AVP and Challenge Response AVP in FSQ and FSR messages. The time window after failover during which FSQ messages are accepted can also be configured to a smaller value to reduce the vulnerability.

8. Authors' Addresses

   Reinaldo Penno
   Nortel Networks
   2305 Mission College Blvd
   Santa Clara, CA 95054
   Phone: +1 408.565.3023
   Email: rpenno@nortelnetworks.com

   Keyur Parikh
   Megisto Systems
   20251 Century Boulevard, Suite 120
   Germantown, MD 20876
   Phone: +1 301.444.1723
   Email: kparikh@megisto.com

   Leo Huber
   Extreme Networks
   3585 Monroe St.
   Santa Clara, CA 95051
   Phone: +1 408.597.3037
   Email: lhuber@extremenetworks.com

   Ly Loi
   Tahoe Networks
   3052 Orchard Drive
   San Jose, CA 95134
   Phone: +1 408.944.8630
   Email: lll@tahoenetworks.com

   Vipin Jain
   Riverstone Networks
   5200 Great America Parkway
   Santa Clara, CA 95054
   Phone: +1 408.878.0464
   Email: vipinietf@yahoo.com

   W. Mark Townsley
   Cisco Systems
   7025 Kit Creek Road
   PO Box 14987
   Research Triangle Park, NC 27709
   Email: townsley@cisco.com

9. References

   [L2TP]        Townsley, W., et al., "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

   [L2TP IPsec]  Patel, B., et al., "Securing L2TP using IPsec", RFC 3193, November 2001.

   [RFC2434]     Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

Appendix A

   This section describes some design considerations that came up during discussions while developing the proposal.

A.1 Backward compatibility and extensibility

   - The mechanism should be backward compatible, i.e., it should not redefine the existing behavior of [L2TP]-compliant systems.

   - The protocol should allow a peer to detect failover capabilities in advance, so that it can fall back to other failover mechanisms should the peer not support the proposed failover protocol.

   - The protocol should allow future extensions to the failover mechanism with ease.

A.2 Less failover recovery time

   The mechanism should take the least possible time to recover from a failover (target of 3-5 seconds for 30k tunnels).
   Specifically, it should take the following into consideration:

   - Faster recovery: exchange fewer messages to recover from a failover.

   - CPU intensiveness: the less CPU-intensive a proposal is, the better the chances of faster recovery.

   - Parallel establishment of tunnels: keep the re-establishment of different tunnels independent of one another.

A.3 Less payload data loss

   The mechanism should have the least possible impact on data flows for sessions with sequencing enabled.

A.4 Minimum interference with pre-failure control traffic

   The mechanism should define a way of clearly distinguishing the messages that were sent before failover from those that are sent after. Specifically, it should define a mechanism that avoids confusion between the sequence numbers used before and after failover if the same Tunnel ID is used.

A.5 Simplicity

   The simpler the protocol is, the better its chances of being adopted by everybody. The following would help achieve this:

   - Use of existing AVPs, messages and packet formats.

   - Avoiding special considerations and mechanisms that a new implementation would have to deal with.

   - A simpler post-failover synchronization mechanism.

A.6 Security

   The protocol should provide a mechanism to authenticate peers when resynchronization happens after a failover.

A.7 Scalability

   It is very important for the proposed protocol to work well in large-scale deployments. This includes meeting all the design considerations discussed above for deployments with thousands of tunnels, thousands of sessions, or a mix of the two. A target of 30,000 tunnels carrying 150,000 to 200,000 sessions from 300 peers was considered during the design.

Appendix B

   The figure below outlines the failover protocol operation for an example tunnel. The failover protocol does not preclude an endpoint from recovering multiple tunnels in parallel.
   It also does not preclude an endpoint from sending multiple FSQs to recover quickly.

   Before Failure:

   Endpoint                                    Peer

                (assigned tid = x, failover capable)
   SCCRQ      ------------------------------>  validates SCCRQ

                (assigned tid = y, failover capable)
   validates  <------------------------------  send SCCRP
   SCCRP, etc.
      ....                                     ....

   After Failure:

   Failed Node                                 Peer

                (old tid = x, old local tid = y)
                (assigned tid = z, failover capable)
   SCCRQ      ------------------------------>  detects recovery
                                               for (remote tid = x)

                (old tid = y, old local tid = x)
                (assigned tid = w, failover capable)
   SCCRP      <------------------------------  send SCCRP
      ....                                     ....

                (FSS AVP for sessions s1, s2, s3, ...)
   send FSQ   ------------------------------>  compute the state
                                               of sessions in FSQ

                (FSS AVP for sessions s1, s2, s3, ...)
   delete     <------------------------------  send FSR
   stale sessions,
   if any

                (FSS AVP for sessions s7, s8, s9, ...)
   compute    <------------------------------  send FSQ
   the state of
   sessions in FSQ

                (FSS AVP for sessions s7, s8, s9, ...)
   send FSR   ------------------------------>  delete stale
                                               sessions, if any
      ....                                     ....
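   The FSQ/FSR exchange illustrated above, together with the rules of Sections 2.3.4 and 4.5, can be modeled in a few lines of Python. This is a hypothetical sketch: the Endpoint class, its sessions map from local to peer-assigned session ID, and the method names are illustrative only, and message framing, Challenge handling and the FSQ acceptance window are omitted. The fss_value() helper packs the FSS Attribute Value (Assigned Session Id followed by Assigned Session State) as laid out in Section 4.5.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ESTABLISHED = 1      # Assigned Session State: session is established
NOT_ESTABLISHED = 0  # Assigned Session State: any other state


def fss_value(session_id: int, state: int) -> bytes:
    """FSS Attribute Value: 16-bit Assigned Session Id, 16-bit state."""
    return struct.pack("!HH", session_id, state)


@dataclass
class Endpoint:
    # Hypothetical model: local session ID -> peer-assigned session ID,
    # for the sessions this endpoint considers established.
    sessions: Dict[int, int] = field(default_factory=dict)

    def build_fsq(self, suspect_local_ids: List[int]) -> List[int]:
        """FSQ lists the peer-assigned (old remote) IDs of suspect sessions."""
        return [self.sessions[sid] for sid in suspect_local_ids]

    def answer_fsq(self, queried: List[int]) -> List[Tuple[int, int]]:
        """FSR: one (session ID, state) pair per FSS AVP received in the FSQ."""
        return [(sid, ESTABLISHED if sid in self.sessions else NOT_ESTABLISHED)
                for sid in queried]

    def apply_fsr(self, fsr: List[Tuple[int, int]]) -> List[int]:
        """Bring down, without a CDN, sessions the peer reports as not established."""
        remote_to_local = {remote: local for local, remote in self.sessions.items()}
        removed = []
        for remote_id, state in fsr:
            # Reserved state values (other than 0 and 1) are ignored.
            if state == NOT_ESTABLISHED and remote_id in remote_to_local:
                local_id = remote_to_local[remote_id]
                del self.sessions[local_id]
                removed.append(local_id)
        return removed
```

   For instance, if the failed node holds {1: 101, 2: 102, 3: 103} and the peer holds only {101: 1, 103: 3}, querying local sessions 2 and 3 sends [102, 103], the peer answers [(102, 0), (103, 1)], and apply_fsr() removes local session 2 while keeping sessions 1 and 3.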