Network Working Group S. Vinapamula Internet-Draft Juniper Networks Intended status: Standards Track S. Sivakumar Expires: April 17, 2014 Cisco Systems October 14, 2013 Flow high availability through PCP draft-vinapamula-flow-ha-00 Abstract This document describes a mechanism for a host signal Network Address Translator's (NAT) High Availability (HA) module to checkpoint interested connections. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 17, 2014. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Vinapamula & Sivakumar Expires April 17, 2014 [Page 1] Internet-Draft HA through PCP October 2013 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2 3. Issues with the existing implementations . . . . . . . . . . 2 4. Proposed Solution . . . . . . . . . . . . . . . . . . . . . . 3 5. Some Usage Examples . . . . . . . . . . . . . . . . . . . . . 4 6. Signaling HA for other services . . . . . . . . . . . . . . . 4 7. Security Considerations . . . . . . . . . . . . . . . . . . . 4 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 9. Normative references . . . . . . . . . . . . . . . . . . . . 5 1. Introduction Uninterrupted Internet service continuity is critical in service provider environment. To achieve this lots of service providers have active-backup systems. For some of the services, a state would be created for every connection for processing subsequent packets of that connection. For service continuity of those connections on backup when active fail, that corresponding state had to be checkpointed on the backup. NAT is one such service, where, when the mapping between a private IP address port and public IP address port is dynamically established and torn down on per connection basis. In such a case, connection's state has to be checkpointed to the backup, for uninterrupted service continuity even in case active fails. This document describes some of the existing issues in the way checkpointing is done today, and tries to address them. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 3. Issues with the existing implementations In a high availability (HA) deployment, it is expensive in terms of memory and CPU and other resources to checkpoint all of the connections state periodically. Also checkpointing may not be required for all flows as all flows may not be critical. But, this leaves a challenge to identify what connections' to checkpoint. Vinapamula & Sivakumar Expires April 17, 2014 [Page 2] Internet-Draft HA through PCP October 2013 Typically, this is addressed by identifying long lived connections and checkpointing state of only those connections that lived long enough to the backup for service continuity. However, following are the issues with that approach: 1. A connection which could potentially be long-lived would face disruption in service on failure of active system, before it had not lived long enough for it to be checkpointed. 2. Also a connection may not be long lived but critical like shorter phone conversations. 3. Similarly not every long lived connection need to be critical, say a free service connections need not be checkpointed while a paid service connection is checkpointed of a hosted service. 4. Proposed Solution An application or user is the best judge to decide which connection is critical. An application or user MUST indicate that one or more of its connections is critical and disruption is not desired. This will trigger checkpointing of state to the backup. An application/user may indicate the desire for checkpoint through PCP client, and PCP client MUST mark bit zero of "Reserved" bits in the PCP request header as below. Here after in the document, this bit is referred as HA bit. All other bits in the "Reserved" field MUST be marked zero on transmissions and MUST be ignored on reception. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Version = 2 |R| Opcode | Reserved 1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Requested Lifetime (32 bits) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | PCP Client's IP Address (128 bits) | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : : (optional) Opcode-specific information : Vinapamula & Sivakumar Expires April 17, 2014 [Page 3] Internet-Draft HA through PCP October 2013 : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : : (optional) PCP Options : : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ PCP request header with HA bit being the zero bit in the "Reserved" field. PCP server MAY honor this request depending on whether resources are available for checkpointing. If there are no resources available for checkpointing, but there are resources available to honor request say PCP MAP/PEER request, request is honored and there is no error returned. What information to checkpoint and how to checkpoint is out of scope of this document, and is left for implementation. However it is RECOMMENDED to checkpoint state on backup for honored requests before a response is sent to the PCP client. Communication between application/user and PCP client is implementation specific. 5. Some Usage Examples 1. Disruption in a phone connection is not desired. Application that is initiating a phone connection MUST mark connection HA bit in the header, while initiating a PCP request for checkpointing. 2. Similarly disruption in media streaming is not desired. A user hosting a media service, MUST mark HA bit in the header while initiating a mapping request, and MAY mark connection associated with that mapping, depending on whether the connection is from a paid subscriber or from a free subscriber. 6. Signaling HA for other services In conjunction with NAT, if there are any other services, that maintain state for any connection, they MAY register to PCP server, and MAY be triggered for checkpointing of that state. 7. Security Considerations Vinapamula & Sivakumar Expires April 17, 2014 [Page 4] Internet-Draft HA through PCP October 2013 A NAT device can always override the end user signalling if the administrator specified rules are not in policy with the end user signaling. There is a risk that every client may wish to checkpoint every connection, which can potentially load the system. Admin may restrict number of connections and the rate of checkpointing on per PCP client. 8. IANA Considerations This document does not require any action from IANA. 9. Normative references [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC6887] Wing, D., Cheshire, S., Boucadair, M., Penno, R., and P. Selkirk, "Port Control Protocol (PCP)", RFC 6887, April 2013. Authors' Addresses Suresh Vinapamula Juniper Networks 1194 North Mathilda Avenue Sunnyvale, CA 94089 USA Phone: +1 408 936 5441 EMail: sureshk@juniper.net Senthil Sivakumar Cisco Systems 7100-8 Kit Creek Road Research Triangle Park, NC 27760 USA Phone: +1 919 392 5158 EMail: ssenthil@cisco.com Vinapamula & Sivakumar Expires April 17, 2014 [Page 5]