Network Working Group                                              Y. Gu
Internet-Draft                                                    Huawei
Expires: January 4, 2013                                        M. Shore
                                                    No Mountain Software
                                                            S. Sivakumar
                                                           Cisco Systems
                                                            July 3, 2012


 A Framework and Problem Statement for Flow-associated Middlebox State
                               Migration
                  draft-gu-statemigration-framework-00

Abstract

   This document presents an initial framework and discussion of the
   problem of transferring middlebox (for example, firewall or NAT)
   flow-coupled state from one middlebox to another while the flow is
   still active.  This has most recently come up in the context of
   virtual machine (VM) migration between hypervisors, but it is a
   problem that has appeared in other situations, as well.  We present
   some of the parameters of the problem, define some language for
   discussing the problem, and begin to identify a path forward for
   addressing it.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 4, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal


Gu, et al.               Expires January 4, 2013                [Page 1]

Internet-Draft          Middlebox State Migration              July 2012


   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Goals  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   4.  Middlebox state  . . . . . . . . . . . . . . . . . . . . . . .  6
     4.1.  What state is associated with a flow on a middlebox? . . .  6
     4.2.  State vs policy  . . . . . . . . . . . . . . . . . . . . .  7
     4.3.  Mechanisms for instantiating middlebox state . . . . . . .  7
   5.  "Moving" endpoints . . . . . . . . . . . . . . . . . . . . . .  8
     5.1.  A few words about addresses  . . . . . . . . . . . . . . .  8
     5.2.  Scenarios  . . . . . . . . . . . . . . . . . . . . . . . .  8
       5.2.1.  Virtual machine migration  . . . . . . . . . . . . . .  8
       5.2.2.  SCTP NAT . . . . . . . . . . . . . . . . . . . . . . .  9
       5.2.3.  High availability, and failover  . . . . . . . . . . .  9
   6.  "Directionality" . . . . . . . . . . . . . . . . . . . . . . . 10
   7.  Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     7.1.  Recognizing when an endpoint has moved . . . . . . . . . . 11
     7.2.  Topology discovery . . . . . . . . . . . . . . . . . . . . 11
     7.3.  Copying state from a middlebox . . . . . . . . . . . . . . 12
     7.4.  Installing state on the new middlebox  . . . . . . . . . . 13
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   10. Informative References . . . . . . . . . . . . . . . . . . . . 16
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17


Gu, et al.               Expires January 4, 2013                [Page 2]

Internet-Draft          Middlebox State Migration              July 2012


1.  Introduction

   An end-to-end network flow typically traverses one or more
   "middlebox," which may retain state about the flow.  These include,
   for example, firewalls, NATs, traffic optimizers, and similar.  The
   flow-associated state is usually instantiated through a combination
   of traffic inspection and broad policies, but may also be created by
   the use of an explicit request or signaling mechanism.

   The problem of how to handle transfering flow-associated middlebox
   state when one flow endpoint moves is not a new one, but with some
   exceptions it remains largely unaddressed.  For example, situations
   in which one endpoint or another "move" (we define what it means to
   move an endpoint in more detail in Section 5) include mobile IP
   [RFC5944], failover in a high-availability deployment, and VM
   (virtual machine) migration.  Related problems include multihomed
   endpoints in SCTP and load balancing.

   In this document we establish terminology (Section 2), describe the
   problem, and lay out the components of the problem that would need to
   be addressed in a solution.


Gu, et al.               Expires January 4, 2013                [Page 3]

Internet-Draft          Middlebox State Migration              July 2012


2.  Terminology

   flow:  "Traffic flow" is defined in [RFC2722] as an artificial
      logical equivalent of a call or connection.  It is delimited by a
      start and a stop time.

   middlebox:  A middlebox was defined in [RFC3234] as "any intermediary
      device performing functions other than the normal, standard
      functions of an IP router on the datagram path between a source
      host and a destination host."  RFC 3234 provides an older but
      excellent and still-relevant taxonomy of middlebox types.

   move:  When we talk about an endpoint "moving" what we are describing
      is the endpoint changing its point of attachment to the network.
      For the purpose of this discussion we assume that it retains the
      same IP address after the move that it had before the move.

   policy:  See Section 4.2


Gu, et al.               Expires January 4, 2013                [Page 4]

Internet-Draft          Middlebox State Migration              July 2012


3.  Goals

   The problem we are interested in solving is the question of how to
   keep longer-lived network flows "alive" when an endpoint's point of
   attachment to a network changes.  The particular piece of this we
   intend to address is how to move the middlebox (in this case,
   firewall or NAT) state associated with a network flow to new
   middleboxes.


Gu, et al.               Expires January 4, 2013                [Page 5]

Internet-Draft          Middlebox State Migration              July 2012


4.  Middlebox state

4.1.  What state is associated with a flow on a middlebox?

   To date, we haven't been able to find a normative definition of the
   term 'state' in IETF documents.  More generally it tends to be
   considered to be a set of observable properties associated with an
   object.  This is (largely) distinct from automata theory, in which
   "state" refers to the condition of an object (or automaton).  The
   observable things which might be associated by a middlebox with a
   network flow are described below.

   Transport-layer middleboxes which keep flow-associated state through
   the duration of the flow typically keep, at a minimum, the standard
   IP 5-tuple:

             {s_addr, d_addr, s_port, d_port, protocol}

   where

   s_addr  is the source address

   d_addr  is the destination address

   s_port  is the source port

   d_port  is the destination port

   protocol  is the IP protocol (TCP, UDP, SCTP, RSVP, etc.)

   Other data elements often associated with a network flow include
   timers.

   Middlebox state is almost always associated with a specific interface
   (rather than the interface being an attribute of the flow).  Some
   "stateful inspection" firewalls may keep state from higher layers in
   the networking stack: everything from TCP sequence numbers to entire
   SIP dialogues.

   Note that the state associated with a flow may be left up when the
   flow is torn down in some implementations, such as those NATs that
   put the state on an activity-based timer as an efficiency mechanism,
   to avoid reinstantiating state should a new flow be created which
   shares the attributes of the flow which just ended.  This is often
   the case with HTTP, for example.


Gu, et al.               Expires January 4, 2013                [Page 6]

Internet-Draft          Middlebox State Migration              July 2012


4.2.  State vs policy

   We would like to draw a clear distinction between state and policy.
   'Policy' is a set of statements that define how traffic (in this
   case) is to be treated by the middlebox.  In some sense policy is a
   description of what state should be applied to a network flow; that
   is to say, state includes the instantiation of policy.  When a flow
   first arrives at a middlebox, it consults its policy to determine
   what state (if any) is to be created and then associated with that
   flow

   As a general rule of thumb, policy is provisioned while state
   represents run-time responses to environmental conditions (in this
   case, network flows).  Because policy is provisioned and because we
   assume that the middleboxes between which state would be migrated are
   under the administrative control of the same organization, we will
   make another assumption that there is consistent policy configured
   across middleboxes.  We are aware that this is not always a correct
   assumption.

4.3.  Mechanisms for instantiating middlebox state

   State is created on middleboxes using a small number of mechanisms,
   sometimes in combination.

   The most common means by which middlebox state is created is that the
   middlebox examines traffic and compares it against its own policies,
   which have typically been configured or provisioned by a systems or
   network administrator but in very simple cases can come
   preprovisioned, for example on commodity consumer equipment.  It then
   creates middlebox state, in the form of a firewall pinhole, a NAT
   table mapping, QoS table entry, etc.

   Another means is through explicit request.  An endpoint or its proxy
   sends a request for resources (again, firewall pinhole, NAT table
   mapping, and so on) to the middlebox using some sort of "signaling"
   protocol to request the resource.  The middlebox compares the request
   to its policy and grants or denies the request based on that policy.
   Examples of explicit request include RSVP [RFC2205], midcom
   [RFC3303], TURN [RFC5766], and the work being done by the IETF
   pcp [1] working group.

   It is worth mention that there are mechanisms that are essentially
   hybrids of the previous two approaches, using expected effects of
   sending traffic across a middlebox to trigger hoped-for state
   instantiation.  STUN [RFC5389] is probably the best-known example of
   this.


Gu, et al.               Expires January 4, 2013                [Page 7]

Internet-Draft          Middlebox State Migration              July 2012


5.  "Moving" endpoints

   Moving an endpoint, in the context of this internet draft, refers to
   changing its point of attachment to a network.  Doing so may cause
   traffic to cross different middleboxes from the ones the traffic
   traversed when the middlebox state was created.

5.1.  A few words about addresses

   One question that comes up from time to time in discussions of VM
   migration is whether or not the IP address will change as a result of
   the migration.  We believe that this is out of scope for the time
   being, not the least because host operating system support is
   potentially difficult.  If our goal is to keep a given network flow
   up and alive during a migration, not only would the endpoint
   operating system need to be aware that its address has changed, it
   would also need to to be able to signal the other end of the flow,
   which would have to respond by modifying open sockets' sockaddrs,
   etc.  There are also some obvious security problems that would need
   to be addressed.

5.2.  Scenarios

   In this section we introduce a few scenarios.  We believe the
   architecture is fundamentally the same in these scenarios and that
   what we're describing is a general problem.

5.2.1.  Virtual machine migration

   The live migration (i.e. the VM appears to remain "up" and available
   during the migration) of virtual machines between hypervisors in the
   same data center has been established practice for several years now,
   but there's been a move towards live migration of VMs between
   geographically disparate data centers (see, for example this
   collaboration [2] between Cisco and VMWare).  This provides high
   availability, the ability to perform data center maintenance without
   downtime, data center migration or consolidation, data center
   expansion, and workload balancing.  There is a compelling use case
   for VM migration.

   However, reliance on proprietary tunneling and signaling protocols
   leads to vendor lock-in, lack of interoperability between products
   from different vendors, and, of course, lack of openness that can
   mask architectural and security flaws.


Gu, et al.               Expires January 4, 2013                [Page 8]

Internet-Draft          Middlebox State Migration              July 2012


5.2.2.  SCTP NAT

   The SCTP [RFC4960] protocol supports multihomed endpoints.  Any NAT
   that is port-aware (and these days it is nearly all of them) will
   need to have SCTP support in order to be able to handle extracting
   the port numbers even for flows that are single-homed on each end.
   This provides a mechanism for transparent failover when one path
   taken by the network flow fails (see section 6.4 in [RFC4960]

   The upshot of this is that if a NAT is maintaining state related to a
   flow on the primary path and the primary path fails, that state may
   need to be transferred to the NAT being traversed by the secondary
   path.

   This problem is being addressed in the IETF behave [3] working group.

5.2.3.  High availability, and failover

   "High-availability" commonly suggests failover as a mechanism to
   guarantee uninterrupted (or minimally interrupted) services.  When a
   failure is detected services are shifted onto a secondary server.
   Note that this shift can be implemented through VM migration, as well
   as having the services brought up on a new system image.

   Because outages are sometimes caused by site failures, failover can
   take place across geographically disparate sites.  This introduces
   the likelihood of the flow now traversing a very different network
   path and a new set of middleboxes.


Gu, et al.               Expires January 4, 2013                [Page 9]

Internet-Draft          Middlebox State Migration              July 2012


6.  "Directionality"

   One of the questions that comes up when considering an overall
   architecture to solve this set of problems is who initiates the state
   migration and how the data "flow" from place to place.

   One approach is to have the middleboxes communicate directly with
   each other.  In this case having all middleboxes poll all other
   middleboxes for copies of their state seems wasteful and inefficient,
   suggesting that communication between middleboxes would need a
   specific trigger.  The "old" middlebox could send its state to the
   "new" middlebox or the new middlebox could send a request to the old
   middlebox for a copy of its state.  In either case one middlebox
   would need to know the location of the other and be able to
   communicate with it (both parties would need to authenticate to each
   other).  Note that if a catastrophic network event caused the old
   middlebox to become unreachable, it would be impossible to
   successfully query it for its state.  [Note that this approach was
   considered for SCTP NAT traversal and discarded as impossible, since
   there was no way for one NAT to know about other NATs.]

   Another approach is to have some controlling entity involved, either
   mediating communication between middleboxes or directing
   communication between middleboxes.  In a VM migration scenario, a VM
   manager, or a network manager communicating with a VM manager, is an
   obvious candidate.

   The orthogonal question to whether or not there's a mediating entity
   is who initiates the communication - does the old middlebox respond
   to a catastrophic event by dumping state before shutting down (not
   always possible, obviously) or is it polled by a mediating device or
   a new middlebox?  Another possibility is to periodically transfer
   incremental information so that a non-recoverable error can save most
   of the flows, if not all.


Gu, et al.               Expires January 4, 2013               [Page 10]

Internet-Draft          Middlebox State Migration              July 2012


7.  Problems

   The problems that must be solved in order to move middlebox state
   along with a moving endpoint include:

   o  Recognizing when an endpoint has moved

   o  Locating middleboxes along the original path

   o  Locating middleboxes along the new path

   o  Getting a copy of state from middleboxes along the old path

   o  Installing that state in middleboxes along the new path

7.1.  Recognizing when an endpoint has moved

   As touched upon in Section 5.2, there are various circumstances that
   could cause an endpoint to change its point of attachment to a
   network.  They fall into two broad categories: planned and unplanned.

   In the planned case, some entity knows that an endpoint is about to
   move and the move can happen in a controlled fashion.  There may be
   time to send network queries, learn topology, and gather state.

   The unplanned case is typically a response to the failure of some
   element in the network.  A monitoring heartbeat is missed, a
   connection times out, or some other indication of catastraphic
   failure is received by an endpoint or by a monitoring service.  Not
   only does this interfere with the notion of an organized transfer
   from one path to the new one, it also means that there may be cases
   where the old middlebox is not reachable and it's not possible to
   query its state.

7.2.  Topology discovery

   Somehow or other the state migration mechanism needs to be able to
   locate and communicate with both the middleboxes on the old path and
   the middleboxes on the new path.  This is not a trivial problem; IP
   was designed to have the network itself be largely opaque to
   endpoints, and very often systems and network administrators prefer
   not to expose network topology, feeling that it would introduce
   security threats.

   There are several options, including configuration, discovery, and
   notification.  In configuration, someone with knowledge of the
   network topology would be able to construct a table describing
   middleboxes associated with certain routes.  In discovery, a network


Gu, et al.               Expires January 4, 2013               [Page 11]

Internet-Draft          Middlebox State Migration              July 2012


   mechanism would be used to query for the middleboxes along a path,
   similar to traceroute or to a PATH message in RSVP [RFC2205].

   A configuration mechanism would have the disadvantage of being not
   particularly responsive to changes in the network, as well as being
   somewhat error-prone.  However, it would not involve inventing a new
   network mechanism or requiring changes on every participating
   middlebox (although the state migration mechanism itself would nearly
   certainly require changes).

   [Note that an architecture that had the middlebox copying its own
   state out to some third party would almost certainly have to be
   configuration-base.]

   A discovery-based approach would require putting new software on
   every middlebox, an approach that is intuitively unappealing and that
   has been repeatedly shown to inhibit adoption of newer technologies.
   There is no such thing as incremental deployment using this approach.
   It also introduces security problems, since without the appropriate
   protections it would allow attackers to probe and discover not just
   network topology but specifically the location of security devices/
   middleboxes in a given network.  On the other hand it's robust
   against configuration errors and highly responsive to changes in the
   underlying network.

   A third option, notification, relies on a middlebox announcing its
   presence to the network, typically using anycast or broadcast.  This
   also requires changes to both the middlebox and a controlling entity,
   and a an announcement/notification protocol.  It has the advantage of
   being responsive to new middleboxes coming up in the network,
   although a mechanism (such as a heartbeat) would be needed to detect
   outages and drops.

   The primary security consideration in a notification scenario is that
   the network must be tightly controlled to prevent announcements from
   being eavesdropped upon by adversaries.

7.3.  Copying state from a middlebox

   Another problem to be solved is the one of copying state from a
   middlebox, encoding it, and transferring it over the network.

   It may be the case that the middleboxes are from different
   manufacturers/vendors, and so the problem of representing the state
   we wish to transfer includes the question of presenting it in a
   vendor-neutral format, including both state semantics and state
   syntax.


Gu, et al.               Expires January 4, 2013               [Page 12]

Internet-Draft          Middlebox State Migration              July 2012


   A somewhat more challenging aspect of this problem is how to
   transport the encoded state.  For one thing, it may be that the event
   that triggered the endpoint migration has also rendered the middlebox
   in question unreachable.  For another, what sort of load this imposes
   on the middlebox depends, among other things, on the "directionality"
   of the state migration.  It may be that an external device, such as a
   session controller, a hypervisor, or another middlebox queries the
   old middlebox for a copy of its state.  In high-availability
   scenarios the middlebox may end up "pushing" copies of its state out
   to some controlling or intermediate entity, such as a hypervisor.

7.4.  Installing state on the new middlebox

   The problem of installing state on the new middlebox is closely
   related to the one of copying state from the old middlebox.  In both
   cases we're facing the problems of representation and encoding, a
   transport protocol to/from the middlebox, and questions about
   reachability.


Gu, et al.               Expires January 4, 2013               [Page 13]

Internet-Draft          Middlebox State Migration              July 2012


8.  Security Considerations

   Any time we introduce new mechanisms to query and manipulate
   middleboxes, we also introduce potentially very serious security
   exposures.

   In this case, because we're planning on discovering the location of
   middleboxes, querying the middleboxes for their state, and installing
   state on middleboxes, we face a very broad range indeed of potential
   threats.

   Network and systems administrators typically want to conceal network
   topology from outsiders, and it may be necessary to use authenticated
   discovery (packet filtering may be adequate for some deployments but
   not all).  This introduces problems around credentials management and
   keying for participants, and may suggest that we would want to
   minimize the number of network elements talking with one another.

   Cleary the ability to copy data from a middlebox introduces the
   ability to discovery yet more network topology, and in particular to
   identify specific firewall pinholes and NAT table mappings, and their
   associated state.

   Similarly, the ability to install state on a middlebox can introduce
   both Denial of Service (DoS) vulnerabilities but also the ability of
   an attacker to penetrate a middlebox, or to disable it completely.

   In all cases, protections must be designed with sensitivity to
   performance, since middleboxes often are processing very heavy
   traffic loads.  This means keeping an eye on cryptographic processing
   demands, key and other credentials management, etc.


Gu, et al.               Expires January 4, 2013               [Page 14]

Internet-Draft          Middlebox State Migration              July 2012


9.  IANA Considerations

   This document has no actions for IANA.


Gu, et al.               Expires January 4, 2013               [Page 15]

Internet-Draft          Middlebox State Migration              July 2012


10.  Informative References

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, September 1997.

   [RFC2722]  Brownlee, N., Mills, C., and G. Ruth, "Traffic Flow
              Measurement: Architecture", RFC 2722, October 1999.

   [RFC3234]  Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and
              Issues", RFC 3234, February 2002.

   [RFC3303]  Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and
              A. Rayhan, "Middlebox communication architecture and
              framework", RFC 3303, August 2002.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
              October 2008.

   [RFC5766]  Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
              Relays around NAT (TURN): Relay Extensions to Session
              Traversal Utilities for NAT (STUN)", RFC 5766, April 2010.

   [RFC5944]  Perkins, C., "IP Mobility Support for IPv4, Revised",
              RFC 5944, November 2010.

   [1]  <http://datatracker.ietf.org/wg/pcp/charter/>

   [2]  <http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/
        ns224/ns836/white_paper_c11-557822.pdf>

   [3]  <http://datatracker.ietf.org/wg/behave/>


Gu, et al.               Expires January 4, 2013               [Page 16]

Internet-Draft          Middlebox State Migration              July 2012


Authors' Addresses

   Yingjie Gu
   Huawei

   Phone: +86-25-56624760
   Fax:   +86-25-56624702
   Email: guyingjie@huawei.com


   Melinda Shore
   No Mountain Software
   PO Box 16271
   Two Rivers, AK  99716
   US

   Phone: +1 907 322 9522
   Email: melinda.shore@nomountain.net


   Senthil Sivakumar
   Cisco Systems
   7100-8 Kit Creek Road
   Research Triangle Park, NC
   US

   Email: ssenthil@cisco.com


Gu, et al.               Expires January 4, 2013               [Page 17]