Network Working Group                                              Y. Gu
Internet-Draft                                                    Huawei
Expires: January 4, 2013                                        M. Shore
                                                    No Mountain Software
                                                            S. Sivakumar
                                                           Cisco Systems
                                                            July 3, 2012
A Framework and Problem Statement for Flow-associated Middlebox State
Migration
draft-gu-statemigration-framework-00
Abstract
This document presents an initial framework and discussion of the
problem of transferring middlebox (for example, firewall or NAT)
flow-coupled state from one middlebox to another while the flow is
still active. This has most recently come up in the context of
virtual machine (VM) migration between hypervisors, but it is a
problem that has appeared in other situations, as well. We present
some of the parameters of the problem, define some language for
discussing the problem, and begin to identify a path forward for
addressing it.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 4, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Gu, et al. Expires January 4, 2013 [Page 1]
Internet-Draft Middlebox State Migration July 2012
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Goals
4. Middlebox state
4.1. What state is associated with a flow on a middlebox?
4.2. State vs policy
4.3. Mechanisms for instantiating middlebox state
5. "Moving" endpoints
5.1. A few words about addresses
5.2. Scenarios
5.2.1. Virtual machine migration
5.2.2. SCTP NAT
5.2.3. High availability, and failover
6. "Directionality"
7. Problems
7.1. Recognizing when an endpoint has moved
7.2. Topology discovery
7.3. Copying state from a middlebox
7.4. Installing state on the new middlebox
8. Security Considerations
9. IANA Considerations
10. Informative References
Authors' Addresses
1. Introduction
An end-to-end network flow typically traverses one or more
"middleboxes," which may retain state about the flow. These include,
for example, firewalls, NATs, traffic optimizers, and similar devices. The
flow-associated state is usually instantiated through a combination
of traffic inspection and broad policies, but may also be created by
the use of an explicit request or signaling mechanism.
The problem of how to handle transferring flow-associated middlebox
state when one flow endpoint moves is not a new one, but with some
exceptions it remains largely unaddressed. For example, situations
in which one endpoint or another "moves" (we define what it means to
move an endpoint in more detail in Section 5) include mobile IP
[RFC5944], failover in a high-availability deployment, and VM
(virtual machine) migration. Related problems include multihomed
endpoints in SCTP and load balancing.
In this document we establish terminology (Section 2), describe the
problem, and lay out the components of the problem that would need to
be addressed in a solution.
2. Terminology
flow: "Traffic flow" is defined in [RFC2722] as an artificial
logical equivalent of a call or connection. It is delimited by a
start and a stop time.
middlebox: A middlebox was defined in [RFC3234] as "any intermediary
device performing functions other than the normal, standard
functions of an IP router on the datagram path between a source
host and a destination host." RFC 3234 provides an older but
excellent and still-relevant taxonomy of middlebox types.
move: When we talk about an endpoint "moving" what we are describing
is the endpoint changing its point of attachment to the network.
For the purpose of this discussion we assume that it retains the
same IP address after the move that it had before the move.
policy: See Section 4.2
3. Goals
The problem we are interested in solving is the question of how to
keep longer-lived network flows "alive" when an endpoint's point of
attachment to a network changes. The particular piece of this we
intend to address is how to move the middlebox (in this case,
firewall or NAT) state associated with a network flow to new
middleboxes.
4. Middlebox state
4.1. What state is associated with a flow on a middlebox?
To date, we haven't been able to find a normative definition of the
term 'state' in IETF documents. More generally it tends to be
considered to be a set of observable properties associated with an
object. This is (largely) distinct from automata theory, in which
"state" refers to the condition of an object (or automaton). The
observable things which might be associated by a middlebox with a
network flow are described below.
Transport-layer middleboxes which keep flow-associated state through
the duration of the flow typically keep, at a minimum, the standard
IP 5-tuple:
{s_addr, d_addr, s_port, d_port, protocol}
where
s_addr is the source address
d_addr is the destination address
s_port is the source port
d_port is the destination port
protocol is the IP protocol (TCP, UDP, SCTP, RSVP, etc.)
Other data elements often associated with a network flow include
timers.
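As a rough illustration of the elements listed above, the following
Python sketch models a flow-table entry keyed by the 5-tuple and
carrying an idle timer. The names FlowKey and FlowEntry, the use of a
single idle timer, and the 300-second default are assumptions made for
this example; they are not taken from any particular middlebox.

```python
import time
from dataclasses import dataclass, field
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple described above (field names follow the text)."""
    s_addr: str
    d_addr: str
    s_port: int
    d_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP, 17 for UDP


@dataclass
class FlowEntry:
    """Hypothetical per-flow record: the key plus one idle timer."""
    key: FlowKey
    idle_timeout: float = 300.0  # seconds; an assumed default
    last_seen: float = field(default_factory=time.monotonic)

    def touch(self, now: float) -> None:
        # Record activity on the flow, restarting the idle timer.
        self.last_seen = now

    def expired(self, now: float) -> bool:
        # True once the flow has been idle longer than its timeout.
        return now - self.last_seen > self.idle_timeout
```

A real middlebox would likely keep several timers and additional
per-protocol fields; the point here is only that the key and the timers
together are the minimum that a migration mechanism would have to carry.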
Middlebox state is almost always associated with a specific interface
(rather than the interface being an attribute of the flow). Some
"stateful inspection" firewalls may keep state from higher layers in
the networking stack: everything from TCP sequence numbers to entire
SIP dialogues.
Note that the state associated with a flow may be left up when the
flow is torn down in some implementations, such as those NATs that
put the state on an activity-based timer as an efficiency mechanism,
to avoid reinstantiating state should a new flow be created which
shares the attributes of the flow which just ended. This is often
the case with HTTP, for example.
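The behavior described above, in which state outlives the flow that
created it, might be sketched as follows. The class name
LingeringStateTable and its methods are hypothetical, and the single
activity-based timer is a simplifying assumption.

```python
from typing import Dict, Tuple

# (s_addr, d_addr, s_port, d_port, protocol)
FlowKey = Tuple[str, str, int, int, int]


class LingeringStateTable:
    """Hypothetical table whose entries outlive the flows that made them."""

    def __init__(self, linger_seconds: float) -> None:
        self.linger_seconds = linger_seconds
        self._last_activity: Dict[FlowKey, float] = {}

    def on_packet(self, key: FlowKey, now: float) -> bool:
        """Record activity; return True if existing state was reused."""
        self._purge(now)
        reused = key in self._last_activity
        self._last_activity[key] = now
        return reused

    def on_flow_closed(self, key: FlowKey, now: float) -> None:
        # Deliberately keep the entry: only the activity timer removes it.
        if key in self._last_activity:
            self._last_activity[key] = now

    def _purge(self, now: float) -> None:
        # Drop entries that have been idle longer than the linger period.
        stale = [k for k, t in self._last_activity.items()
                 if now - t > self.linger_seconds]
        for k in stale:
            del self._last_activity[k]
```

One consequence for migration is that the set of entries on a middlebox
can be larger than the set of flows that are actually active.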
4.2. State vs policy
We would like to draw a clear distinction between state and policy.
'Policy' is a set of statements that define how traffic (in this
case) is to be treated by the middlebox. In some sense policy is a
description of what state should be applied to a network flow; that
is to say, state includes the instantiation of policy. When a flow
first arrives at a middlebox, the middlebox consults its policy to
determine what state (if any) is to be created and then associated
with that flow.
As a general rule of thumb, policy is provisioned while state
represents run-time responses to environmental conditions (in this
case, network flows). Because policy is provisioned and because we
assume that the middleboxes between which state would be migrated are
under the administrative control of the same organization, we will
make another assumption that there is consistent policy configured
across middleboxes. We are aware that this is not always a correct
assumption.
4.3. Mechanisms for instantiating middlebox state
State is created on middleboxes using a small number of mechanisms,
sometimes in combination.
The most common means by which middlebox state is created is that the
middlebox examines traffic and compares it against its own policies,
which have typically been configured or provisioned by a systems or
network administrator but in very simple cases can come
preprovisioned, for example on commodity consumer equipment. It then
creates middlebox state, in the form of a firewall pinhole, a NAT
table mapping, QoS table entry, etc.
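The inspection-driven mechanism just described can be sketched in
Python as follows. PolicyRule, Middlebox, first-match rule evaluation,
and the default-deny behavior are all assumptions made for the sake of
a small example, not a description of any real product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (s_addr, d_addr, s_port, d_port, protocol)
FlowKey = Tuple[str, str, int, int, int]


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical provisioned rule matching protocol and destination port."""
    protocol: int
    d_port: int
    action: str  # "permit" or "deny"


class Middlebox:
    def __init__(self, policy: List[PolicyRule]) -> None:
        self.policy = policy                 # provisioned by an administrator
        self.state: Dict[FlowKey, str] = {}  # created at run time, per flow

    def _lookup(self, key: FlowKey) -> Optional[PolicyRule]:
        _, _, _, d_port, protocol = key
        for rule in self.policy:  # first match wins (an assumption)
            if rule.protocol == protocol and rule.d_port == d_port:
                return rule
        return None

    def on_packet(self, key: FlowKey) -> bool:
        """Return True if the packet is forwarded."""
        if key in self.state:     # existing flow: state already installed
            return True
        rule = self._lookup(key)  # first packet: consult policy
        if rule is not None and rule.action == "permit":
            self.state[key] = "pinhole"
            return True
        return False              # default deny (an assumption)
```

Note that only the contents of the state table would be candidates for
migration; the policy list is provisioned separately on each middlebox.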
Another means is through explicit request. An endpoint or its proxy
sends a request for resources (again, firewall pinhole, NAT table
mapping, and so on) to the middlebox using some sort of "signaling"
protocol to request the resource. The middlebox compares the request
to its policy and grants or denies the request based on that policy.
Examples of explicit request include RSVP [RFC2205], midcom
[RFC3303], TURN [RFC5766], and the work being done by the IETF
pcp [1] working group.
It is worth mentioning that there are mechanisms that are essentially
hybrids of the previous two approaches, using expected effects of
sending traffic across a middlebox to trigger hoped-for state
instantiation. STUN [RFC5389] is probably the best-known example of
this.
5. "Moving" endpoints
Moving an endpoint, in the context of this Internet-Draft, refers to
changing its point of attachment to a network. Doing so may cause
traffic to cross different middleboxes from the ones the traffic
traversed when the middlebox state was created.
5.1. A few words about addresses
One question that comes up from time to time in discussions of VM
migration is whether or not the IP address will change as a result of
the migration. We believe that this is out of scope for the time
being, not the least because host operating system support is
potentially difficult. If our goal is to keep a given network flow
up and alive during a migration, not only would the endpoint
operating system need to be aware that its address has changed, it
would also need to be able to signal the other end of the flow,
which would have to respond by modifying open sockets' sockaddrs,
etc. There are also some obvious security problems that would need
to be addressed.
5.2. Scenarios
In this section we introduce a few scenarios. We believe the
architecture is fundamentally the same in these scenarios and that
what we're describing is a general problem.
5.2.1. Virtual machine migration
The live migration (i.e. the VM appears to remain "up" and available
during the migration) of virtual machines between hypervisors in the
same data center has been established practice for several years now,
but there's been a move towards live migration of VMs between
geographically disparate data centers (see, for example, this
collaboration [2] between Cisco and VMware). This provides high
availability, the ability to perform data center maintenance without
downtime, data center migration or consolidation, data center
expansion, and workload balancing. There is a compelling use case
for VM migration.
However, reliance on proprietary tunneling and signaling protocols
leads to vendor lock-in, lack of interoperability between products
from different vendors, and, of course, lack of openness that can
mask architectural and security flaws.
5.2.2. SCTP NAT
The SCTP [RFC4960] protocol supports multihomed endpoints. Any NAT
that is port-aware (and these days it is nearly all of them) will
need to have SCTP support in order to be able to handle extracting
the port numbers even for flows that are single-homed on each end.
Multihoming provides a mechanism for transparent failover when one
path taken by the network flow fails (see Section 6.4 in [RFC4960]).
The upshot of this is that if a NAT is maintaining state related to a
flow on the primary path and the primary path fails, that state may
need to be transferred to the NAT being traversed by the secondary
path.
This problem is being addressed in the IETF behave [3] working group.
5.2.3. High availability, and failover
"High-availability" commonly suggests failover as a mechanism to
guarantee uninterrupted (or minimally interrupted) services. When a
failure is detected services are shifted onto a secondary server.
Note that this shift can be implemented through VM migration, as well
as having the services brought up on a new system image.
Because outages are sometimes caused by site failures, failover can
take place across geographically disparate sites. This introduces
the likelihood of the flow now traversing a very different network
path and a new set of middleboxes.
6. "Directionality"
One of the questions that comes up when considering an overall
architecture to solve this set of problems is who initiates the state
migration and how the data "flow" from place to place.
One approach is to have the middleboxes communicate directly with
each other. In this case having all middleboxes poll all other
middleboxes for copies of their state seems wasteful and inefficient,
suggesting that communication between middleboxes would need a
specific trigger. The "old" middlebox could send its state to the
"new" middlebox or the new middlebox could send a request to the old
middlebox for a copy of its state. In either case one middlebox
would need to know the location of the other and be able to
communicate with it (both parties would need to authenticate to each
other). Note that if a catastrophic network event caused the old
middlebox to become unreachable, it would be impossible to
successfully query it for its state. [Note that this approach was
considered for SCTP NAT traversal and discarded as impossible, since
there was no way for one NAT to know about other NATs.]
Another approach is to have some controlling entity involved, either
mediating communication between middleboxes or directing
communication between middleboxes. In a VM migration scenario, a VM
manager, or a network manager communicating with a VM manager, is an
obvious candidate.
The orthogonal question to whether or not there's a mediating entity
is who initiates the communication - does the old middlebox respond
to a catastrophic event by dumping state before shutting down (not
always possible, obviously) or is it polled by a mediating device or
a new middlebox? Another possibility is to periodically transfer
incremental information so that a non-recoverable error can save most
of the flows, if not all.
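The last possibility, periodic incremental transfer, might look roughly
like the following Python sketch. OldMiddlebox, Replica, and
changes_since are hypothetical names, and the unbounded in-memory change
log is a simplification; nothing here is a proposed protocol.

```python
from typing import Dict, List, Optional, Tuple

# (s_addr, d_addr, s_port, d_port, protocol)
FlowKey = Tuple[str, str, int, int, int]
# (sequence number, key, state, or None if the state was removed)
Change = Tuple[int, FlowKey, Optional[dict]]


class OldMiddlebox:
    """Hypothetical source of state that records every change in order."""

    def __init__(self) -> None:
        self._seq = 0
        self._log: List[Change] = []

    def set_state(self, key: FlowKey, state: dict) -> None:
        self._seq += 1
        self._log.append((self._seq, key, dict(state)))

    def remove_state(self, key: FlowKey) -> None:
        self._seq += 1
        self._log.append((self._seq, key, None))

    def changes_since(self, seq: int) -> List[Change]:
        # Only the increment is returned, not the whole table.
        return [c for c in self._log if c[0] > seq]


class Replica:
    """Hypothetical receiver: a new middlebox or a mediating entity."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.table: Dict[FlowKey, dict] = {}

    def apply(self, changes: List[Change]) -> None:
        for seq, key, state in changes:
            if state is None:
                self.table.pop(key, None)
            else:
                self.table[key] = state
            self.last_seq = seq
```

If the old middlebox fails between transfers, the replica holds every
flow up to last_seq and loses only the changes made since then, which is
the "most of the flows, if not all" outcome described above.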
7. Problems
The problems that must be solved in order to move middlebox state
along with a moving endpoint include:
o Recognizing when an endpoint has moved
o Locating middleboxes along the original path
o Locating middleboxes along the new path
o Getting a copy of state from middleboxes along the old path
o Installing that state in middleboxes along the new path
7.1. Recognizing when an endpoint has moved
As touched upon in Section 5.2, there are various circumstances that
could cause an endpoint to change its point of attachment to a
network. They fall into two broad categories: planned and unplanned.
In the planned case, some entity knows that an endpoint is about to
move and the move can happen in a controlled fashion. There may be
time to send network queries, learn topology, and gather state.
The unplanned case is typically a response to the failure of some
element in the network. A monitoring heartbeat is missed, a
connection times out, or some other indication of catastrophic
failure is received by an endpoint or by a monitoring service. Not
only does this interfere with the notion of an organized transfer
from one path to the new one, it also means that there may be cases
where the old middlebox is not reachable and it's not possible to
query its state.
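For the unplanned case, a missed-heartbeat detector could be sketched as
below. HeartbeatMonitor, the element names, and the rule that three
missed intervals indicate failure are assumptions chosen for
illustration only.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor that flags elements whose heartbeats stop."""

    def __init__(self, interval: float, missed_limit: int = 3) -> None:
        # An element is presumed failed after this many silent seconds.
        self.deadline = interval * missed_limit
        self._last_heard: Dict[str, float] = {}

    def heartbeat(self, element: str, now: float) -> None:
        self._last_heard[element] = now

    def failed(self, now: float) -> List[str]:
        # Elements silent for longer than the deadline, in sorted order.
        return sorted(e for e, t in self._last_heard.items()
                      if now - t > self.deadline)
```

A monitor of this kind can only report that an element has gone silent;
it cannot distinguish a failed middlebox from an unreachable one, which
is why state on that middlebox may be impossible to retrieve afterwards.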
7.2. Topology discovery
Somehow or other the state migration mechanism needs to be able to
locate and communicate with both the middleboxes on the old path and
the middleboxes on the new path. This is not a trivial problem; IP
was designed to have the network itself be largely opaque to
endpoints, and very often systems and network administrators prefer
not to expose network topology, feeling that it would introduce
security threats.
There are several options, including configuration, discovery, and
notification. In configuration, someone with knowledge of the
network topology would be able to construct a table describing
middleboxes associated with certain routes. In discovery, a network
mechanism would be used to query for the middleboxes along a path,
similar to traceroute or to a PATH message in RSVP [RFC2205].
A configuration mechanism would have the disadvantage of being not
particularly responsive to changes in the network, as well as being
somewhat error-prone. However, it would not involve inventing a new
network mechanism or requiring changes on every participating
middlebox (although the state migration mechanism itself would nearly
certainly require changes).
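A configuration-based approach could be as simple as a statically
maintained table, as in the following Python sketch. The TOPOLOGY
table, the middlebox names, and the choice of longest-prefix matching
on the endpoint address are assumptions made for the example.

```python
import ipaddress
from typing import Dict, List

# Hypothetical, administrator-maintained table mapping a prefix to the
# middleboxes on the path toward that prefix.
TOPOLOGY: Dict[str, List[str]] = {
    "198.51.100.0/24": ["fw-dc1"],
    "203.0.113.0/24": ["fw-dc2", "nat-dc2"],
}


def middleboxes_for(address: str, table: Dict[str, List[str]]) -> List[str]:
    """Return the middleboxes configured for the longest matching prefix."""
    addr = ipaddress.ip_address(address)
    best_key, best_len = None, -1
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_key, best_len = prefix, net.prefixlen
    # An empty list means the table has no entry covering this address.
    return list(table[best_key]) if best_key is not None else []
```

The weakness noted above is visible here: the table is only as accurate
as its last manual update, and an address that no entry covers yields
nothing at all.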
[Note that an architecture that had the middlebox copying its own
state out to some third party would almost certainly have to be
configuration-based.]
A discovery-based approach would require putting new software on
every middlebox, an approach that is intuitively unappealing and that
has been repeatedly shown to inhibit adoption of newer technologies.
There is no such thing as incremental deployment using this approach.
It also introduces security problems, since without the appropriate
protections it would allow attackers to probe and discover not just
network topology but specifically the location of security devices/
middleboxes in a given network. On the other hand it's robust
against configuration errors and highly responsive to changes in the
underlying network.
A third option, notification, relies on a middlebox announcing its
presence to the network, typically using anycast or broadcast. This
also requires changes to both the middlebox and a controlling entity,
and an announcement/notification protocol. It has the advantage of
being responsive to new middleboxes coming up in the network,
although a mechanism (such as a heartbeat) would be needed to detect
outages and drops.
The primary security consideration in a notification scenario is that
the network must be tightly controlled to prevent announcements from
being eavesdropped upon by adversaries.
7.3. Copying state from a middlebox
Another problem to be solved is the one of copying state from a
middlebox, encoding it, and transferring it over the network.
It may be the case that the middleboxes are from different
manufacturers/vendors, and so the problem of representing the state
we wish to transfer includes the question of presenting it in a
vendor-neutral format, including both state semantics and state
syntax.
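To make the syntax half of this concrete, the sketch below encodes one
flow's state as versioned JSON. The field names, the FORMAT_VERSION
tag, and the choice of JSON itself are assumptions for illustration;
this document does not define an encoding, and the semantics of each
field would still have to be agreed between vendors.

```python
import json
from typing import Any, Dict

FORMAT_VERSION = 1  # hypothetical version tag for this illustrative format
REQUIRED_FIELDS = {"s_addr", "d_addr", "s_port", "d_port", "protocol"}


def _check_fields(flow: Dict[str, Any]) -> None:
    missing = REQUIRED_FIELDS - set(flow)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")


def encode_state(flow: Dict[str, Any]) -> bytes:
    """Serialize one flow's state into a self-describing envelope."""
    _check_fields(flow)
    envelope = {"version": FORMAT_VERSION, "flow": flow}
    # sort_keys gives a stable byte encoding for identical input.
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode_state(data: bytes) -> Dict[str, Any]:
    """Parse and validate state received from another middlebox."""
    envelope = json.loads(data.decode("utf-8"))
    if envelope.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported state format version")
    flow = envelope["flow"]
    _check_fields(flow)
    return flow
```

The same pair of functions would serve both the copying side and the
installing side, which is one reason the two problems are so closely
related.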
A somewhat more challenging aspect of this problem is how to
transport the encoded state. For one thing, it may be that the event
that triggered the endpoint migration has also rendered the middlebox
in question unreachable. For another, what sort of load this imposes
on the middlebox depends, among other things, on the "directionality"
of the state migration. It may be that an external device, such as a
session controller, a hypervisor, or another middlebox queries the
old middlebox for a copy of its state. In high-availability
scenarios the middlebox may end up "pushing" copies of its state out
to some controlling or intermediate entity, such as a hypervisor.
7.4. Installing state on the new middlebox
The problem of installing state on the new middlebox is closely
related to the one of copying state from the old middlebox. In both
cases we're facing the problems of representation and encoding, a
transport protocol to/from the middlebox, and questions about
reachability.
8. Security Considerations
Any time we introduce new mechanisms to query and manipulate
middleboxes, we also introduce potentially very serious security
exposures.
In this case, because we're planning on discovering the location of
middleboxes, querying the middleboxes for their state, and installing
state on middleboxes, we face a very broad range indeed of potential
threats.
Network and systems administrators typically want to conceal network
topology from outsiders, and it may be necessary to use authenticated
discovery (packet filtering may be adequate for some deployments but
not all). This introduces problems around credentials management and
keying for participants, and may suggest that we would want to
minimize the number of network elements talking with one another.
Clearly, the ability to copy data from a middlebox introduces the
ability to discover yet more network topology, and in particular to
identify specific firewall pinholes and NAT table mappings, and their
associated state.
Similarly, the ability to install state on a middlebox can introduce
not only Denial of Service (DoS) vulnerabilities but also the ability of
an attacker to penetrate a middlebox, or to disable it completely.
In all cases, protections must be designed with sensitivity to
performance, since middleboxes often are processing very heavy
traffic loads. This means keeping an eye on cryptographic processing
demands, key and other credentials management, etc.
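As one example of a comparatively inexpensive protection, the sketch
below authenticates an encoded state message with HMAC-SHA-256 from the
Python standard library. The pre-shared key, the tag-then-payload
layout, and the function names protect and verify are assumptions; the
sketch provides integrity and origin authentication only, with no
confidentiality or replay protection, and key distribution is left
entirely open.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA-256 tag computed under key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(key: bytes, message: bytes) -> bytes:
    """Return the payload if the tag is valid, otherwise raise ValueError."""
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state message failed authentication")
    return payload
```

A receiving middlebox would call verify before attempting to parse or
install anything, so that unauthenticated state is rejected at the
cost of one hash computation per message.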
9. IANA Considerations
This document has no actions for IANA.
10. Informative References
[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
Functional Specification", RFC 2205, September 1997.
[RFC2722] Brownlee, N., Mills, C., and G. Ruth, "Traffic Flow
Measurement: Architecture", RFC 2722, October 1999.
[RFC3234] Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and
Issues", RFC 3234, February 2002.
[RFC3303] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and
A. Rayhan, "Middlebox communication architecture and
framework", RFC 3303, August 2002.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol",
RFC 4960, September 2007.
[RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
"Session Traversal Utilities for NAT (STUN)", RFC 5389,
October 2008.
[RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
Relays around NAT (TURN): Relay Extensions to Session
Traversal Utilities for NAT (STUN)", RFC 5766, April 2010.
[RFC5944] Perkins, C., "IP Mobility Support for IPv4, Revised",
RFC 5944, November 2010.
[1]
[2]
[3]
Authors' Addresses
Yingjie Gu
Huawei
Phone: +86-25-56624760
Fax: +86-25-56624702
Email: guyingjie@huawei.com
Melinda Shore
No Mountain Software
PO Box 16271
Two Rivers, AK 99716
US
Phone: +1 907 322 9522
Email: melinda.shore@nomountain.net
Senthil Sivakumar
Cisco Systems
7100-8 Kit Creek Road
Research Triangle Park, NC
US
Email: ssenthil@cisco.com