Internet Research Task Force (IRTF)                      F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Intended status: Experimental                            August 26, 2010
Expires: February 27, 2011

              The Internet Routing Overlay Network (IRON)
                       draft-templin-iron-11.txt

Abstract

Since the Internet must continue to support escalating growth due to increasing demand, it is clear that current routing architectures and operational practices must be updated. This document proposes an Internet Routing Overlay Network (IRON) that supports sustainable growth through Provider Independent addressing while requiring no changes to end systems and no changes to the existing routing system. IRON further addresses other important issues including routing scaling, mobility management, multihoming, traffic engineering and NAT traversal. While business considerations are an important determining factor for widespread adoption, they are out of scope for this document. This document is a product of the IRTF Routing Research Group.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on February 27, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  The Internet Routing Overlay Network
     3.1.  IR[CE] - IRON Customer Edge Router
     3.2.  IR[VE] - IRON Virtual Prefix Company Edge Router
     3.3.  IR[VC] - IRON Virtual Prefix Company Core Router
     3.4.  IR[VP] - IRON Virtual Prefix Company Combined Router
   4.  IRON Organizational Principles
   5.  IRON Initialization
     5.1.  IR[VC] Initialization
     5.2.  IR[VE] Initialization
     5.3.  IR[CE] Initialization
   6.  IRON Operation
     6.1.  IR[CE] Operation
     6.2.  IR[VE] Operation
     6.3.  IR[VC] Operation
     6.4.  IRON Reference Operating Scenarios
       6.4.1.  Both Hosts Within IRON EUNs
       6.4.2.  Mixed IRON and Non-IRON Hosts
     6.5.  Mobility, Multihoming and Traffic Engineering Considerations
       6.5.1.  Mobility Management
       6.5.2.  Multihoming
       6.5.3.  Inbound Traffic Engineering
       6.5.4.  Outbound Traffic Engineering
     6.6.  Renumbering Considerations
     6.7.  NAT Traversal Considerations
     6.8.  Nested EUN Considerations
       6.8.1.  Host A Sends Packets to Host Z
       6.8.2.  Host Z Sends Packets to Host A
   7.  Additional Considerations
   8.  Related Initiatives
   9.  IANA Considerations
   10. Security Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  IRON VPs Over Internetworks with Different Address Families
   Appendix B.  Scaling Considerations
   Author's Address

1.  Introduction

Growth in the number of entries instantiated in the Internet routing system has led to concerns about unsustainable routing scaling [I-D.narten-radir-problem-statement]. Operational practices such as increased use of multihoming with IPv4 Provider-Independent (PI) addressing are resulting in more and more fine-grained prefixes injected into the routing system from more and more end-user networks.
Furthermore, the forthcoming depletion of the public IPv4 address space has raised concerns about both increased deaggregation (leading to yet further routing table entries) and an impending address space run-out scenario. At the same time, the IPv6 routing system is beginning to see growth in IPv6 Provider-Aggregated (PA) prefixes [BGPMON] which must be managed in order to avoid the same routing scaling issues the IPv4 Internet now faces. Since the Internet must continue to scale to accommodate increasing demand, it is clear that new routing methodologies and operational practices are needed.

Several related works have investigated routing scaling issues. Virtual Aggregation (VA) [I-D.ietf-grow-va] and Aggregation in Increasing Scopes (AIS) [I-D.zhang-evolution] are global routing proposals that introduce routing overlays with Virtual Prefixes (VPs) to reduce the number of entries required in each router's Forwarding Information Base (FIB) and Routing Information Base (RIB). Routing and Addressing in Networks with Global Enterprise Recursion (RANGER) [RFC5720] examines recursive arrangements of enterprise networks that can apply to a very broad set of use case scenarios [I-D.russert-rangers]. In particular, RANGER supports encapsulation and secure redirection by treating each layer in the recursive hierarchy as a virtual non-broadcast, multiple access (NBMA) "link". RANGER is an architectural framework that includes Virtual Enterprise Traversal (VET) [I-D.templin-intarea-vet] and the Subnetwork Encapsulation and Adaptation Layer (SEAL) [I-D.templin-intarea-seal] as its functional building blocks.

This document proposes an Internet Routing Overlay Network (IRON) with goals of supporting sustainable growth while requiring no changes to the existing routing system.
IRON borrows concepts from VA, AIS and RANGER, and further borrows concepts from the Internet Vastly Improved Plumbing (Ivip) [I-D.whittle-ivip-arch] architecture proposal along with its associated Translating Tunnel Router (TTR) mobility extensions [TTRMOB]. Indeed, the TTR model to a great degree inspired the IRON mobility architecture design discussed in this document. The Network Address Translator (NAT) traversal techniques adapted for IRON were inspired by the Simple Address Mapping for Premises Legacy Equipment (SAMPLE) proposal [I-D.carpenter-softwire-sample].

IRON specifically seeks to provide scalable PI addressing without changing the current BGP [RFC4271] routing system. IRON observes the Internet Protocol standards [RFC0791][RFC2460]. Other network layer protocols that can be encapsulated within IP packets (e.g., OSI/CLNP [RFC1070], etc.) are also within scope.

The IRON is a global routing system comprising virtual overlay networks managed by Virtual Prefix Companies (VPCs) that own and manage Virtual Prefixes (VPs) from which End User Network (EUN) PI prefixes (EPs) are delegated to customer sites. The IRON is motivated by a growing customer demand for multihoming, mobility management and traffic engineering while using stable PI addressing to avoid network renumbering [RFC4192][RFC5887].

The IRON uses the existing IPv4 and IPv6 global Internet routing systems as virtual links for tunneling inner network protocol packets within outer IPv4 or IPv6 headers (see: Section 3). The IRON requires deployment of a small number of new BGP core routers and supporting servers, as well as IRON-aware routers/servers in customer EUNs. No modifications to hosts, and no modifications to most routers are required.

Note: This document is offered in compliance with Internet Research Task Force (IRTF) document stream procedures [RFC5743]; it is not an IETF product and is not a standard.
The views in this document were considered controversial by the IRTF Routing Research Group (RRG) but the RG reached a consensus that the document should still be published. The document will undergo a period of review within the RRG and through selected expert reviewers prior to publication.

The following sections discuss details of the IRON architecture.

2.  Terminology

This document makes use of the following terms:

End User Network (EUN)
   an edge network that connects an organization's devices (e.g., computers, routers, printers, etc.) to the Internet and possibly also the IRON.

Internet Service Provider (ISP)
   a service provider which physically connects customer EUNs to the Internet. In other words, an ISP is responsible for providing IP connectivity to a customer owning an EUN.

Provider Aggregated (PA) address or prefix
   a network layer address or prefix delegated to an EUN by an ISP.

Provider Independent (PI) address or prefix
   a network layer address or prefix delegated to an EUN by a third party independently of the EUN's ISP arrangements.

Virtual Prefix (VP)
   a PI prefix block (e.g., an IPv4 /16, an IPv6 /20, an OSI NSAP prefix, etc.) that is owned and managed by a Virtual Prefix Company (VPC).

End User Network PI prefix (EP)
   a more-specific PI prefix derived from a VP (e.g., an IPv4 /28, an IPv6 /56, etc.) and delegated to an EUN by a VPC.

EP Address (EPA)
   a network layer address belonging to an EP and assigned to the interface of an end system in an EUN.

Locator
   an IP address assigned to the interface of a router or end system within a public or private network. Locators taken from public IP prefixes are routable on a global basis, while locators taken from private IP prefixes are made public via Network Address Translation (NAT).

Virtual Prefix Company (VPC)
   a company that owns and manages a set of VPs from which it delegates End User Network PI Prefixes (EPs) to EUNs.

Internet Routing Overlay Network (IRON)
   an overlay network configured over the global Internet. The IRON supports routing through encapsulation of inner packets with EPA addresses within outer headers that use locator addresses.

implicit anycast
   an anycast discovery procedure whereby a customer router discovers provider routers that are topologically nearby. Also a means by which a router on the path to a tunnel egress makes its presence known by sending a redirect informing the tunnel ingress of a better route.

3.  The Internet Routing Overlay Network

The Internet Routing Overlay Network (IRON) consists of IRON Routers (IRs) that automatically tunnel the packets of end-to-end communication sessions within encapsulating headers used for Internetwork routing. IRs use Virtual Enterprise Traversal (VET) [I-D.templin-intarea-vet] in conjunction with the Subnetwork Encapsulation and Adaptation Layer (SEAL) [I-D.templin-intarea-seal] to encapsulate inner network layer packets within outer headers as shown in Figure 1:

                                      +-------------------------+
                                      |    Outer headers with   |
                                      ~     locator addresses   ~
                                      |     (IPv4 or IPv6)      |
                                      +-------------------------+
                                      |       SEAL Header       |
   +-------------------------+        +-------------------------+
   |   Inner Packet Header   |  -->   |   Inner Packet Header   |
   ~    with EP addresses    ~  -->   ~    with EP addresses    ~
   | (IPv4, IPv6, OSI, etc.) |  -->   | (IPv4, IPv6, OSI, etc.) |
   +-------------------------+        +-------------------------+
   |                         |  -->   |                         |
   ~    Inner Packet Body    ~  -->   ~    Inner Packet Body    ~
   |                         |  -->   |                         |
   +-------------------------+        +-------------------------+

       Inner packet before                Outer packet after
          encapsulation                      encapsulation

    Figure 1: Encapsulation of Inner Packets Within Outer IP Headers

VET specifies the automatic tunneling mechanisms used for encapsulation, while SEAL specifies the format and usage of the SEAL header as well as a set of control messages. Most notably, IRs use the SEAL Control Message Protocol (SCMP) to deterministically exchange and authenticate control messages such as route redirections, indications of Path Maximum Transmission Unit (PMTU) limitations, destination unreachables, etc.

The IRON is manifested through a business model in which Virtual Prefix Companies (VPCs) own and manage virtual overlay networks comprising a set of IRs that are distributed throughout the Internet and serve highly-aggregated Virtual Prefixes (VPs). VPCs delegate sub-prefixes from their VPs which they lease to customers as End User Network PI prefixes (EPs). The customers in turn assign the EPs to their customer edge IRs which connect their End User Networks (EUNs) to the IRON.

VPCs may have no affiliation with the ISP networks from which customers obtain their basic Internet connectivity. Therefore, unless the ISP also acts as a VPC the customer must have two business relationships - one with the ISP and a second with the VPC. Because of this independence, a VPC can open for business and begin serving its customers immediately, without the need to coordinate its activities with ISPs or with other VPCs. Further details on business considerations are out of scope for this document.

The IRON requires no changes to end systems and no changes to most routers in the Internet.
Instead, the IRON comprises IRs that are deployed either as new platforms or as modifications to existing platforms. IRs may be deployed incrementally without disturbing the existing Internet routing system, and act as waypoints (or "cairns") for navigating the IRON. The functional roles for IRs are described in the following sections.

3.1.  IR[CE] - IRON Customer Edge Router

An IR[CE] is a Customer Edge router (or host with embedded gateway function) that logically connects the customer's EUNs and their associated EPs to the IRON via tunnels as shown in Figure 2. IR[CE]s obtain EPs from VPCs and use them to number subnets and interfaces within their EUNs. An IR[CE] can be deployed on the same physical platform that also connects the customer's EUNs to its ISPs, but it may also be a separate router or even a standalone server system located within the EUN. (This model applies even if the EUN connects to the ISP via a Network Address Translator (NAT) - see Section 6.7).

                          .-.
                       ,-(  _)-.
       +--------+   .-(_    (_  )-.
       | IR[CE] |--(_     ISP      )
       +---+----+   `-(______)-'
           |   <= T         \     .-.
          .-.       u        \ ,-(  _)-.
       ,-(  _)-.       n     .-(_    (-  )-.
    .-(_    (_  )-.      n  (_   Internet   )
   (_     EUN      )       e   `-(______)-
      `-(______)-'           l          ___
           |                   s =>    (:::)-.
      +----+---+                   .-(::::::::)
      |  Host  |                .-(::::::::::::)-.
      +--------+               (:::: The IRON ::::)
                                `-(::::::::::::)-'
                                   `-(::::::)-'

              Figure 2: IR[CE] Connecting EUN to the IRON

3.2.  IR[VE] - IRON Virtual Prefix Company Edge Router

An IR[VE] is a VPC's overlay network edge router that provides forwarding and mapping services for the EPs owned by customer IR[CE]s. In typical deployments, a VPC will deploy many IR[VE]s around the IRON in a globally-distributed fashion (e.g., as depicted in Figure 3) so that IR[CE] clients can discover those that are nearby.

                +--------+    +--------+
                | IR[VE] |    | IR[VE] |
                | Boston |    | Tokyo  |
                +--+-----+    ++-------+
      +--------+    \          /
      | IR[VE] |     \  ___   /
      | Seattle|      \(:::)-.        +--------+
      +------+-+   .-(::::::::)------+ IR[VE] |
              \.-(::::::::::::)-.    | Paris  |
              (:::: The IRON ::::)   +--------+
               `-(::::::::::::)-'
      +--------+ / `-(::::::)-'  \    +--------+
      | IR[VE] +       |          \---+ IR[VE] |
      | Moscow |  +----+---+          | Sydney |
      +--------+  | IR[VE] |          +--------+
                  | Cairo  |
                  +--------+

              Figure 3: IR[VE] Global Distribution Example

Each IR[VE] serves as a customer-facing tunnel endpoint router that IR[CE]s form bidirectional tunnels with over the IRON. Each IR[VE] also associates with an Internet-facing IR[VC] that can forward packets from the IRON out to the native public Internet and vice-versa as discussed in the next section.

3.3.  IR[VC] - IRON Virtual Prefix Company Core Router

An IR[VC] is a VPC's overlay network core router that acts as a gateway between the IRON and the native public Internet. It therefore also serves as an Autonomous System Border Router (ASBR) that is owned and managed by the VPC.

Each VPC configures one or more IR[VC]s which advertise the company's VPs into the IPv4 and IPv6 global Internet BGP routing systems. Each IR[VC] associates with all of the VPC's overlay network IR[VE] routers, e.g., via tunnels over the IRON, via a direct interconnect such as an Ethernet cable, etc. The IR[VC] role (as well as its relationship with overlay network IR[VE]s) is depicted in Figure 4:

                        ,-(  _)-.
                     .-(_    (_  )-.
                    (_   Internet   )
                       `-(______)-'
                            |        +--------+
                            |     |--| IR[VE] |
                       +----+---+ |  +--------+
                       | IR[VC] |----|
                       +--------+    +--------+
                                  |--| IR[VE] |
                           _||    |  +--------+
                          (:::)-.  (Ethernet)
                      .-(::::::::)
      +--------+  .-(::::::::::::)-.  +--------+
      | IR[VE] |=(:::: The IRON ::::)=| IR[VE] |
      +--------+  `-(::::::::::::)-'  +--------+
                     `-(::::::)-'
                         || (Tunnels)
                     +--------+
                     | IR[VE] |
                     +--------+

          Figure 4: IR[VC] Connecting IRON to Native Internet

3.4.  IR[VP] - IRON Virtual Prefix Company Combined Router

An IR[VP] is a VPC's overlay network router that combines the functions of both the IR[VE] and IR[VC].
While not in itself a fundamental building block of the architecture, it is mentioned here to clarify an implementation option available to VPCs.

In the IR[VP] model, the IR[VE] and IR[VC] functions can be thought of as "half-gateway" functions that together comprise a unified IR[VP]. The IR[VE] and IR[VC] functions can therefore be discussed separately even when both functions reside within the same physical IR[VP] platform as shown in Figure 5:

                        ,-(  _)-.
                     .-(_    (_  )-.
                    (_   Internet   )
                       `-(______)-'
                            |
                 +----------+----------+
                 | IR[VC] half-gateway |
                 +---------------------+
                 | IR[VE] half-gateway |
                 +----------+----------+
              <- IR[VP] Unified Gateway ->
                           _|_
                          (:::)-.
                      .-(::::::::)
                  .-(::::::::::::)-.
                 (:::: The IRON ::::)
                  `-(::::::::::::)-'
                     `-(::::::)-'

         Figure 5: IR[VP] Combining IR[VE] and IR[VC] Functions

4.  IRON Organizational Principles

The IRON consists of the union of all VPC overlay networks worldwide (where each VPC configures one or more overlay networks). Each such overlay network represents a distinct "patch" on the Internet "quilt", where the patches are stitched together by tunnels over the links, routers, bridges, etc. that connect the public Internet. When a new VPC overlay network is deployed, it becomes yet another patch on the quilt. The IRON is therefore a composite overlay network consisting of multiple individual patches, where each patch coordinates its activities independently of all others (with the exception that the IR[VE]s of each patch must be aware of all VPs in the IRON).

Each VPC overlay network in the IRON maintains a set of IR[VC]s that connect the overlay network directly to the public IPv4 and IPv6 Internets. Each IR[VC] advertises the VPC overlay network's IPv4 VPs into the IPv4 BGP routing system and advertises the overlay network's IPv6 VPs into the IPv6 BGP routing system.
IR[VC]s will therefore receive packets with EPA destination addresses sent by end systems in the Internet and direct them toward EPA-addressed end systems connected to the VPC overlay network.

Each VPC overlay network also manages a set of IR[VE]s that connect customer EUNs to the IRON and to the IPv6 and IPv4 Internets via their associations with IR[VC]s. IR[VE]s therefore need not be BGP routers themselves and can be simple commodity hardware platforms. Moreover, the IR[VE] and IR[VC] functions can be deployed together on the same physical platform as an IR[VP] or they may be deployed on separate platforms (e.g., for load balancing purposes).

Each IR[VE] maintains a working set of IR[CE]s for which it caches EP-to-IR[CE] mappings in its Forwarding Information Base (FIB). Each IR[VE] also in turn propagates the list of EPs in its working set to each of the IR[VC]s in the VPC overlay network via a dynamic routing protocol (e.g., an overlay network internal BGP instance that carries only the EP-to-IR[VE] mappings and does not interact with the external BGP routing system). Each IR[VE] therefore only needs to track the EPs for its current working set of IR[CE]s, while each IR[VC] will maintain a full EP-to-IR[VE] mapping table that represents reachability information for all EPs in the VPC overlay network.

Customers establish IR[CE]s to connect their EUNs to both the VPC overlay network and to the rest of the IRON. Each EUN can connect to the IRON via one or multiple IR[CE]s as long as the multiple IR[CE]s coordinate with one another, e.g., to mitigate EUN partitions. Unlike IR[VC]s and IR[VE]s, IR[CE]s may use private addresses behind one or several layers of NATs.

The IR[CE] initially discovers a list of nearby IR[VE]s through an "implicit anycast" discovery process (see Section 5.3).
It then selects one of these nearby IR[VE]s as its server and forms a bidirectional tunnel with the IR[VE] through an initial exchange followed by periodic keepalives.

After the IR[CE] selects a serving IR[VE], it forwards initial outbound packets from its EUNs by tunneling them to its own serving IR[VE] which in turn forwards them to the nearest IR[VC] within the IRON that serves the final destination. The IR[CE] will subsequently receive redirect messages informing it of a more direct route through the IR[VE] that serves the final destination.

The IRON can also be used to support VPs of network layer address families that cannot be routed natively in the underlying Internetwork (e.g., OSI/CLNP over the public Internet, IPv6 over IPv4-only Internetworks, IPv4 over IPv6-only Internetworks, etc.). Further details for support of IRON VPs over non-native Internetworks are discussed in Appendix A.

5.  IRON Initialization

IRON initialization entails the startup actions of IRs within the VPC overlay network and customer EUNs. The following sections discuss these startup procedures.

5.1.  IR[VC] Initialization

Before its first operational use, each IR[VC] in a VPC overlay network is provisioned with the list of VPs that it will serve as well as the locators for all IR[VE]s that belong to the same overlay network. The IR[VC] is also provisioned with external BGP interconnections the same as for any BGP router.

Upon startup, the IR[VC] engages in BGP routing exchanges with its peers in the IPv4 and IPv6 Internets the same as for any BGP router. It then connects to all of the IR[VE]s in the overlay network (e.g., via a TCP connection over a bidirectional tunnel, via an iBGP route reflector, etc.) for the purpose of discovering EP->IR[VE] mappings. After the IR[VC] has fully populated its EP->IR[VE] mapping information database, it is said to be "synchronized" with respect to its VPs.
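
As a rough illustration of the EP->IR[VE] mapping information database described above, the following Python sketch keeps one IR[VE] locator per EP and resolves an EPA by longest-prefix match. This document does not specify a data structure; the class name, method names, and the example prefixes and locators (taken from documentation address ranges) are all assumptions made for illustration only.

```python
import ipaddress


class EpMappingTable:
    """Hypothetical EP->IR[VE] mapping table held by an IR[VC]."""

    def __init__(self):
        # Key: EP as an ip_network object; value: locator of the serving IR[VE].
        self._mappings = {}

    def add_mapping(self, ep, ve_locator):
        """Record (or replace) the IR[VE] locator that currently serves an EP."""
        self._mappings[ipaddress.ip_network(ep)] = ve_locator

    def remove_mapping(self, ep):
        """Forget an EP, e.g., after the IR[VE] stops reporting it."""
        self._mappings.pop(ipaddress.ip_network(ep), None)

    def lookup(self, epa):
        """Return the IR[VE] locator for the longest matching EP, or None."""
        addr = ipaddress.ip_address(epa)
        matches = [
            ep for ep in self._mappings
            if ep.version == addr.version and addr in ep
        ]
        if not matches:
            return None
        best = max(matches, key=lambda ep: ep.prefixlen)
        return self._mappings[best]


table = EpMappingTable()
table.add_mapping("2001:db8:1::/56", "192.0.2.10")
table.add_mapping("198.51.100.16/28", "192.0.2.20")
print(table.lookup("2001:db8:1::5"))
print(table.lookup("203.0.113.1"))
```

A linear scan is used only to keep the sketch short; a deployed router would more plausibly use a radix tree or the platform's own FIB.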

After this initial synchronization procedure, the IR[VC] advertises the overlay network's VPs externally. In particular, the IR[VC] advertises the IPv6 VPs into the IPv6 BGP routing system and advertises the IPv4 VPs into the IPv4 BGP routing system. If the IR[VC] only services IPv6 VPs (e.g., 2001:DB8::/32), it advertises the IPv6 VPs into the IPv6 routing system and also advertises a companion IPv4 prefix (e.g., 192.0.2.0/24) into the IPv4 routing system that can be used by IR[CE]s/IR[VE]s from other VPC overlay networks for implicit anycast discovery purposes. Similarly, if the IR[VC] only services IPv4 VPs, it also advertises a companion IPv6 prefix (e.g., 2001:DB8::/56) into the IPv6 routing system. (See Appendix A for more information on the discovery and use of companion prefixes.) The IR[VC] then engages in ordinary packet forwarding operations.

5.2.  IR[VE] Initialization

Before its first operational use, each IR[VE] in a VPC overlay network is provisioned with the locators for all IR[VC]s that serve the overlay network's VPs. In order to support route optimization, the IR[VE] must also be provisioned with the list of all VPs in the IRON (i.e., not just the VPs of its own overlay network) so that it can discern EPA and non-EPA addresses. (The IR[VE] could therefore be greatly simplified if the list of VPs could be covered within a small number of very short prefixes, e.g., one or a few IPv6 /20s.) The IR[VE] should also discover the VP companion prefix relationships discussed in Section 5.1, e.g., via a global database such as discussed in Appendix A.

Upon startup, each IR[VE] must connect to all of the IR[VC]s within its overlay network (e.g., via a TCP connection over a bidirectional tunnel, via an iBGP route reflector, etc.) for the purpose of reporting its EP->IR[VE] mappings.
The IR[VE] then actively listens for IR[CE] customers that register their EPs as part of establishing a bidirectional tunnel. When a new IR[CE] registers its EPs, the IR[VE] announces the new EP additions to all IR[VC]s; when an existing IR[CE] unregisters its EPs, the IR[VE] withdraws its announcements.

5.3.  IR[CE] Initialization

Before its first operational use, each IR[CE] must obtain one or more EPs from its VPC as well as any companion prefixes of other address families (see Section 5.1) associated with the EPs. The IR[CE] must also obtain a certificate and a public/private key pair from the VPC that it can later use to prove ownership of its EPs. This implies that each VPC must run its own key infrastructure to be used only for the purpose of verifying a customer's claimed right to use an EP. Hence, the VPC need not coordinate its key infrastructure with any other organization.

Upon startup, the IR[CE] sends a SEAL Control Message Protocol (SCMP) Router Solicitation (SRS) message using an implicit anycast procedure to discover the nearest IR[VC] in its VPC overlay network. The IR[VC] will in turn return a list of locators of the company's nearby IR[VE]s. (This list is analogous to the ISATAP Potential Router List (PRL) [RFC5214].)

To perform the implicit anycast procedure, the IR[CE] sets the source address of the SRS message to one of its locator addresses and sets the destination address of the message to any EPA taken from one of its own EPs. (If the EP is of a different address family than the IR[CE]'s locators, however, the IR[CE] instead sets the destination address to any address taken from the companion prefix associated with the EP.) This SRS message will be delivered to the nearest IR[VC] that attaches the VPC overlay network to the Internet. When the IR[VC] receives the SRS message, it sends back an SCMP Router Advertisement (SRA) message that lists the locator addresses of one or more nearby IR[VE] routers.
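
The address selection rule for the implicit anycast SRS message can be sketched as follows. This is a minimal illustration and not part of the SCMP specification: the function name srs_addresses is hypothetical, and because the text allows "any" address from the chosen prefix, picking the first address after the network address is simply an arbitrary, deterministic choice made for this sketch.

```python
import ipaddress


def srs_addresses(locator, ep, companion_prefix=None):
    """Return (source, destination) for an implicit anycast SRS message.

    locator          -- one of the IR[CE]'s locator addresses
    ep               -- one of the IR[CE]'s own EPs
    companion_prefix -- companion prefix of the EP, needed only when the
                        EP and the locator are of different address families
    """
    src = ipaddress.ip_address(locator)
    target = ipaddress.ip_network(ep)

    if target.version != src.version:
        # The EP is of a different address family than the locator, so the
        # destination is taken from the companion prefix instead.
        if companion_prefix is None:
            raise ValueError("companion prefix required for this EP")
        target = ipaddress.ip_network(companion_prefix)
        if target.version != src.version:
            raise ValueError("companion prefix does not match locator family")

    # Any address within the prefix will do (illustrative choice only).
    dst = target[1] if target.num_addresses > 1 else target[0]
    return str(src), str(dst)


# Same address family: the destination is an EPA from the IR[CE]'s own EP.
print(srs_addresses("192.0.2.1", "198.51.100.16/28"))
# IPv6 EP behind an IPv4 locator: the destination comes from the companion prefix.
print(srs_addresses("203.0.113.5", "2001:db8:1::/56", "192.0.2.0/24"))
```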

After the IR[CE] receives an SRA message from the nearby IR[VC] listing the locator addresses of nearby IR[VE]s, it sends SRS test messages to one or more of the locator addresses to elicit SRA messages. The IR[VE] that configures the locator will include the header of the soliciting SRS message in its SRA message so that the IR[CE] can determine the number of hops along the forward path. The IR[VE] also includes a metric in its SRA messages indicating its service availability so that the IR[CE] can avoid selecting IR[VE]s that are overloaded. The IR[VE] also includes a challenge/response puzzle that the IR[CE] must answer if it wishes to enlist this IR[VE]'s services.

When the IR[CE] receives these SRA messages, it can measure the time between sending the SRS and receiving the SRA as an indication of round-trip delay. If the IR[CE] wishes to enlist the services of a specific IR[VE] (e.g., based on the measured performance), it then calculates the answer to the puzzle using its keying information and sends the answer back to the IR[VE] in a new SRS message that also contains all of the IR[CE]'s EPs for which it claims ownership. If the IR[CE] answered the puzzle correctly, the IR[VE] will send back a new SRA message that includes a non-zero default router lifetime and that signifies the establishment of a bidirectional tunnel. (A zero default router lifetime on the other hand signifies that the IR[VE] is currently unable to establish a bidirectional tunnel, e.g., due to heavy load, due to challenge/response failure, etc.)

Note that in the above procedure it is essential that the IR[CE] select one and only one IR[VE]. This is to allow the VPC overlay network mapping system to have one and only one active EP-to-IR[VE] mapping at any point in time which shares fate with the IR[VE] itself.
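
One way an IR[CE] might reduce the SRA responses to a single serving IR[VE] is sketched below. This document does not define how round-trip delay, hop count and the availability metric are combined, so the policy shown here (discard heavily loaded candidates, then prefer the lowest delay, then the fewest hops) is purely an assumption, as are the VeCandidate type, the 0.0 to 1.0 load scale, and the max_load threshold.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VeCandidate:
    """Measurements gathered for one IR[VE] from an SRS/SRA test exchange."""
    locator: str
    rtt_ms: float   # time between sending the SRS and receiving the SRA
    hops: int       # forward-path hop count derived from the echoed SRS header
    load: float     # assumed scale: 0.0 (idle) to 1.0 (saturated)


def select_serving_ve(candidates: Iterable[VeCandidate],
                      max_load: float = 0.8) -> Optional[VeCandidate]:
    """Pick exactly one IR[VE], or None if every candidate is overloaded."""
    usable = [c for c in candidates if c.load <= max_load]
    if not usable:
        return None
    # Lowest round-trip delay wins; the hop count breaks ties.
    return min(usable, key=lambda c: (c.rtt_ms, c.hops))


candidates = [
    VeCandidate("192.0.2.1", rtt_ms=20.0, hops=8, load=0.95),
    VeCandidate("192.0.2.2", rtt_ms=35.0, hops=10, load=0.30),
    VeCandidate("192.0.2.3", rtt_ms=35.0, hops=9, load=0.10),
]
print(select_serving_ve(candidates).locator)
```

Returning a single candidate rather than a ranked list mirrors the requirement that only one EP-to-IR[VE] mapping be active at a time.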
If this IR[VE] fails, the IR[CE] will quickly select a new one which will automatically update the VPC overlay network mapping system with a new EP-to-IR[VE] mapping.

6.  IRON Operation

Following the IRON initialization detailed in Section 5, IRs engage in the steady-state process of receiving and forwarding packets. All IRs forward encapsulated packets over the IRON using the mechanisms of VET [I-D.templin-intarea-vet] and SEAL [I-D.templin-intarea-seal], while IR[VC]s (and in some cases IR[VE]s) additionally forward packets to and from the native IPv6 and IPv4 Internets. IRs also use the SEAL Control Message Protocol (SCMP) to coordinate with other IRs, including the process of sending and receiving redirect messages, error messages, etc. (Note however that an IR must not send an SCMP message in response to an SCMP error message.) Each IR operates as specified in the following sub-sections.

6.1.  IR[CE] Operation

After selecting its serving IR[VE] as specified in Section 5.3, the IR[CE] should register each of its ISP connections with the IR[VE] in order to establish multiple bidirectional tunnels for multihoming purposes. To do so, it sends periodic SRS messages to its serving IR[VE] via each of its ISPs to establish additional bidirectional tunnels and to keep each tunnel alive. These messages need not include challenge/response mechanisms since prefix proof of ownership was already established in the initial exchange and a nonce in the SEAL header can be used to confirm that the SRS message was sent by the correct IR[CE]. This implies that a single nonce is used to represent the set of all bidirectional tunnels between the IR[CE] and the IR[VE]. Therefore, there are multiple bidirectional tunnels, and the nonce names this "bundle" of tunnels.
(The IR[CE] and IR[VE] may conceptually represent this "bundle" as a single tunnel with multiple locator addresses; however, each such locator address must be tested independently in case there are NATs on the path.)

If the IR[CE] ceases to receive SRA messages from its serving IR[VE] via a specific ISP connection, it marks the IR[VE] as unreachable from that address and therefore over that ISP connection. (The IR[CE] should also inform its serving IR[VE] of this outage via one of its working ISP connections.) If the IR[CE] ceases to receive SRA messages from its serving IR[VE] via multiple ISP connections, it marks the IR[VE] as unusable and quickly attempts to establish a bidirectional tunnel with a new IR[VE]. The act of establishing the tunnel with a new serving IR[VE] will automatically purge the stale mapping state associated with the old serving IR[VE].

When an end system in an EUN sends a flow of packets to a correspondent, the packets are forwarded through the EUN via normal routing until they reach the IR[CE], which then tunnels the initial packets to its serving IR[VE] as the next hop. In particular, the IR[CE] encapsulates each packet in an outer header with its locator as the source address and the locator of its serving IR[VE] as the destination address.

Note that after sending the initial packets of a flow, the IR[CE] may receive critical SCMP messages such as indications of PMTU limitations, redirects that point to a better next hop, etc. It is therefore essential that the IR[CE] send the initial packets through its serving IR[VE] to avoid loss of SCMP messages that cannot traverse a NAT in the reverse direction.

The IR[CE] uses the mechanisms specified in VET and SEAL to encapsulate each forwarded packet. The IR[CE] further uses SCMP to coordinate with other IRs, including accepting redirects and other SCMP messages.
When the IR[CE] receives an SCMP message, it checks the nonce field of the encapsulated packet-in-error to verify that the message corresponds to a packet that it had previously sent and accepts the message if the nonce matches. (Note however that the outer source and destination addresses of the packet-in-error may be different from those in the original packet due to possible IR[VE] and/or IR[VC] address rewritings.)

6.2. IR[VE] Operation

After an IR[VE] is initialized, it responds to SRSs from IR[CE]s by sending SRAs as described in Section 6.1. When the IR[VE] receives an SRS message from a new IR[CE], it sends back an SRA message with a challenge/response puzzle. The IR[CE] in turn sends an SRS message with an answer to the puzzle. If this authentication fails, the IR[VE] discards the message. Otherwise, it creates tunnel state for this new IR[CE], records the EPs in its FIB, and records the locator address from the SCMP message as the link-layer address of the next hop. The IR[VE] next sends an SRA message back to the IR[CE] to complete the tunnel establishment.

When the IR[VE] receives a SEAL-encapsulated packet from one of its IR[CE] tunnel endpoints, it examines the inner destination address. If the inner destination address is not an EPA, the IR[VE] decapsulates the packet and forwards it unencapsulated into the Internet if it is able to do so without loss due to ingress filtering. Otherwise, the IR[VE] re-encapsulates the packet (i.e., it removes the outer header and replaces it with a new outer header of the same address family) and sets the outer destination address to the locator address of an IR[VC] within its VPC overlay network. It then forwards the re-encapsulated packet to the IR[VC], which will in turn decapsulate it and forward it into the Internet.
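The forwarding decision for packets arriving from an IR[CE] tunnel endpoint can be sketched in Python as follows. This is only an illustration under stated assumptions: the Action names and the classify() helper are hypothetical, the EPA test is assumed to be a lookup against a locally known set of prefixes, and the question of whether native forwarding would survive ingress filtering is reduced to a boolean supplied by the caller.

```python
import ipaddress
from enum import Enum
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class Action(Enum):
    DECAPSULATE_TO_INTERNET = "decapsulate and forward natively"
    REENCAPSULATE_TO_IRVC = "re-encapsulate toward an IR[VC]"
    EPA_DESTINATION = "inner destination is an EPA (handled separately)"


def is_epa(inner_dst: str, epa_prefixes: Iterable[Network]) -> bool:
    """Assumption: an address counts as an EPA if a known prefix covers it."""
    addr = ipaddress.ip_address(inner_dst)
    return any(addr.version == p.version and addr in p for p in epa_prefixes)


def classify(inner_dst: str,
             epa_prefixes: Iterable[Network],
             can_forward_natively: bool) -> Action:
    """Choose how an IR[VE] treats a packet received from one of its IR[CE]s."""
    if is_epa(inner_dst, epa_prefixes):
        return Action.EPA_DESTINATION
    if can_forward_natively:
        # No loss expected from ingress filtering: strip the outer header.
        return Action.DECAPSULATE_TO_INTERNET
    # Replace the outer header and hand the packet to an IR[VC].
    return Action.REENCAPSULATE_TO_IRVC


if __name__ == "__main__":
    prefixes = [ipaddress.ip_network("2001:db8::/32")]  # documentation prefix
    print(classify("198.51.100.7", prefixes, can_forward_natively=False))
```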
If the inner destination address is an EPA, however, the IR[VE] rewrites the outer source address to one of its own locator addresses and rewrites the outer destination address to the inner destination address. (If the outer header is of a different address family than the inner header, the IR[VE] instead rewrites the destination address to any address taken from the companion prefix associated with the inner destination address.) The IR[VE] then forwards the revised packet into the Internet via a default or more-specific route, where it may be interpreted as an implicit anycast by a router within the destination VPC overlay network.

After sending the packet, the IR[VE] may then receive an SCMP error or redirect message from an IR[VC]/IR[VE] within the destination VPC overlay network. In that case, the IR[VE] verifies that the nonce in the message matches the tunnel corresponding to the IR[CE] that sent the original inner packet and discards the message if the nonce does not match. Otherwise, the IR[VE] re-encapsulates the SCMP message in a new outer header that uses the source address, destination address and nonce parameters associated with the tunnel to the IR[CE]; it then forwards the message to the IR[CE]. This arrangement is necessary to allow SCMP messages to flow through any NATs on the path.

When an IR[VE](A) receives a SEAL-encapsulated packet from an IR[VC] or from the Internet, if the inner destination address matches an EP in its FIB, IR[VE](A) re-encapsulates the packet in a new outer header that uses the source address, destination address and nonce parameters associated with the tunnel and forwards it to its client IR[CE](B), which in turn decapsulates the packet and forwards it to the correct end system in the EUN.
If IR[CE](B) has left notice with IR[VE](A) that it has moved to a new IR[VE](C), however, IR[VE](A) will instead forward the packet to IR[VE](C) and also send an SCMP redirect message back to the source of the packet. In this way, IR[CE](B) can leave behind forwarding information when changing between IR[VE]s (e.g., due to mobility events) without exposing packets to loss.

6.3. IR[VC] Operation

After an IR[VC] has synchronized its VPs (see Section 5.1), it advertises the full set of the company's VPs into the IPv4 and IPv6 Internet BGP routing systems. The VPs will be represented as ordinary routing information in the BGP, and any packets originating from the IPv4 or IPv6 Internet destined to an EPA covered by one of the VPs will be forwarded into the VPC's overlay network by an IR[VC].

When an IR[VC] receives a packet from the Internet destined either to an EPA covered by one of its VPs or to an address within one of its companion prefixes, it intercepts the packet as though it were addressed to itself, i.e., to support the implicit anycast service model. It then examines the packet format to determine the proper handling procedures as follows:

o If the packet is an SCMP SRS message, the IR[VC] sends an SRA message back to the source listing the locator addresses of nearby IR[VE] routers, then discards the message.

o If the packet is not SEAL-encapsulated, the IR[VC] looks in its FIB to discover a locator of the IR[VE] that serves the destination address. The IR[VC] then simply encapsulates the packet with its own locator as the outer source address and the locator of the IR[VE] as the outer destination address and forwards the packet to the IR[VE].

o If the packet is SEAL-encapsulated, the IR[VC] sends an SCMP redirect message of the same address family back to the source with the locator of the serving IR[VE] as the redirected target. The source and destination addresses of the SCMP redirect message use the outer destination and source addresses of the original packet, respectively. After sending the redirect message, the IR[VC] then rewrites the outer destination address of the SEAL-encapsulated packet to the locator of the IR[VE] and forwards the revised packet to the IR[VE]. Note that in this arrangement any errors that occur on the path between the IR[VC] and the IR[VE] will be delivered to the original source but with a different destination address due to this IR[VC] address rewriting.

6.4. IRON Reference Operating Scenarios

The IRON is used to support communications when one or both hosts are located within EP-addressed EUNs regardless of whether the EPs are provisioned by the same VPC or by different VPCs. When both hosts are within IRON EUNs, route redirections that eliminate unnecessary IR[VE]s and IR[VC]s from the path are possible. When only one host is within an IRON EUN, however, route optimization cannot be used. The following sections discuss the two scenarios.

6.4.1. Both Hosts Within IRON EUNs

When both hosts are within IRON EUNs, it is sufficient to consider the scenario in a unidirectional fashion, i.e., by tracing packet flows only in the forward direction from the source host to destination host. The reverse direction can be considered separately, and incurs the same considerations as for the forward direction.

In this scenario, the initial packets of a flow produced by a source host must flow through both the source's serving IR[VE] and an IR[VC] of the destination host, but route optimization can eliminate these elements from the path for subsequent packets in the flow. Figure 6 shows the flow of initial packets from host A to host B within two IRON EUNs.

[Diagram not reproduced. Elements shown: Host A in IRON EUN A, IR[CE](A), ISP A, IR[VE](A), the Internet, IR[VC](B), IR[VE](B), ISP B, IR[CE](B) and Host B in IRON EUN B, with a redirect returned from IR[VC](B) toward IR[CE](A); the IRON is overlaid on the native Internet.]

Figure 6: Initial Packet Flow Before Redirects

With reference to Figure 6, host A sends packets destined to host B via its network interface connected to EUN A. Routing within EUN A will direct the packets to IR[CE](A) as a default router for the EUN, which then uses VET and SEAL to encapsulate them in outer headers with its locator address as the outer source address and the locator address of its serving IR[VE](A) as the outer destination address. IR[CE](A) then simply releases the encapsulated packets into its ISP network connection that provided its locator. The ISP will release the packets into the Internet without filtering since the (outer) source address is topologically correct. Once the packets have been released into the Internet, routing will direct them to IR[VE](A).

IR[VE](A) receives the encapsulated packets from IR[CE](A), then rewrites the outer source address to one of its own locator addresses and rewrites the outer destination address to the inner destination address.
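The outer-header rewriting that an IR[VE] performs for an EPA destination (Section 6.2) can be sketched in Python as follows. The Encapsulated record, the rewrite_for_epa() helper and the companion_prefixes table keyed by covering prefix are hypothetical constructs for this illustration; in particular, which address is drawn from the companion prefix is left open by the text ("any address"), and the sketch simply takes the first one after the network address.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Encapsulated:
    """Addresses of a SEAL-encapsulated packet (illustrative subset only)."""
    outer_src: str
    outer_dst: str
    inner_src: str
    inner_dst: str


def rewrite_for_epa(pkt: Encapsulated, own_locator: str,
                    companion_prefixes: Dict[str, str]) -> Encapsulated:
    """Rewrite outer addresses for an inner destination that is an EPA.

    companion_prefixes maps a prefix covering the EPA to its companion
    prefix in the other address family (an assumed table layout).
    """
    outer_version = ipaddress.ip_address(own_locator).version
    inner_dst = ipaddress.ip_address(pkt.inner_dst)
    if inner_dst.version == outer_version:
        # Same address family: the inner destination becomes the outer one.
        new_dst = pkt.inner_dst
    else:
        for covering, companion in companion_prefixes.items():
            if inner_dst in ipaddress.ip_network(covering):
                net = ipaddress.ip_network(companion)
                new_dst = str(net[min(1, net.num_addresses - 1)])
                break
        else:
            raise LookupError("no companion prefix for " + pkt.inner_dst)
    return replace(pkt, outer_src=own_locator, outer_dst=new_dst)


if __name__ == "__main__":
    pkt = Encapsulated("192.0.2.10", "192.0.2.1", "2001:db8:1::5", "2001:db8:2::9")
    table = {"2001:db8::/32": "198.51.100.0/24"}  # documentation prefixes
    print(rewrite_for_epa(pkt, "192.0.2.1", table))
```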
(If the outer header is of a different address family than the inner header, however, the IR[VE] instead rewrites the destination address to any address taken from the companion prefix associated with the inner destination address.) IR[VE](A) then releases the revised packets into the Internet, where routing will direct them to IR[VC](B), which advertises a prefix that covers the outer destination address.

IR[VC](B) will intercept the encapsulated packets from IR[VE](A), then check its FIB to discover an entry that covers inner destination address B with IR[VE](B) as the next hop. IR[VC](B) then returns SCMP redirect messages to IR[VE](A) (*), rewrites the outer destination address of the encapsulated packets to the locator address of IR[VE](B), and forwards these revised packets to IR[VE](B).

IR[VE](B) will receive the encapsulated packets from IR[VC](B), then check its FIB to discover an entry that covers destination address B with IR[CE](B) as the next hop. IR[VE](B) then re-encapsulates the packets in a new outer header that uses the source address, destination address and nonce parameters associated with the tunnel to IR[CE](B). IR[VE](B) then releases these re-encapsulated packets into the Internet, where routing will direct them to IR[CE](B). IR[CE](B) will in turn decapsulate the packets and forward the inner packets to host B via EUN B.

(*) Note that after the initial flow of packets, IR[VE](A) will have received one or more SCMP redirect messages from IR[VC](B) informing it of IR[VE](B) as a better next hop. IR[VE](A) will in turn forward the redirects to IR[CE](A), which will thereafter forward its encapsulated packets directly to the locator address of IR[VE](B) without involving either IR[VE](A) or IR[VC](B), as shown in Figure 7:

[Diagram not reproduced. Elements shown: Host A in IRON EUN A, IR[CE](A), ISP A, the Internet, IR[VE](B), ISP B, IR[CE](B) and Host B in IRON EUN B; packets flow from IR[CE](A) through the Internet directly to IR[VE](B) and on to IR[CE](B); the IRON is overlaid on the native Internet.]

Figure 7: Sustained Packet Flow After Redirects

6.4.2. Mixed IRON and Non-IRON Hosts

When one host is within an IRON EUN and the other is in a non-IRON EUN (i.e., one that connects to the native Internet instead of the IRON), the IR elements involved depend on the packet flow directions. The cases are described in the following sections.

6.4.2.1. From IRON Host A to Non-IRON Host B

Figure 8 depicts the IRON reference operating scenario for packets flowing from Host A in an IRON EUN to Host B in a non-IRON EUN:

[Diagram not reproduced. Elements shown: Host A in IRON EUN A, IR[CE](A), ISP A, a combined IR[VE](A)/IR[VC](A) at the boundary between the IRON and the native Internet, ISP B, Router B and Host B in non-IRON EUN B; packets flow from IR[CE](A) to IR[VE](A), then from IR[VC](A) across the native Internet to Router B.]

Figure 8: From IRON Host A to Non-IRON Host B

In this scenario, host A sends packets destined to host B via its network interface connected to IRON EUN A. Routing within EUN A will direct the packets to IR[CE](A) as a default router for the EUN, which then uses VET and SEAL to encapsulate them in outer headers with its locator address as the outer source address and the locator address of IR[VE](A) as the outer destination address. The ISP will pass the packets without filtering since the (outer) source address is topologically correct. Once the packets have been released into the native Internet, routing will direct them to IR[VE](A).

IR[VE](A) receives the encapsulated packets from IR[CE](A), then re-encapsulates and forwards them to IR[VC](A), which simply decapsulates them and releases the unencapsulated packets into the Internet. Once the packets are released into the Internet, routing will direct them to the final destination B. (Note that IR[VE](A) and IR[VC](A) are depicted in Figure 8 as two halves of a unified IR[VP](A). In that case, the "forwarding" between IR[VE](A) and IR[VC](A) is a zero-instruction imaginary operation.)

This scenario always involves an IR[VE](A) and IR[VC](A) owned by the VPC that provides service to IRON EUN A. It therefore imparts a cost that would need to be borne by either the VPC or its customers.

6.4.2.2. From Non-IRON Host B to IRON Host A

Figure 9 depicts the IRON reference operating scenario for packets flowing from Host B in a non-IRON EUN to Host A in an IRON EUN:

[Diagram not reproduced. Elements shown: the same topology as Figure 8 with the flow reversed; packets flow from Router B across the native Internet to IR[VC](A), then from IR[VE](A) through ISP A to IR[CE](A) and Host A in IRON EUN A.]

Figure 9: From Non-IRON Host B to IRON Host A

In this scenario, host B sends packets destined to host A via its network interface connected to non-IRON EUN B. Routing will direct the packets to IR[VC](A), which then forwards them to IR[VE](A) using encapsulation if necessary. (Note that in this diagram IR[VE](A) and IR[VC](A) are depicted as two halves of a unified IR[VP](A). In that case, the "forwarding" between IR[VE](A) and IR[VC](A) is a zero-instruction imaginary operation.)

IR[VE](A) will then check its FIB to discover an entry that covers destination address A with IR[CE](A) as the next hop. IR[VE](A) then (re-)encapsulates the packets in an outer header that uses the source address, destination address and nonce parameters associated with the tunnel to IR[CE](A). IR[VE](A) next releases these (re-)encapsulated packets into the Internet, where routing will direct them to IR[CE](A). IR[CE](A) will in turn decapsulate the packets and forward the inner packets to host A via its network interface connected to IRON EUN A.

This scenario always involves an IR[VE](A) and IR[VC](A) owned by the VPC that provides service to IRON EUN A. It therefore imparts a cost that would need to be borne by either the VPC or its customers.

6.5. Mobility, Multihoming and Traffic Engineering Considerations

While IR[VE]s and IR[VC]s can be considered as fixed infrastructure, IR[CE]s may need to move between different network points of attachment, connect to multiple ISPs, or explicitly manage their traffic flows. The following sections discuss mobility, multihoming and traffic engineering considerations for IR[CE]s.

6.5.1. Mobility Management

When an IR[CE] changes its network point of attachment (e.g., due to a mobility event), it configures one or more new locators. If the IR[CE] has not moved far away from its previous network point of attachment, it simply informs its serving IR[VE] of any locator additions or deletions. This operation is performance-sensitive, and should be conducted immediately to avoid packet loss.

If the IR[CE] has moved far away from its previous network point of attachment, however, it re-issues the implicit anycast discovery procedure described in Section 6.1 to discover whether its candidate set of serving IR[VE]s has changed. If the IR[CE]'s current serving IR[VE] is also included in the new list received from the VPC, this serves as indication that the IR[CE] has not moved far enough to warrant changing to a new serving IR[VE]. Otherwise, the IR[CE] may wish to move to a new serving IR[VE] in order to maintain optimal routing. This operation is not performance-critical, and therefore can be conducted over a matter of seconds/minutes instead of milliseconds/microseconds.

To move to a new IR[VE], the IR[CE] first engages in the EP registration process with the new IR[VE] and maintains the registrations through periodic SRS/SRA exchanges the same as described in Section 6.1. The IR[CE] then informs its former IR[VE] that it has moved by providing it with the locator address of the new IR[VE].
The IR[CE] then discontinues the SRS/SRA keepalive process with the former IR[VE], which will garbage-collect the stale FIB entries when their lifetime expires. This will allow the former IR[VE] to redirect existing correspondents to the new IR[VE] so that no packets are lost.

6.5.2. Multihoming

An IR[CE] may register multiple locators with its serving IR[VE]. It can assign metrics with its registrations to inform its IR[VE] of preferred locators, and can select outgoing locators according to its local preferences. Multihoming is therefore naturally supported.

6.5.3. Inbound Traffic Engineering

An IR[CE] can dynamically adjust the priorities of its prefix registrations with its serving IR[VE] in order to influence inbound traffic flows. It can also change between serving IR[VE]s when multiple IR[VE]s are available, but should strive for stability in its IR[VE] selection in order to limit VPC network routing churn.

6.5.4. Outbound Traffic Engineering

An IR[CE] can select outgoing locators, e.g., based on current QoS considerations such as minimizing one-way delay or one-way delay variance.

6.6. Renumbering Considerations

As better link layer technologies and service plans emerge, customers will be motivated to select their service providers through healthy competition between ISPs. If a customer's EUN addresses are tied to a specific ISP, however, the customer may be forced to undergo a painstaking EUN renumbering process if it wishes to change to a different ISP [RFC4192][RFC5887].

When a customer obtains EP prefixes from a VPC, it can change between ISPs seamlessly and without need to renumber. If the VPC itself applies unreasonable costing structures for use of the EPs, however, the customer may be compelled to seek a different VPC and would again be required to confront a renumbering scenario. The IRON approach to renumbering avoidance therefore depends on VPCs conducting ethical business practices and offering reasonable rates.
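The locator registration and selection behavior described in Sections 6.5.2 through 6.5.4 can be sketched in Python as follows. This is an illustration only: the LocatorTable class, its method names and the convention that a lower metric means a more preferred locator are assumptions of the sketch, since the text does not define a metric encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LocatorEntry:
    metric: int             # assumption: a lower metric is more preferred
    reachable: bool = True  # cleared when keepalives via this locator go unanswered


class LocatorTable:
    """Locators one IR[CE] has registered with its serving IR[VE] (illustrative)."""

    def __init__(self) -> None:
        self._entries: Dict[str, LocatorEntry] = {}

    def register(self, locator: str, metric: int) -> None:
        """Add a locator, or re-register it with a new metric (marks it reachable)."""
        self._entries[locator] = LocatorEntry(metric)

    def withdraw(self, locator: str) -> None:
        """Remove a locator, e.g. after leaving a point of attachment."""
        self._entries.pop(locator, None)

    def mark_unreachable(self, locator: str) -> None:
        if locator in self._entries:
            self._entries[locator].reachable = False

    def preferred(self) -> Optional[str]:
        """Return the reachable locator with the lowest metric, if any."""
        usable = [(e.metric, loc) for loc, e in self._entries.items() if e.reachable]
        return min(usable)[1] if usable else None


if __name__ == "__main__":
    table = LocatorTable()
    table.register("192.0.2.10", metric=10)     # via a first ISP
    table.register("198.51.100.10", metric=20)  # via a second ISP
    table.mark_unreachable("192.0.2.10")
    print(table.preferred())
```

Re-registering a locator with a different metric is how this sketch models the dynamic priority adjustment used for inbound traffic engineering.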
6.7. NAT Traversal Considerations

The Internet today consists of a global public IPv4 routing and addressing system with non-IRON EUNs that use either public or private IPv4 addressing. The latter class of EUNs connect to the public Internet via Network Address Translators (NATs). When an IR[CE] is located behind a NAT, it selects IR[VE]s using the same procedures as for IR[CE]s with public addresses, i.e., it will send SRS messages to IR[VE]s in order to get SRA messages in return. The only requirement is that the IR[CE] must configure its SEAL encapsulation to use a transport protocol that supports NAT traversal, namely UDP.

Since the IR[VE] maintains state about its IR[CE] customers, it can discover locator information for each IR[CE] by examining the UDP port number and IP address in the outer headers of SRS messages. When there is a NAT in the path, the UDP port number and IP address in the SRS message will correspond to state in the NAT box and might not correspond to the actual values assigned to the IR[CE]. The IR[VE] can then encapsulate packets destined to hosts serviced by the IR[CE] within outer headers that use this IP address and UDP port number. The NAT box will receive the packets, translate the values in the outer headers to match those assigned to the IR[CE], then forward the packets to the IR[CE]. In this sense, the IR[VE]'s "locator" for the IR[CE] consists of the concatenation of the IP address and UDP port number.

IRON does not add any new issues to the complications already raised by NAT traversal or by applications that embed address referrals in their payload.

6.8. Nested EUN Considerations

Each IR[CE] configures a locator that may be taken from an ordinary non-EPA address assigned by an ISP or from an EPA address taken from an EP assigned to another IR[CE].
In that case, the IR[CE] is said to be "nested" within the EUN of another IR[CE], and recursive nestings of multiple layers of encapsulations may be necessary.

For example, in the network scenario depicted in Figure 10, IR[CE](A) configures a locator EPA(B) taken from the EP assigned to EUN(B). IR[CE](B) in turn configures a locator EPA(C) taken from the EP assigned to EUN(C). Finally, IR[CE](C) configures a locator ISP(D) taken from a non-EPA address delegated by an ordinary ISP(D). Using this example, the "nested-IRON" case must be examined in which a host A, which configures the address EPA(A) within EUN(A), exchanges packets with host Z located elsewhere in the Internet.

[Diagram not reproduced. Elements shown: Host A with address EPA(A) in EUN(A); IR[CE](A) with locator EPA(B) in EUN(B); IR[CE](B) with locator EPA(C) in EUN(C); IR[CE](C) with locator ISP(D) attached to ISP(D) and the Internet; tunnels across the IRON to IR[VE](C), IR[VE](B) and IR[VE](A); and IR[VC](Z), IR[VE](Z), IR[CE](Z) and Host Z.]

Figure 10: Nested EUN Example

The two cases of host A sending packets to host Z, and host Z sending packets to host A, must be considered separately as described below.

6.8.1. Host A Sends Packets to Host Z

Host A first forwards a packet with source address EPA(A) and destination address Z into EUN(A).
Routing within EUN(A) will direct the packet to IR[CE](A), which encapsulates it in an outer header with EPA(B) as the outer source address and IR[VE](A) as the outer destination address, then forwards the once-encapsulated packet into EUN(B). Routing within EUN(B) will direct the packet to IR[CE](B), which encapsulates it in an outer header with EPA(C) as the outer source address and IR[VE](B) as the outer destination address, then forwards the twice-encapsulated packet into EUN(C). Routing within EUN(C) will direct the packet to IR[CE](C), which encapsulates it in an outer header with ISP(D) as the outer source address and IR[VE](C) as the outer destination address. IR[CE](C) then sends this triple-encapsulated packet into the ISP(D) network, where it will be routed into the Internet to IR[VE](C).

When IR[VE](C) receives the triple-encapsulated packet, it removes the outer layer of encapsulation and forwards the resulting twice-encapsulated packet into the Internet to IR[VE](B). Next, IR[VE](B) removes the outer layer of encapsulation and forwards the resulting once-encapsulated packet into the Internet to IR[VE](A). Next, IR[VE](A) checks the address type of the inner address 'Z'. If Z is a non-EPA address, IR[VE](A) simply decapsulates the packet and forwards it into the Internet. Otherwise, IR[VE](A) rewrites the outer source and destination addresses of the once-encapsulated packet and forwards it to IR[VC](Z). IR[VC](Z) in turn rewrites the outer destination address of the packet to the locator for IR[VE](Z), then forwards the packet and sends a redirect to IR[VE](A). IR[VE](Z) then re-encapsulates the packet and forwards it to IR[CE](Z), which decapsulates it and forwards the inner packet to host Z. Subsequent packets from IR[CE](A) will then use IR[VE](Z) as the next hop toward host Z.

6.8.2. Host Z Sends Packets to Host A

Whether or not host Z configures an EPA address, its packets destined to Host A will eventually reach IR[VE](A). IR[VE](A) will have a mapping that lists IR[CE](A) as the next hop toward EPA(A). IR[VE](A) will then encapsulate the packet with EPA(B) as the outer destination address and forward the packet into the Internet. Internet routing will convey this once-encapsulated packet to IR[VE](B), which will have a mapping that lists IR[CE](B) as the next hop toward EPA(B). IR[VE](B) will then encapsulate the packet with EPA(C) as the outer destination address and forward the packet into the Internet. Internet routing will then convey this twice-encapsulated packet to IR[VE](C), which will have a mapping that lists IR[CE](C) as the next hop toward EPA(C). IR[VE](C) will then encapsulate the packet with ISP(D) as the outer destination address and forward the packet into the Internet. Internet routing will then convey this triple-encapsulated packet to IR[CE](C).

When the triple-encapsulated packet arrives at IR[CE](C), it strips the outer layer of encapsulation and forwards the twice-encapsulated packet to EPA(C), which is the locator address of IR[CE](B). When IR[CE](B) receives the twice-encapsulated packet, it strips the outer layer of encapsulation and forwards the once-encapsulated packet to EPA(B), which is the locator address of IR[CE](A). When IR[CE](A) receives the once-encapsulated packet, it strips the outer layer of encapsulation and forwards the unencapsulated packet to EPA(A), which is the host address of host A.

7. Additional Considerations

Considerations for the scalability of Internet Routing due to multihoming, traffic engineering and provider-independent addressing are discussed in [I-D.narten-radir-problem-statement]. Route optimization considerations for mobile networks are found in [RFC5522].

8. Related Initiatives

IRON builds upon the concepts of the RANGER architecture [RFC5720], and therefore inherits the same set of related initiatives.

Virtual Aggregation (VA) [I-D.ietf-grow-va] and Aggregation in Increasing Scopes (AIS) [I-D.zhang-evolution] provide the basis for the Virtual Prefix concepts.

Internet vastly improved plumbing (Ivip) [I-D.whittle-ivip-arch] has contributed valuable insights, including the use of real-time mapping. The use of IR[VE]s as mobility anchor points is directly influenced by Ivip's associated TTR mobility extensions [TTRMOB].

[I-D.bernardos-mext-nemo-ro-cr] discussed a route optimization approach using a Correspondent Router (CR) model. The IRON IR[VE] construct is similar to the CR concept described in this work; however, the manner in which customer EUNs coordinate with IR[VE]s is different and based on the redirection model associated with NBMA links.

Numerous publications have proposed NAT traversal techniques. The NAT traversal techniques adapted for IRON were inspired by the Simple Address Mapping for Premises Legacy Equipment (SAMPLE) proposal [I-D.carpenter-softwire-sample].

9. IANA Considerations

There are no IANA considerations for this document.

10. Security Considerations

Security considerations that apply to tunneling in general are discussed in [I-D.ietf-v6ops-tunnel-security-concerns]. Additional considerations that apply also to IRON are discussed in RANGER [RFC5720], VET [I-D.templin-intarea-vet] and SEAL [I-D.templin-intarea-seal].

IR[CE]s require a means for securely registering their EP-to-locator bindings with their VPC. Each VPC provides its customer IR[CE]s with a secure means for registering and re-registering their mappings.

11. Acknowledgements

The ideas behind this work have benefited greatly from discussions with colleagues, some of which appear on the RRG and other IRTF/IETF mailing lists.
Robin Whittle and Steve Russert co-authored the TTR mobility architecture which strongly influenced IRON. Eric Fleischman pointed out the opportunity to leverage anycast for discovering topologically-close servers. Thomas Henderson recommended a quantitative analysis of scaling properties.

The following individuals provided essential review input: Mohamed Boucadair, Wesley Eddy, Dae Young Kim and Robin Whittle.

12. References

12.1. Normative References

[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

12.2. Informative References

[BGPMON] net, B., "BGPmon.net - Monitoring Your Prefixes, http://bgpmon.net/stat.php", June 2010.

[I-D.bernardos-mext-nemo-ro-cr] Bernardos, C., Calderon, M., and I. Soto, "Correspondent Router based Route Optimisation for NEMO (CRON)", draft-bernardos-mext-nemo-ro-cr-00 (work in progress), July 2008.

[I-D.carpenter-softwire-sample] Carpenter, B. and S. Jiang, "Legacy NAT Traversal for IPv6: Simple Address Mapping for Premises Legacy Equipment (SAMPLE)", draft-carpenter-softwire-sample-00 (work in progress), June 2010.

[I-D.ietf-grow-va] Francis, P., Xu, X., Ballani, H., Jen, D., Raszuk, R., and L. Zhang, "FIB Suppression with Virtual Aggregation", draft-ietf-grow-va-02 (work in progress), March 2010.

[I-D.ietf-v6ops-tunnel-security-concerns] Hoagland, J., Krishnan, S., and D. Thaler, "Security Concerns With IP Tunneling", draft-ietf-v6ops-tunnel-security-concerns-02 (work in progress), March 2010.

[I-D.narten-radir-problem-statement] Narten, T., "On the Scalability of Internet Routing", draft-narten-radir-problem-statement-05 (work in progress), February 2010.

[I-D.russert-rangers] Russert, S., Fleischman, E., and F. Templin, "RANGER Scenarios", draft-russert-rangers-05 (work in progress), July 2010.
[I-D.templin-intarea-seal] Templin, F., "The Subnetwork Encapsulation and Adaptation Layer (SEAL)", draft-templin-intarea-seal-16 (work in progress), July 2010.

[I-D.templin-intarea-vet] Templin, F., "Virtual Enterprise Traversal (VET)", draft-templin-intarea-vet-16 (work in progress), July 2010.

[I-D.whittle-ivip-arch] Whittle, R., "Ivip (Internet Vastly Improved Plumbing) Architecture", draft-whittle-ivip-arch-04 (work in progress), March 2010.

[I-D.zhang-evolution] Zhang, B. and L. Zhang, "Evolution Towards Global Routing Scalability", draft-zhang-evolution-02 (work in progress), October 2009.

[RFC1070] Hagens, R., Hall, N., and M. Rose, "Use of the Internet as a subnetwork for experimentation with the OSI network layer", RFC 1070, February 1989.

[RFC3849] Huston, G., Lord, A., and P. Smith, "IPv6 Address Prefix Reserved for Documentation", RFC 3849, July 2004.

[RFC4192] Baker, F., Lear, E., and R. Droms, "Procedures for Renumbering an IPv6 Network without a Flag Day", RFC 4192, September 2005.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4548] Gray, E., Rutemiller, J., and G. Swallow, "Internet Code Point (ICP) Assignments for NSAP Addresses", RFC 4548, May 2006.

[RFC5214] Templin, F., Gleeson, T., and D. Thaler, "Intra-Site Automatic Tunnel Addressing Protocol (ISATAP)", RFC 5214, March 2008.

[RFC5522] Eddy, W., Ivancic, W., and T. Davis, "Network Mobility Route Optimization Requirements for Operational Use in Aeronautics and Space Exploration Mobile Networks", RFC 5522, October 2009.

[RFC5720] Templin, F., "Routing and Addressing in Networks with Global Enterprise Recursion (RANGER)", RFC 5720, February 2010.

[RFC5737] Arkko, J., Cotton, M., and L. Vegoda, "IPv4 Address Blocks Reserved for Documentation", RFC 5737, January 2010.
[RFC5743] Falk, A., "Definition of an Internet Research Task Force (IRTF) Document Stream", RFC 5743, December 2009.

[RFC5887] Carpenter, B., Atkinson, R., and H. Flinck, "Renumbering Still Needs Work", RFC 5887, May 2010.

[TTRMOB] Whittle, R. and S. Russert, "TTR Mobility Extensions for Core-Edge Separation Solutions to the Internet's Routing Scaling Problem, http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf", August 2008.

Appendix A. IRON VPs Over Internetworks with Different Address Families

The IRON architecture leverages the routing system by providing generally shortest-path routing for packets with EPA addresses from VPs that match the address family of the underlying Internetwork. However, when the VPs are of an address family that is not routable within the underlying Internetwork (e.g., when OSI/NSAP [RFC4548] VPs are used within an IPv4 Internetwork), a global mapping database is required to allow IR[VE]s to map VPs to companion prefixes taken from address families that are routable within the Internetwork. For example, an IPv6 VP (e.g., 2001:DB8::/32) could be paired with a companion IPv4 prefix (e.g., 192.0.2.0/24) so that encapsulated IPv6 packets can be forwarded over IPv4-only Internetworks.

Every VP in the IRON must therefore be represented in a globally distributed Master VP database (MVPd) that maintains VP-to-companion prefix mappings for all VPs in the IRON. The MVPd is maintained by a globally-managed assigned numbers authority in the same manner as the Internet Assigned Numbers Authority (IANA) currently maintains the master list of all top-level IPv4 and IPv6 delegations. The database can be replicated across multiple servers for load balancing, much in the same way that FTP mirror sites are used to manage software distributions.

Upon startup, each IR[VE] discovers the full set of VPs for the IRON by reading the MVPd.
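
As a non-normative illustration of the VP-to-companion prefix pairing described above, the following Python sketch holds a local copy of the mappings and finds the companion prefix for a destination EPA by longest-prefix match. The table layout, the names MVPD and companion_prefix, and the choice of longest-prefix matching are assumptions made for this example only; this document does not define an MVPd format or lookup procedure. The prefixes are the documentation examples given above.

```python
import ipaddress

# Hypothetical in-memory copy of the MVPd: VP -> companion prefix.
# The single entry uses the documentation prefixes from the text.
MVPD = {
    ipaddress.ip_network("2001:db8::/32"): ipaddress.ip_network("192.0.2.0/24"),
}


def companion_prefix(epa, mvpd=MVPD):
    """Return the companion prefix of the longest VP covering epa, or None."""
    addr = ipaddress.ip_address(epa)
    # Only VPs of the same address family as the EPA can cover it.
    matches = [vp for vp in mvpd if vp.version == addr.version and addr in vp]
    if not matches:
        return None
    # Assumption: prefer the most specific (longest) matching VP.
    best = max(matches, key=lambda vp: vp.prefixlen)
    return mvpd[best]
```

With the single example entry, companion_prefix("2001:db8:1::1") yields the network 192.0.2.0/24, and an address that no VP covers yields None.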
The IR[VE] reads the MVPd from a nearby server and periodically checks the server for deltas since the database was last read. After reading the MVPd, the IR[VE] has a full list of VP-to-companion prefix mappings.

The IR[VE] can then forward packets toward EPAs covered by a VP by encapsulating them in an outer header of the VP's companion prefix address family and using any address taken from the companion prefix as the outer destination address. The companion prefix therefore serves as an implicit anycast prefix.

Possible encapsulations in this model include IPv6-in-IPv4, IPv4-in-IPv6, OSI/CLNP-in-IPv6, OSI/CLNP-in-IPv4, etc.

Appendix B. Scaling Considerations

Scaling aspects of the IRON architecture have strong implications for its applicability in practical deployments. Scaling must be considered along multiple vectors including interdomain core routing scaling, scaling to accommodate large numbers of customer EUNs, traffic scaling, state requirements, etc.

In terms of routing scaling, each VPC will advertise one or more VPs from which EPs are delegated to customer EUNs. The number of routes advertised is therefore minimized when each VP covers many EPs. For example, the IPv6 prefix 2001:DB8::/32 contains 2^24 ::/56 EP prefixes for assignment to EUNs. The IRON could therefore accommodate 2^32 ::/56 EPs with only 2^8 ::/32 VPs advertised in the interdomain routing core.

In terms of traffic scaling for IR[VC]s, each IR[VC] represents an ASBR of a "shell" enterprise network that simply directs arriving traffic packets with EPA destination addresses towards IR[VE]s that service customer EUNs. Moreover, the IR[VC] sheds traffic destined to EPAs through redirection, which removes it from the path for the vast majority of traffic packets. On the other hand, each IR[VC] must handle all traffic packets forwarded between its customer EUNs and the non-IRON Internet.
The scaling concerns for this latter class of traffic are no different than for ASBR routers that connect large enterprise networks to the Internet.

In terms of traffic scaling for IR[VE]s, each IR[VE] services a set of the VPC overlay network's customer EUNs. The IR[VE] services all traffic packets destined to its EUNs but only services the initial packets of flows initiated from the EUNs and destined to EPAs. Therefore, traffic scaling for EPA-addressed traffic is an asymmetric consideration and is proportional to the number of EUNs each IR[VE] serves.

In terms of state requirements for IR[VC]s, each IR[VC] maintains a list of all IR[VE]s in the VPC overlay network as well as FIB entries for all customer EUNs that each IR[VE] serves. This state is therefore dominated by the number of EUNs in the VPC overlay network. Sizing the IR[VC] to accommodate state information for all EUNs is therefore required during VPC overlay network planning.

In terms of state requirements for IR[VE]s, each IR[VE] maintains tunnel state for each of the customer EUNs it serves but need not keep state for all EUNs in the VPC overlay network. Finally, neither IR[VC]s nor IR[VE]s need keep state for final destinations of outbound traffic.

IR[CE]s source and sink all traffic packets originating from or destined to the customer EUN. Therefore, traffic scaling considerations for IR[CE]s are the same as for any site border router. IR[CE]s also retain state for the final destinations of outbound traffic flows. This can be managed as soft state, since stale entries purged from the cache will be refreshed when new traffic packets are sent.

Author's Address

Fred L. Templin (editor)
Boeing Research & Technology
P.O. Box 3707 MC 7L-49
Seattle, WA 98124
USA

Email: fltemplin@acm.org