Network Working Group
Patrick Frejborg
Internet Draft
July 2, 2010
Intended status: Experimental
Expires: January 2011

Hierarchical IPv4 Framework
draft-frejborg-hipv4-07.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on January 2, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Frejborg Internet-Draft Hierarchical IPv4 Framework July 2010

Abstract

This document describes a framework in which the current IPv4 address space is divided into two new address categories: a core address space (Area Locators, ALOC) that is globally unique, and an edge address space (Endpoint Locators, ELOC) that is regionally unique.
The core-edge address grouping enables a new level of hierarchy in the routing architecture of the Internet. The hierarchical IPv4 framework is backwards compatible with the current IPv4 framework. This document also discusses a method to decouple the location and identifier functions, so that future applications can make use of the separation. The framework requires extensions to the existing Domain Name System, to the existing IPv4 stack of the endpoints, and to routers in the Internet. The framework can be implemented incrementally in endpoints, DNS and routers.

Table of Contents

1. Requirements Notation
2. Introduction
3. An overview of the hIPv4 framework
4. Definitions of terms
5. Mandatory extensions to current architectures (unicast)
6. The header of a hIPv4 packet
7. ALOC use cases
8. Life of a hIPv4 session
9. Overlapping Source and Destination ELOC prefixes/ports
10. Traceroute considerations
11. Multicast considerations
12. Traffic engineering considerations
13. Large encapsulated packets
14. Mobility considerations
15. Affected Applications and Implications
16. The Future Role of the LSR
17. Transition considerations
18. Security Considerations
19. IANA Considerations
20. Conclusion
21. References
   21.1. References
   21.2. Informative References
22. Acknowledgements
Appendix A. Future IPv4 address allocation policies
Appendix B. Multi-homing becomes multi-pathing
Appendix C. Mobile site crossing a RIR border
Appendix D. Transition Arguments
Appendix E. Integration with CES architectures

1. Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Introduction

The hierarchical IPv4 (hIPv4) framework has been developed to address the issues discussed in the [IAB report] from the Routing and Addressing Workshop that was held by the Internet Architecture Board (IAB) on October 18-19, 2006, in Amsterdam, Netherlands. The current addressing (IPv4) and the future addressing (IPv6) schemes of the Internet are single-dimensional by nature. This limitation, i.e. the single-level addressing scheme, has created some roadblocks for further growth of the Internet.

If we compare the Internet's current addressing schemes to other global addressing or location schemes, we notice that the other schemes use several levels in their structures. For example, the postal system uses street address, city and country to locate a destination, and cartography uses longitude and latitude to locate a geographical site. The other global network, the Public Switched Telephone Network (PSTN), has been built upon a three-level numbering scheme that has enabled a hierarchical signaling architecture.
By expanding the current IPv4 addressing scheme from a single-level to a two-level addressing structure, most of the issues discussed in the IAB report can be solved. A convenient way to understand the two-level addressing scheme of the hIPv4 framework is to compare it to the PSTN numbering scheme (E.164), which uses country codes, national destination codes and subscriber numbers. The Area Locator (ALOC) prefix in the hIPv4 addressing scheme can be considered similar to the country code in PSTN, i.e. the ALOC prefix locates an area in the Internet; this area is called an ALOC realm. The Endpoint Locator (ELOC) prefixes in hIPv4 can be compared to the subscriber numbers in PSTN, i.e. the ELOC is regionally unique within the ALOC realm it is attached to. An ELOC can also be attached simultaneously to several ALOC realms.

By inserting the ALOC and ELOC elements as a shim header (similar to the [MPLS] and [RBridge] architectures) between the IPv4 header and the transport protocol header, a hIPv4 header is created. From the network point of view, the hIPv4 header "looks and feels like" an IPv4 header, thus fulfilling some of the goals outlined in [EIP] and in the early definition of [Nimrod]. The outcome is that the current forwarding plane does not need to be upgraded, though some minor changes are needed in the control plane (e.g. ICMP extensions).

Another important source of influence is the report and presentations from the [Dagstuhl] workshop, which declared that "a future Internet architecture must hence decouple the functions of IP addresses as names, locators, and forwarding directives in order to facilitate the growth and new network-topological dynamisms of the Internet".
Therefore, an identifier element needs to be added to the hIPv4 framework to give future applications a path to remove the current dependency on the underlying network layer addressing scheme (local and remote IP address tuples).

Multipath transport protocols, such as [SCTP] and Multipath TCP [MPTCP], which is currently under development, are the most interesting candidates to enable an identifier functionality for the hIPv4 framework. MPTCP is especially interesting from hIPv4's point of view: one of the main goals of MPTCP is to provide backwards compatibility with current implementations, and hIPv4 shares the same goal.

MPTCP itself does not provide a host identifier solution as e.g. [HIP] does; instead MPTCP proposes a token, with local meaning, to manage and bundle subflows under one session between two endpoints. The token can be considered to have the characteristics of a session identifier, providing a generic cookie mechanism for the application layer and creating a session layer between the application and transport layers. Thus the use of a session identifier will provide a mechanism to improve mobility, in both site and endpoint mobility scenarios.

Since the session identifier improves site and endpoint mobility, and routing scalability is improved by introducing a hierarchical address space, why then add a host identifier to the hIPv4 framework? Introducing a host identifier mechanism, as described in [HIP], [ILNP] and [NBS], might ease or remove the locator renumbering dependencies at security nodes (firewalls) that are used to scope security zones, but this approach would fundamentally change the currently deployed security architecture.
However, combining Name-Based Sockets [NBS] with [DNSSEC] is interesting. Today security zones are scoped by using locator values (IP addresses) in the security rule sets. FQDNs could be used in the rule sets instead, and the renumbering of network locator values would then no longer depend on the security rule sets in the security nodes. Another interesting aspect is that a FQDN is and needs to be globally unique; the ALOC value must be globally unique, but ELOC values are only regionally unique. Nevertheless, combining host identifiers with a security architecture and DNSSEC needs further study.

Some of the design goals of this proposal include:

1. The hierarchical IPv4 framework must be backwards compatible with the current IPv4 framework.

2. Minimize the introduction of totally new protocols or signaling architectures; instead use well-proven protocols and insert extensions to protocols where needed.

3. Create a hierarchical IPv4 addressing structure which enables a more regional allocation of IPv4 address blocks and thereby reduces the routing table size in the "Default-Free Zone" (DFZ).

4. Remove the single IPv4 address space constraint; reuse IPv4 address blocks by means of a hierarchy.

5. Improve site mobility, i.e. allow a site to change its attachment point to the Internet without changing its IPv4 address block.

6. Make use of the current forwarding plane (IPv4); introduce a new forwarding plane for only a few routers in an Autonomous System or group of Autonomous Systems.

7. Reduce the amount of Network Address Translation (NAT) needed for IPv4-to-IPv4 traffic.

8. Provide a smooth transition path to the hierarchical IPv4 framework.

3. An overview of the hIPv4 framework

In this section we discuss the roles of the new elements introduced by the hIPv4 framework, and their dependencies.
As mentioned in the introduction, the role of an Area Locator (ALOC) prefix is similar to that of a country code in PSTN, i.e. the ALOC prefix provides a location functionality for an area within an Autonomous System (AS), or an area spanning a group of ASes, in the Internet. An AS can have several ALOC prefixes assigned, e.g. due to traffic engineering requirements. The ALOC prefix is used for routing and forwarding purposes in the Internet; thus the ALOC prefix must be globally unique and is allocated from an IPv4 address block. This globally unique IPv4 address block is called the Global Locator Block (GLB).

When an area within an AS (or a group of ASes) is assigned an ALOC prefix, the area has the potential to become an ALOC realm. In order to establish an ALOC realm, more elements than the ALOC prefix are needed. One or multiple Locator Swap Routers (LSR) must be attached to the ALOC realm. A LSR element is a node capable of swapping the values of the IP header and the new shim header, called the locator header. The swap mechanism of the headers is described in detail in section 8, step 4.

Today's routers do not support the LSR functionality. Therefore the new functionality will most likely be developed on an external device attached to a router belonging to the ALOC realm. The external LSR might be a computer with two interfaces attached to a router, the first interface configured with the prefix of the ALOC and the second interface with any IPv4 prefix. The LSR does not make use of dynamic routing protocols, and neither a forwarding information base (FIB) nor a cache is needed; the LSR produces a service, i.e. swapping headers. The swap mechanism is applied on a per-packet basis, and the information needed to carry out the swap is included in the locator header of the hIPv4 packet.
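The per-packet swap can be illustrated with a minimal Python sketch. The exact field mapping is defined in section 8, step 4; the sketch assumes that the LSR exchanges the IP source address with the Area Locator field and the IP destination address with the Endpoint Locator field, and sets the S-bit described in section 6. The names `HIPv4Packet` and `lsr_swap` are hypothetical and only model the fields touched by the swap; the addresses are documentation placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HIPv4Packet:
    """Hypothetical model holding only the fields touched by the swap."""
    src: str        # IP header source address
    dst: str        # IP header destination address
    aloc: str       # Area Locator field of the locator header
    eloc: str       # Endpoint Locator field of the locator header
    s_bit: int = 0  # swap bit, set to 0 by the initiating endpoint


def lsr_swap(pkt: HIPv4Packet) -> HIPv4Packet:
    """Assumed mapping: src <-> ALOC, dst <-> ELOC, S-bit set to 1."""
    return replace(pkt, src=pkt.aloc, dst=pkt.eloc,
                   aloc=pkt.src, eloc=pkt.dst, s_bit=1)


# Packet as sent by the initiator: the destination is the remote ALOC,
# the locator header carries the local ALOC and the remote ELOC.
outgoing = HIPv4Packet(src="192.0.2.10", dst="203.0.113.1",
                       aloc="198.51.100.1", eloc="192.0.2.20")
swapped = lsr_swap(outgoing)
print(swapped)
```

The function reads nothing but the packet itself, which is why the LSR needs neither a FIB nor a cache for the swap.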
Thus a computer with enough computing and I/O resources is sufficient to take the role of a LSR. Later on, the LSR functionality might be integrated into the forwarding plane of a router. One LSR cannot handle all the incoming traffic destined for an ALOC realm; it would also create a potential single point of failure in the network. Therefore, several LSRs might be installed in the ALOC realm. The LSRs shall use the ALOC prefix as their locator, and the routers announce the ALOC prefix as an anycast address within the local ALOC realm. The ALOC prefix is also advertised throughout the DFZ by BGP mechanisms. The placement of the LSRs in the network influences the ingress traffic to the ALOC realm, since the LSR provides "nearest routing" functionality.

Since the forwarding paradigm of multicast packets is quite different from that of unicast packets, the multicast functionality will have an impact on the LSR. The multicast LSR (mLSR) functionality is likewise not available on today's routers; an external device is needed, and later on the functionality might be integrated into the routers. The mLSR shall take the role of an anycast RP with MSDP and PIM capabilities, but a FIB is not required to forward packets. As with the LSR, the multicast hIPv4 packets carry all needed information in their headers in order to apply the swap; for details see section 11.

The ALOC realm is not yet fully constructed. We can now locate the ALOC realm in the Internet, but to locate the endpoints attached to the ALOC realm a new element is needed, i.e. the Endpoint Locator (ELOC). As mentioned in the introduction, the ELOC prefixes can be considered similar to the subscriber numbers in PSTN. Actually, the ELOC is not a new element; the ELOC is a redefinition of the current IPv4 address configured at an endpoint.
The redefinition is applied because, when the hIPv4 framework is fully implemented, the global uniqueness of IPv4 addresses no longer holds and a more regional allocation policy for IPv4 addresses can be deployed, as discussed in appendix A. The ELOC prefix will only be used for routing and forwarding purposes inside the local and remote ALOC realms; the ELOC prefix is not used in the intermediate ALOC realms.

When an initiator establishes a session to a responder residing outside the local ALOC realm, the destination address field in the IP header of an outgoing packet is no longer the remote destination address (ELOC prefix); instead the remote ALOC prefix is installed in the destination address field of the IP header. Because the destination address is an ALOC prefix, the intermediate ALOC realms do not need to carry the ELOC prefixes of other ALOC realms in their RIB; it is sufficient for the intermediate ALOC realms to carry only the ALOC prefixes. The outcome is that the RIB and FIB tables in each ALOC realm will be reduced when the hIPv4 framework is fully implemented. The ALOC prefixes are still globally unique and must be installed in the DFZ. Thus the service provider cannot control the growth of the ALOC prefixes, but can control the number of local ELOC prefixes in its own ALOC realm.

When the hIPv4 packet arrives at the remote ALOC realm it is forwarded to the nearest LSR, since the destination address is the remote ALOC prefix. When the LSR has swapped the hIPv4 header, the destination address is the remote ELOC, and thus the hIPv4 packet is forwarded to the final destination in the remote ALOC realm. An endpoint using an ELOC prefix can be attached simultaneously to two different ALOC realms without the requirement to deploy a classical multi-homing solution; for details see section 14.
Next, how can we locate the remote ELOC (endpoint) and the remote ALOC realm in the Internet, and how is the header of the hIPv4 packet assembled? The addressing structure is no longer single-dimensional; instead a second level has been added on top of the old one. It is obvious that the Domain Name System needs to support a new record type so that the ALOC information can be distributed to the endpoints. The header of the hIPv4 packet can be constructed either by the endpoint or by an intermediate node (e.g. a proxy). A proxy solution is complicated: the proxy needs to listen to DNS messages, and a cache solution has scalability issues. A better solution is to extend the current IPv4 stack at the endpoints so that the ALOC and ELOC elements are incorporated in the endpoint's stack, but backwards compatibility must be preserved.

Most applications will not be aware of the extensions. Other applications, such as Mobile IP, SIP, IPsec AH and so on (see section 15), will suffer and cannot be used outside their ALOC realm when the hIPv4 framework is fully implemented, unless the applications are upgraded. The reason is that these applications depend upon the underlying network addressing structure, e.g. to identify an endpoint.

4. Definitions of terms

Regional Internet Registry (RIR):

   An organization overseeing the allocation and registration of Internet number resources within a particular region of the world. Resources include IP addresses (both IPv4 and IPv6) and autonomous system numbers.

Locator:

   A locator is a name for a point of attachment within the topology at a given layer. Objects that change their point(s) of attachment will need to change their associated locator(s). In the hIPv4 framework two types of locators have been defined, Area Locator (ALOC) and Endpoint Locator (ELOC).
Global Locator Block (GLB):

   An IPv4 address block that is globally unique.

Area Locator (ALOC):

   An IPv4 address (/32) assigned to locate an ALOC realm in the Internet. The ALOC is assigned by a RIR to a service provider or a multi-homed enterprise. The ALOC is globally unique because it is allocated from the GLB.

Endpoint Locator (ELOC):

   An IPv4 address assigned to locate an endpoint in a local network. The ELOC block is assigned by a RIR to a service provider or to an enterprise. The ELOC block is only unique in a geographical region, or globally unique in a business area defined by the RIRs. The final policy of uniqueness shall be defined by the RIRs.

ALOC realm:

   An area in the Internet with at least one attached Locator Swap Router (LSR); an ALOC must also be assigned to the ALOC realm. The RIB of an ALOC realm holds both local ELOC prefixes and global ALOC prefixes. An ALOC realm exchanges only ALOC prefixes with other ALOC realms.

Locator Swap Router (LSR):

   A router or node which is capable of processing the hIPv4 header; once the header is processed the LSR forwards the packet based on the IPv4 destination address. The LSR must have the ALOC assigned as its locator.

Locator Header:

   A 4, 8, 12 or 16 byte field, inserted between the IPv4 header and the transport protocol header.

Identifier:

   An identifier is the name of an object at a given layer; identifiers have no topological sensitivity, and do not have to change, even if the object changes its point(s) of attachment within the network topology.
Session:

   A semi-permanent interactive information exchange between communicating devices that is established at a certain time and torn down at a later time.

Provider Independent Address Space (PI addresses):

   An IPv4 address block which is assigned by a Regional Internet Registry directly to an end user organization.

Provider Aggregatable Address Space (PA addresses):

   An IPv4 address block assigned by a Regional Internet Registry to an Internet Service Provider, which can be aggregated into a single route advertisement.

5. Mandatory extensions to current architectures (unicast)

To implement the hierarchical IPv4 framework some basic rules are needed:

1. The DNS architecture must support a new extension, i.e. an A type Resource Record should be able to carry an ALOC prefix.

2. The hIPv4 capable endpoint shall have information about the local ALOC value; the local ALOC value can be configured manually or provided via a new DHCP option.

3. A globally unique IPv4 address block shall be reserved; this block is called the Global Locator Block (GLB). A service provider can have one or several ALOC prefixes allocated from the GLB.

4. ALOC prefixes are announced via the current BGP protocol to adjacent service providers and multi-homed enterprises; the ALOC prefixes are installed in the RIB of the DFZ. When the hIPv4 framework is fully implemented, only ALOC prefixes are announced between the service providers and multi-homed enterprises.

5. A hIPv4 capable ALOC realm must have one or several LSRs attached to its realm. The ALOC prefix is configured as an anycast IP address on the LSR. The anycast IP address is installed in the appropriate routing protocols in order to be distributed to the DFZ.

6. The IPv4 socket API at endpoints must be extended to support local and remote ALOC prefixes. The modified IPv4 socket API must be backwards compatible with the current IPv4 socket API.
   The outgoing hIPv4 packet must be assembled by the hIPv4 stack with the local IP address from the socket as the source address and the remote ALOC prefix as the destination address in the IP header. The local ALOC prefix is inserted in the ALOC field of the locator header. The remote IP address from the socket API is inserted in the ELOC field of the locator header.

6. The header of a hIPv4 packet

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A|P|S|VLB|L| R |    Protocol   |          LH Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Area Locator                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Endpoint Locator                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Private Locator Referral                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version: 4 bits

   The Version field is identical to that of RFC 791.

IHL: 4 bits

   The Internet Header Length field is identical to that of RFC 791.

Type of Service: 8 bits

   The Type of Service field is identical to that of RFC 791.

Total Length: 16 bits

   The Total Length field is identical to that of RFC 791.

Identification: 16 bits

   The Identification field is identical to that of RFC 791.
Flags: 3 bits

   The Flags field is identical to that of RFC 791.

Fragment Offset: 13 bits

   The Fragment Offset field is identical to that of RFC 791.

Time to Live: 8 bits

   The Time to Live field is identical to that of RFC 791.

Protocol: 8 bits

   A new protocol number must be assigned for hIPv4.

Header Checksum: 16 bits

   The Header Checksum field is identical to that of RFC 791.

Source Address: 32 bits

   The Source Address field is identical to that of RFC 791.

Destination Address: 32 bits

   The Destination Address field is identical to that of RFC 791.

Options and Padding: Variable length

   The Options and Padding field is identical to that of RFC 791.

ALOC Realm Bit, A-bit: 1 bit

   When the source and destination endpoints reside in different ALOC realms, the A-bit is set to 1 and the Area and Endpoint Locator fields must be used in the locator header. When the A-bit is set to 0, the source and destination endpoints reside within the same ALOC realm and the Area and Endpoint Locator fields shall not be used in the locator header.

Private Bit, P-bit: 1 bit

   The P-bit is set to 1 if the endpoint is using a private IP address [RFC1918] and has published the private IP address, either to the public via DNS or discreetly to partners, in order to create a bidirectional session model for NAT. When the P-bit is set to 1, the Private Locator Referral field must be used in the locator header.

Swap Bit, S-bit: 1 bit

   The initiating endpoint sets the S-bit of the hIPv4 packet to 0. A LSR sets this bit to 1 when it swaps the IP source and destination addresses of the IP header with the Area and Endpoint Locator of the locator header.

Valiant Load-Balancing, VLB-bits: 2 bits

   The purpose of the Valiant Load-Balancing field is to provide a mechanism for multipath enabled transport protocols to request explicit paths in the network for subflows, which are component parts of a session between two endpoints.
   The subflow path request can be set as follows:

   00: Latency-sensitive application, only one single subflow (i.e. multipath not applied); the shortest path through the network is requested.

   01: First subflow; shortest path or Valiant Load-Balancing might be applied.

   11: Next subflow(s); Valiant Load-Balancing should be applied.

Load-Balanced, L-bit: 1 bit

   The initiating endpoint must set the L-bit to zero. A Valiant Load-Balancing capable node can apply VLB switching for the session if the value is set to zero; if the value is set to 1, VLB switching is not allowed. When VLB switching is applied for the session, the node must set the value to 1.

Reserved, R-bits: 2 bits

   Reserved, must be zero.

Protocol: 8 bits

   The Protocol field is identical to that of RFC 791.

Locator Header Checksum: 16 bits

   A checksum is calculated on the locator header only. The checksum is computed at the initiator, recomputed at the LSR and verified at the responder. The checksum algorithm is identical to that of RFC 791.

Area Locator (optional): 32 bits

   An IPv4 address; the ALOC is assigned by a RIR to a service provider. The ALOC is globally unique because it is allocated from the GLB.

Endpoint Locator (optional): 32 bits

   An IPv4 address; the ELOC block is assigned by a RIR to a service provider or to an enterprise. The ELOC block is only unique in a geographical region, or globally unique in a business area defined by the RIRs. The final policy of uniqueness shall be defined by the RIRs.

Private Locator Referral (optional): 32 bits

   A private IPv4 address [RFC1918] used to create a bidirectional session model for NAT, i.e. the initiating endpoint can set the PLR value in its outgoing packets in order to traverse a middlebox that is installed in front of a destination endpoint that uses a private IPv4 address as its locator. The middlebox can also contain a local private-public mapping scheme, in which case the PLR field does not need to be filled with a private IPv4 address.

7. ALOC use cases

Since the ALOC is the main component that has been added to the current IPv4 framework to enable the hIPv4 framework, several ALOC use cases are explored in this section.

As mentioned in previous sections, the ALOC describes an area in the Internet. The area can span several Autonomous Systems; if the area is equal to an AS, the ALOC can be regarded as an AS locator. When the ALOC describes an area it is hereafter called an anycast ALOC.

The ALOC can also be used to describe a specific node between two ALOC realms, e.g. a node installed between a private and an ISP ALOC realm or between two private ALOC realms. In this use case the ALOC describes an attachment point, e.g. where a private network is attached to the Internet; this ALOC type is hereafter called a node ALOC.

The main difference between the anycast and node ALOC types is that in anycast ALOC scenarios ELOC routing information is shared between the attached ALOC realms, whereas in a node ALOC scenario no ELOC routing information is shared between the attached ALOC realms. Node ALOC functionality shall not be deployed between private and ISP ALOC realms, because it would require too many locators from the GLB space; instead node ALOC functionality shall be used to separate private ALOC realms.

ALOC space is divided into two types: a globally unique ALOC space (GLB) that is installed in the DFZ, and a private ALOC space that is used inside private networks. Private ALOCs use the same locator space as defined in [RFC1918]; a private ALOC must be unique inside the private network and should not overlap with private ELOC values.
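The constraints on private ALOC values can be checked mechanically. The following Python sketch is only an illustration: the function name `check_private_alocs` and the input format (ALOCs as dotted-quad strings, private ELOC space as CIDR prefixes) are assumptions, not part of the framework.

```python
import ipaddress

# The three address blocks defined in RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def check_private_alocs(alocs, eloc_prefixes):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    eloc_nets = [ipaddress.ip_network(p) for p in eloc_prefixes]
    seen = set()
    for text in alocs:
        addr = ipaddress.ip_address(text)
        if not any(addr in net for net in RFC1918):
            problems.append(f"{text}: not in RFC 1918 space")
        if addr in seen:
            problems.append(f"{text}: duplicate ALOC")
        seen.add(addr)
        if any(addr in net for net in eloc_nets):
            problems.append(f"{text}: overlaps a private ELOC prefix")
    return problems


good = check_private_alocs(["10.255.0.1", "10.255.0.2"], ["10.1.0.0/16"])
bad = check_private_alocs(["10.1.0.5", "198.51.100.1"], ["10.1.0.0/16"])
print(good, bad)
```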
Only ISPs should be allowed to apply for global ALOC space; for further discussion see appendix A. The ISP should aggregate ALOC space as much as possible in order to reduce the size of the routing table in the DFZ. Enterprises should only make use of private ALOC space; residential users will use the ALOC value(s) of the ISP they are connected to.

When a user logs on to the enterprise's network, the endpoint will receive the following locator values via DHCP (or they are configured manually):

o one ELOC value for each network interface

o one private ALOC value if e.g. the enterprise network spans a small region, or

o several private ALOC values if e.g. the enterprise network spans several regions (i.e. uses long distance connections) or has recently been merged with another enterprise

o one or several global ALOC values; these ALOCs describe how the enterprise network is connected to the Internet

As the user establishes a session to a remote endpoint, DNS is usually used to resolve the remote locator values; DNS returns the ELOC and ALOC values for the remote endpoint. If no ALOC values are returned, a legacy IPv4 session is initiated to the remote endpoint. When ALOC values are returned, the initiating endpoint compares them with its own given ALOC values (provided via DHCP or configured manually):

o if the remote ALOC value(s) is from the private ALOC space, the initiator shall use the given private ALOC value(s) for the session.

Two use cases exist for designing a network to use private ALOC functionality. In the first, the remote endpoint is far away, and in order to improve performance for the session a multipath transport protocol should be used. The other use case is when the remote endpoint resides in a network that has recently been merged, and the private ELOC [RFC1918] spaces overlap if no renumbering is applied.
One or several node ALOC solutions are needed in the network between the initiator and responder. For long distance sessions with no overlapping ELOC values, anycast or node ALOC solutions can be deployed. Third use case follows, again the initiator compares returned ALOC values from DNS with own given ALOC values: o if the remote ALOC value(s) is from the global ALOC space and the remote ALOC doesn't match the given global ALOC the initiator shall use the given global ALOC value(s) for the session. In this use case the remote endpoint resides outside the enterprises own private network, the global remote ALOC values indicates how the remote network is attached to Internet. When a multipath transport protocol is used the subflows can be routed via separate border routers to the remote endpoint - both at the local and remote border Frejborg Expires January 2, 2011 [Page 15] Internet-Draft Hierarchical IPv4 Framework July 2010 routers if both sites are multi-homed. The egress hIPv4 packet (the initiator's packets in the local network) can be identified by the protocol value in the IP header, routed to an explicit path (e.g. MPLS LSP, L2TPv3 tunnel etc) upon the ALOC value in the locator header - a simple first mile traffic engineering topology can be designed, for further details, see section 12 and appendix B. Fourth use case is to leverage the private and global ALOC functionalities to be aligned with the design and implementation of split-DNS solutions. The fifth use case is for residential users, a residential user may use one or several ALOC values; it depends upon the service offer and network design of the ISP. If the ISP prefers to offer advanced support for multipath transport protocols and first mile traffic engineering, then the residential user might be provided with several ALOC values. The ALOC provided for residential users is taken from the global ALOC space and anycast ALOC functionality is applied. 
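
The ALOC comparison that the initiator performs in the enterprise use cases above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the framework: the names `LocalLocators` and `select_local_alocs` are hypothetical, ALOC values are modelled as CIDR strings, a DNS reply that mixes private and global ALOC values is treated as global, and the case where a remote global ALOC matches a local one is reported as an intra-realm session (the case section 8 describes as needing no hIPv4 header for routing).

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple

RFC1918_BLOCKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(aloc: str) -> bool:
    """True if the ALOC prefix is taken from RFC 1918 space."""
    net = ipaddress.ip_network(aloc)
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)


@dataclass
class LocalLocators:
    """ALOC values the endpoint received via DHCP or manual configuration."""
    private_alocs: List[str] = field(default_factory=list)
    global_alocs: List[str] = field(default_factory=list)


def select_local_alocs(remote_alocs: List[str],
                       local: LocalLocators) -> Tuple[str, List[str]]:
    """Return (session type, local ALOC values to use for the session)."""
    if not remote_alocs:
        # DNS returned no ALOC: fall back to a legacy IPv4 session.
        return "legacy-ipv4", []
    if all(is_private(a) for a in remote_alocs):
        # Remote endpoint is inside the private network.
        return "hipv4", list(local.private_alocs)
    if any(a in local.global_alocs for a in remote_alocs):
        # Assumption: same ALOC realm, no hIPv4 header needed for routing.
        return "intra-realm", []
    # Remote endpoint is reached over the Internet.
    return "hipv4", list(local.global_alocs)
```

When several local ALOC values are returned, a multipath transport protocol could open one subflow per value; how that is done is outside the scope of this sketch.
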
If the ISP supports multipath transport protocols with first mile traffic engineering capabilities, the ISP needs to design explicit paths in its network based upon the ALOC value in the locator header. Later on, when the routers have evolved and been replaced, the routers might forward packets to explicit peering points based upon the ALOC value in the locator header - if the ALOC value matches the ALOC realm's ALOC value. An alternative is to implement Valiant Load-Balancing [VLB] in the ISP network; for further details, see section 12.

8. Life of a hIPv4 session

This section provides an example of a hIPv4 session between two hIPv4 endpoints: an initiator and a responder residing in different ALOC realms.

When the hIPv4 stack assembles a packet for transport it shall decide, based upon the ALOC information received in the DNS reply, whether a legacy IPv4 header or a hIPv4 header is used. If the initiator's (local) ALOC value equals the responder's (remote) ALOC value there is no need to use the hIPv4 header for routing purposes, because both the initiator and the responder reside in the local ALOC realm. The packet is routed upon the IP header since it will not exit the local ALOC realm. When the local ALOC prefix doesn't match the remote ALOC prefix a hIPv4 header must be assembled, because the packet needs to be routed to a remote ALOC realm.

A session between two endpoints inside an ALOC realm might still use the locator header - not for routing purposes, but to make use of Valiant Load-Balancing [VLB] for multipath enabled transport protocols or to create a bidirectional session model for NAT. The initiator can add the locator header to the packet and, by setting the VLB bits to 01, indicate to the responder and intermediate routers that VLB is requested for the subflow.
Because this is an intra-ALOC realm session there is no need to add ALOC and ELOC fields to the locator header, thus the size of the locator header will be 4 bytes - or 8 bytes if the Private Locator Referral is also used.

A hIPv4 session is established as follows:

1. The initiator queries the DNS node; the hIPv4 stack notices that the local and remote ALOC don't match and therefore must use the hIPv4 header for the session. The hIPv4 stack of the initiator must assemble the packet in the following way:

   a. set the local IP address from the API in the source address field of the IP header

   b. set the remote IP address from the API in the ELOC field of the locator header

   c. set the local ALOC prefix in the ALOC field of the locator header

   d. set the remote ALOC prefix in the destination address field of the IP header

   e. set the transport protocol value in the protocol field of the locator header and set the hIPv4 protocol value in the protocol field of the IP header

   f. set the desired parameters in the A-, P-, S-, VLB-, L-, and R-fields of the locator header

   g. calculate the IP-, locator- and transport header checksums; the transport header calculation does not include the locator header fields. When completed, the packet is transmitted.

2. The hIPv4 packet is routed throughout the Internet upon the destination address of the IP header.

3. The hIPv4 packet will reach the closest LSR of the remote ALOC realm. When the LSR notices that the packet matches the given local ALOC value the LSR must:

   a. verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header

   b. verify the IP-, locator- and transport header checksums; the transport header verification does not include the locator header fields

   c. replace the source address in the IP header with the ALOC value of the locator header

   d. replace the destination address in the IP header with the ELOC value of the locator header

   e. replace the ALOC value in the locator header with the destination address of the IP header

   f. replace the ELOC value in the locator header with the source address of the IP header

   g. set the S-field to 1

   h. decrease the TTL value by one

   i. calculate the IP-, locator- and transport header checksums; the transport header calculation does not include the locator header fields

   j. forward the packet upon the destination address of the IP header

4. The swapped hIPv4 packet is now routed inside the remote ALOC realm upon the new destination address of the IP header to the final destination.

5. The responder receives the hIPv4 packet

   a. the hIPv4 stack must verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header

   b. verify the IP-, locator- and transport header checksums; the transport header verification does not include the locator header fields

6. The hIPv4 stack of the responder must present to the extended IPv4 socket API the following:

   a. present the source address as the remote ALOC

   b. present the destination address as the local IP address

   c. verify the received ALOC as the local ALOC

   d. present the ELOC as the remote IP address

7. The responder's application will respond to the initiator and the returning packet will take almost the same steps, i.e. steps 1 to 6, as when the initiator started the session. In step 1 the responder doesn't need to do a DNS lookup since all information is provided by the received packet.

9. Overlapping Source and Destination ELOC prefixes/ports

Because an ELOC prefix is only significant within the local ALOC realm there is a slight possibility that a session between two endpoints residing in separate ALOC realms might use the same source and destination ELOC prefixes.
But the session is still unique because the two processes communicating over the transport protocol form a logical session which is uniquely identifiable by the five tuples involved, i.e. by the combination of source address, source port, destination address, destination port and transport protocol.

The session might no longer be unique when two initiators with the same source ELOC prefix, residing in two separate ALOC realms, are accessing a responder located in a third ALOC realm. In this scenario a possibility exists that the initiators will use the same local port value. This situation will cause an "identical session situation" for the application layer.

To overcome this scenario the hIPv4 stack must accept only one unique session with the help of the ALOC information. If there is an "identical session situation" - i.e. both initiators use the same values in the five tuples - the hIPv4 stack shall allow only the first established session to continue; the following sessions must be prohibited and the initiator is informed by an ICMP notification about the "identical session situation".

MPTCP introduces a token, which is locally significant and currently defined as 32 bits long. The token will provide a sixth tuple for future stacks to identify and verify the uniqueness of a session - the probability of an "identical session situation" is thereby further reduced. By adding Name-Based Sockets [NBS] to the hIPv4 framework the "identical session situation" is completely removed.

10. Traceroute considerations

As long as the traceroute is executed inside the local ALOC realm the normal IPv4 traceroute mechanism can be used. As soon as the traceroute exits the local ALOC realm the locator header shall be used in the notifications. Therefore an extension to the ICMP protocol shall be implemented; the extension shall be compatible with [RFC4884].

11. Multicast considerations

Since source and destination ELOC prefixes are only installed in the RIB of the local ALOC realm there is a constraint with Reverse Path Forwarding (RPF), which is used to ensure loop-free forwarding of multicast packets. The source address of a multicast group (S,G) is used for the RPF check. The source address can no longer be used as an RPF checkpoint outside the local ALOC realm.

To enable RPF globally for an (S,G), the multicast enabled LSR (mLSR) at the source ALOC realm must replace the source address with the local ALOC prefix for inter-ALOC multicast streams. This can be achieved if the local LSR also acts as an anycast Rendezvous Point with Multicast Source Discovery Protocol (MSDP) and Protocol Independent Multicast capabilities; with these functionalities the LSR becomes a multicast enabled LSR (mLSR). The sender registers at the mLSR and a source tree is established between the sender and the mLSR. When an inter-ALOC realm receiver subscribes to the multicast group the mLSR has to swap the hIPv4 multicast packet in the following way:

   a. verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header

   b. verify the IP- and transport header checksums

   c. replace the source address in the IP header with the local ALOC value

   d. set the S-field to 1

   e. decrease the TTL value by one

   f. calculate the IP- and transport header checksums; the transport header calculation does not include the locator header fields

   g. forward the packet to the shared multicast tree

In order for the mLSR to function as described above the sender must assemble the multicast hIPv4 packet in the following way:

   a. set the local IP address (S) from the API in the source address field and the ELOC field

   b. set the remote IP address (G) from the API in the destination address field

   c. set the local ALOC value in the ALOC field

   d. set the transport protocol value in the protocol field of the locator header and set the hIPv4 protocol value in the protocol field of the IP header

   e. set the desired parameters in the A-, P-, S-, VLB-, L-, and R-fields of the locator header

   f. calculate the IP-, locator- and transport header checksums; the transport header calculation does not include the locator header fields. When completed, the packet is transmitted.

The downstream routers from the mLSR to the receiver will use the source address in the IP header (whose value is the source ALOC prefix after the mLSR) for RPF verification. All information the receiver needs to create RTCP receiver reports is provided in the hIPv4 header of the packet.

Because Source Specific Multicast (SSM) and IGMPv3 use IP addresses in the payload, both protocols need to be modified to support the hIPv4 framework.

12. Traffic engineering considerations

When the hIPv4 framework is fully implemented, ingress load balancing to an ALOC realm can be influenced by the placement of LSRs in the realm; a LSR provides a "nearest routing" scheme. Also, if RIR policies allow, a service provider can have several ALOCs assigned, hence traffic engineering and filtering can be done with the help of ALOC prefixes. E.g. sensitive traffic can be aggregated under one ALOC prefix which is not fully distributed into the DFZ of the Internet.

If needed, an ALOC traffic engineering solution between ALOC realms might be developed, i.e. explicit paths that can be engineered via specific ALOC prefixes, e.g. with a mechanism similar to the one described in Pathlet routing [PR]. Further studies are needed; first it should be evaluated whether there is demand for such a solution.

A use case of ingress load balancing to an ALOC realm can be described as a last mile traffic engineering solution for a multi-homed site, i.e. the ingress traffic flow to the site is influenced by how many attachment points to the Internet the site uses and where the attachment points are placed in the local network.

In order to apply egress load balancing (first mile traffic engineering) from the multi-homed site some new network nodes are needed between the initiator (connected to the local network) and the border routers. The new network node(s) shall be able to identify hIPv4 packets, based upon the protocol field in the IP header, and switch the packets to explicit paths based upon the ALOC value in the locator header. Together with a multipath transport protocol the subflows can be routed via specific attachment points, i.e. border routers sitting between the local network and the Internet. Multi-homing becomes multi-pathing; for details, see appendix B.

The usage of multipath enabled transport protocols opens up the possibility to develop a new design methodology for backbone networks, i.e. Valiant Load-Balancing [VLB]. If two single-homed endpoints, using multipath enabled transport protocols and attached to the network with only one interface/ELOC-prefix, are communicating over the Internet, both subflows will most likely take the shortest path through the Internet. I.e. both subflows are established over the same links, and when there is congestion on a link or a failure of a link both subflows might simultaneously drop packets - the benefit of multipath is lost. The "subflows-over-same-links" scenario can be avoided if the subflows are traffic engineered to traverse the Internet on different paths - but this is difficult to achieve by using classical traffic engineering, such as IGP tuning or MPLS based traffic engineering. By adding a mechanism to the locator header the "subflows-over-same-links" scenario might be avoided.
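
The first mile traffic engineering node described earlier in this section can be sketched as a simple lookup: packets carrying the hIPv4 protocol value are switched to an explicit path selected by the ALOC value in the locator header, and everything else follows the default path. The sketch below is illustrative only. The protocol number is a placeholder (no value is assigned to hIPv4; 253 is one of the numbers reserved for experimentation), and the `Packet` type, the path names and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical placeholder: hIPv4 has no assigned protocol number.
# 253 is reserved for experimentation and testing (RFC 3692).
HIPV4_PROTOCOL = 253


@dataclass(frozen=True)
class Packet:
    ip_protocol: int                    # protocol field of the IP header
    destination: str                    # destination address of the IP header
    locator_aloc: Optional[str] = None  # ALOC field of the locator header


def select_egress_path(packet: Packet,
                       explicit_paths: Dict[str, str],
                       default_path: str) -> str:
    """Pick the explicit path (e.g. an MPLS LSP) for an egress packet."""
    if packet.ip_protocol != HIPV4_PROTOCOL or packet.locator_aloc is None:
        # Legacy IPv4 traffic, or no ALOC to steer on: normal forwarding.
        return default_path
    # Unknown ALOC values also fall back to normal forwarding.
    return explicit_paths.get(packet.locator_aloc, default_path)


# One explicit path per global ALOC, i.e. per attachment point.
paths = {"198.51.100.0": "lsp-to-border-a", "203.0.113.0": "lsp-to-border-b"}
subflow_1 = Packet(HIPV4_PROTOCOL, "192.0.2.1", locator_aloc="198.51.100.0")
subflow_2 = Packet(HIPV4_PROTOCOL, "192.0.2.1", locator_aloc="203.0.113.0")
legacy = Packet(6, "192.0.2.1")  # plain TCP over IPv4

chosen = [select_egress_path(p, paths, "igp-default")
          for p in (subflow_1, subflow_2, legacy)]
```

In this example the two subflows of one multipath session leave the site via different border routers because they carry different ALOC values, while the legacy packet is forwarded as before.
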

If the LSR functionality is deployed on a Valiant Load-Balancing enabled backbone node - hereafter called vLSR - and the backbone nodes are interconnected via logical full-mesh connections, Valiant Load-Balancing can be applied to the subflows. When a subflow has the appropriate bits set in the VLB-field of the locator header, the first ingress vLSR shall do VLB switching of the subflow. That is, the ingress vLSR is allowed to do VLB switching of the subflow's packets if the VLB bits are set to 01 or 11, the S-bit is set to 0 and the local ALOC value of the vLSR matches the value of the ALOC-field. If there are no ALOC and ELOC fields in the locator header, but the other fields are set as described above, the vLSR should apply VLB switching to the subflow as well - because it is an intra-ALOC realm subflow belonging to a multipath enabled session.

With this combination of parameters in the locator header the subflow is VLB switched only at the first ALOC realm, and most likely the subflows will be routed through the Internet on different paths. If VLB switching were applied in every ALOC realm it would most likely add too much latency for the subflows. The VLB switching at the first ALOC realm will not separate the subflows on the first and last mile links - if the subflows need to be routed on separate first and last mile links the endpoints should be deployed in a multi-homed environment.

Studies on how Valiant Load-Balancing influences traffic patterns between interconnected VLB [iVLB] backbone networks have been carried out. Nevertheless, more studies are needed around Valiant Load-Balancing scenarios.

13. Large encapsulated packets

Adding the locator header to an IPv4 packet in order to create a hIPv4 packet will increase its size, but since the packet is assembled at the endpoint it will not complicate the current Path MTU Discovery (PMTUD) mechanism in the network. The intermediate network between two endpoints will not see any difference in packet size; IPv4 and hIPv4 packet sizes are the same from the network's point of view.

14. Mobility considerations

This section considers two types of mobility solutions: site mobility and endpoint mobility.

Site mobility definition: a site wishes to change its attachment point to the Internet without changing its IP address block.

Today, classical multi-homing is the most common solution for enterprises that wish to achieve site mobility. Multi-homing is one of the key findings behind the growth of the DFZ RIB, see the [IAB report], sections 2.1 and 3.1.2.

The hIPv4 framework can provide a solution for enterprises to have site mobility without the requirement of implementing a classical multi-homed solution. This new multi-homed solution utilizing PI addresses depends upon the forthcoming ELOC allocation policy, which is discussed in appendix A. If the regional based ELOC allocation policy is enforced the enterprise can be concurrently attached to two different Internet service providers (ISPs) without the need to implement AS border routing. The ISPs provide their globally unique ALOC prefixes to the enterprise, and the ELOC block of the enterprise is regionally unique, i.e. a PI ELOC block. The enterprise can change the local ALOC prefix on a per endpoint basis, i.e. from the previous ISP's ALOC prefix to the new ISP's ALOC prefix. Sessions initiated at the enterprise need to be routed to the correct ISP, i.e. the border router (or intermediate routers between the initiator and the border router) of the enterprise needs to apply explicit path routing upon the ALOC prefix in the locator header.
For sessions initiated from the Internet the DNS record for an endpoint needs to be updated; the local ALOC prefix on the endpoint also needs to be changed to achieve a symmetric path. Since the border router enforces explicit path routing upon the ALOC prefix of the locator header, the responder can apply basic session load balancing over the two ISPs based upon the ISP from which the session has been initiated, i.e. if the responder has two valid DNS records with two different ALOC prefixes. The conclusion is that a single-homed enterprise can achieve a smooth transition from one ISP to another by only changing the ALOC prefix on the endpoints and in the DNS records - the local ELOC scheme remains intact. Also, a single-homed enterprise can become multi-homed without implementing AS border routing or having its own ALOC prefix assigned. If a better session load balancing scheme is required the application should be migrated to a multipath enabled transport protocol such as [SCTP] or [MPTCP]. Multi-homing is discussed in detail in appendix B.

Endpoint mobility definition: an endpoint moves relatively rapidly between different networks, changing its IP layer network attachment point.

Mobile IP [MIP] is used today to provide mobility for endpoints. Mobile IP is an overlay protocol; it is also an application that uses IP addresses in its payload. It is obvious that hIPv4 extensions need to be added to the MIP framework.

Another approach is to investigate what [MPTCP] can offer to solve endpoint mobility scenarios. MPTCP introduces a token, which is locally significant and currently defined as 32 bits long. The token will provide a sixth tuple to identify and verify the uniqueness of a session. This sixth tuple - the token - does not depend upon the underlying layer, i.e. the IP layer.
The session is identified with the help of the token and thus the application is not aware when the IP parameters are changed, e.g. during a roaming situation - but it is required that the application does not make use of IP addresses. Security issues arise; the token can be captured during the session, e.g. by a man-in-the-middle attack. If the application requires protection against man-in-the-middle attacks the user should apply the Transport Layer Security [TLS] Protocol to the session.

To summarize, the most common endpoint mobility use case today is that one endpoint resides in the fixed network and the other endpoint is mobile - thus MPTCP will provide roaming capabilities for the mobile endpoint, if both endpoints make use of the MPTCP extension. However, in some use cases the fixed endpoint needs to initiate a session to a mobile endpoint. Thus MIP should incorporate the hIPv4 extension - MIP provides a rendezvous service for the mobile endpoints. Many applications also provide rendezvous services for their users, e.g. SIP, peer-to-peer, Instant Messaging services etc. A generic rendezvous service solution is provided by [HIP]; if desired, HIP can be integrated into the hIPv4 framework. Once the mobile endpoints have located each other they can change attachment points by leveraging the session identifier (token) in MPTCP - the routing architecture only provides location information for the endpoints, i.e. the identifier mechanisms (session and host identifiers) are decoupled from the routing architecture.

15. Affected Applications and Implications

There are several applications that insert IP address information in the payload of a packet. Some applications use the IP address information to create new sessions or for identification purposes.
This section tries to list the applications that need to be enhanced; however, this is by no means a comprehensive list. The applications can be divided into the following main categories:

   o Applications based on raw sockets; a raw socket receives packets containing the complete header, in comparison to other sockets that only receive the payload.

   o Applications needed to enable the hIPv4 framework, i.e. DNS and DHCP databases, which must be extended to support ALOC prefixes.

   o Applications that insert IP addresses into the payload and use the IP address for setting up new sessions or for some kind of identification. An application belonging to this category cannot set up sessions to other ALOC domains until extensions have been incorporated. Within the local ALOC domain there are no restrictions since the current IPv4 scheme is still valid. The following applications have been identified:

      o SIP; IP addresses are inserted in the SDP, Contact and Via headers

      o Mobile IP; the mobile node uses several IP addresses during the registration process

      o IPsec AH; designed to detect alterations of the IP packet header

      o RSVP; RSVP messages are sent hop-by-hop between RSVP-capable routers to construct an explicit path

      o ICMP; notifications need to be able to incorporate ALOC information and assemble the hIPv4 header in order to be routed back to the source

      o Source Specific Multicast; the receiver must specify the source address of the sender

      o IGMPv3; a source-list is included in the IGMP reports

   o Applications related to security, such as firewalls, must be enhanced to support ALOC prefixes

   o Applications that will function with an FQDN but often use an IP address instead, such as ping, traceroute, telnet and so on. The CLI syntax needs to be upgraded to support ALOC and ELOC information via the extended socket API.

16. The Future Role of the LSR

The LSR was added to the framework in order to provide a smooth transition from the current IPv4 framework to the hIPv4 framework, i.e. a major forklift upgrade of the current forwarding plane is avoided by the introduction of the LSR element. In the future, the LSR can be left as such in the network, if preferred, or the LSR functionality can be expanded towards the edge when routers are upgraded as part of their natural lifecycle process.

Once an upgrade of a router is required because of e.g. increased demand for bandwidth, the modified forwarding plane might concurrently support IPv4 and hIPv4 forwarding - and the LSR functionality can be pushed towards the edge (the ultimate goal is to have LSR functionality integrated in the endpoints). This is accomplished by adding extensions to the current routing protocols, both IGP and BGP. When a LSR receives a hIPv4 packet where the destination IPv4 address matches the local ALOC prefix, the LSR shall - contrary to the tasks defined in section 8, step 3 - look up the ELOC field in the locator header and compare this value against the FIB. If the next-hop entry is LSR capable the packet shall be forwarded upon the ELOC value. If the next-hop is a legacy IPv4 router the LSR must apply the tasks defined in section 8, step 3 and, once completed, forward the packet upon the new IPv4 destination address.

Once the routers from the first ingress LSR to the final destination endpoint are upgraded to support hIPv4 forwarding there is no longer a need to implement LSR functionality in the network of the remote ALOC realm; the packet is forwarded as such to the endpoint's extended stack. The hIPv4 stack must check that the ELOC value matches its local IPv4 address, because the destination IPv4 address matched the local ALOC prefix. Then the hIPv4 stack of the destination must present to the extended IPv4 socket API the following:

   a. present the source address as the remote IP address

   b. present the destination address field value as the local ALOC

   c. present the ALOC field value as the remote ALOC

   d. present the ELOC field value as the local IP address

Multicast LSR (mLSR) functionality remains in the network; it is an extension to the anycast RP with MSDP element. For sessions inside the ALOC domain the legacy IPv4 forwarding plane is kept in place.

17. Transition considerations

The hIPv4 framework does not introduce any new protocols that would be mandatory for the transition from IPv4 to hIPv4; instead extensions are added to existing protocols. The hIPv4 framework requires extensions to the current IPv4 stack, to the mapping system and to some applications that use IP addresses in the payload, but the current forwarding plane in the Internet remains intact apart from a new forwarding element (the LSR) that is required to create an ALOC realm.

Extensions to the IPv4 stack, to the mapping system, to applications that use IP addresses in the payload and to routers can be deployed in parallel with the current IPv4 framework. Genuine hIPv4 sessions can even be established between endpoints while the current single dimensional Internet structure is still present.

When will the single dimensional routing architecture then be upgraded to a two level architecture? The author thinks there are two possible tipping points:

   o When the RIB of the DFZ is getting close to the capabilities of the current forwarding plane - who will pay for the upgrade? Or will the service providers only accept ALOC prefixes from other service providers and avoid capital expenditures?

   o When the depletion of IPv4 addresses is causing enough problems for service providers and enterprises

The biggest risk to the success of the hIPv4 framework is the short timeframe before the expected depletion of the IPv4 address space occurs.
Also, will enterprises give up the global allocations of the IPv4 address blocks they have gained? Another risk is whether the enterprises and residences will carry out an upgrade of their endpoints and security nodes. Transition arguments and methods are discussed in appendices D and E.

18. Security Considerations

Hijacking of a single ELOC prefix by longest match from another ALOC realm is no longer possible since the prefixes are separated by a locator, the ALOC. To hijack a certain ELOC prefix the whole ALOC realm must be routed via a bogus ALOC realm. Studies should be carried out with the Secure Inter-Domain Routing (SIDR) workgroup on whether the ALOC prefixes can be protected from hijacking.

19. IANA Considerations

TBD

20. Conclusion

This document provides a high level overview of the hierarchical IPv4 framework, which could be built in parallel with the current single dimensional Internet by implementing extensions to several architectures. Implementation of the hIPv4 framework will not require a major service window break in the Internet, nor in the private networks of enterprises. Basically, the hIPv4 framework is an evolution of the current IPv4 framework. For sessions inside an ALOC realm the IPv4 framework can still be used in the future, and for inter-ALOC realm sessions the hIPv4 framework is needed.

Though there is a long journey ahead and many things need to be sorted out, the hierarchical IPv4 framework looks promising.
The transition can be attractive for enterprises since the hIPv4 framework does not create a catch-22 situation. It introduces functionalities (related to site and endpoint mobility, better long-distance performance, and new tools to design and implement split-DNS architectures) that could better serve their business models, it enables less expensive multi-homing solutions, it slows down the expected growth of the Internet's carbon footprint, and it is in line with the Corporate Social Responsibility programs that many enterprises have implemented.

The framework should also be interesting for service providers: when the transition phase is completed, the growth of the DFZ will be controlled by the service providers and only the service providers - multi-homed enterprises might no longer influence the RIB size of the DFZ. After the transition the RIB size of the DFZ will be reduced, which should lower the expected cost of future DFZ routers, in both operating and capital expenditure.

21. References

21.1. References

[IAB report] Meyer, D., Zhang, L., Fall, K., "Report from the IAB Workshop on Routing and Addressing", RFC 4984, September 2007

[MPLS] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci, D., Li, T., Conta, A., "MPLS Label Stack Encoding", RFC 3032, January 2001

[RFC4884] Bonica, R., Gan, D., Tappan, D., Pignataro, C., "Extended ICMP to support Multi-Part Messages", RFC 4884, April 2007

[RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.J., Lear, E., "Address Allocation for Private Internets", RFC 1918, February 1996

[HIP] Moskowitz, R., Nikander, P., "Host Identity Protocol (HIP) Architecture", RFC 4423, May 2006

[SCTP] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007

[MIP] Perkins, C., "IP Mobility Support for IPv4", RFC 3344, August 2002

[EIP] Wang, Z., "The Extended Internet Protocol", RFC 1385, November 1992

[TLS] Dierks, T., Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008

[RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995

[DNSSEC] Arends, R., Austein, R., Larson, M., Massey, D., Rose, S., "DNS Security Introduction and Requirements", RFC 4033, March 2005

21.2. Informative References

[RBridge] Perlman, R., "RBridges, Transparent Routing", Infocomm, 2004

[Dagstuhl] Arkko, J., Braun, M.B., Brim, S., Eggert, L., Vogt, C., Zhang, L., "Perspectives Workshop: Naming and Addressing in a Future Internet", Dagstuhl, 2009

[Nimrod] Chiappa, N., "A New IP Routing and Addressing Architecture", 1991

[MPTCP] Ford, A., Raiciu, C., Handley, M., Barre, S., "TCP Extensions for Multipath Operation with Multiple Addresses", IETF TSV Area, July 2009

[VLB] Zhang-Shen, R., McKeown, N., "Designing a Predictable Internet Backbone with Valiant Load-Balancing", Stanford University

[iVLB] Babaioff, M., Chuang, J., "On the Optimality and Interconnection of Valiant Load-Balancing Networks", University of California at Berkeley

[RRG] RRG, "IRTF Routing Research Group Home Page", http://tools.ietf.org/group/irtf/trac/wiki/RoutingResearchGroup

[BFD] Bidirectional Forwarding Detection Workgroup, http://www.ietf.org/dyn/wg/charter/bfd-charter.html

[CES] Jen, D., Meisel, M., Yan, H., Massey, D., Wang, L., Zhang, B., Zhang, L., "Towards A New Internet Routing Architecture: Arguments for Separating Edges from Transit Core", Sigcomm

[ILNP] Atkinson, R., "ILNP Concept of Operations", IRTF Routing RG

[NBS] Vogt, C., "Simplifying Internet Applications Development With A Name-Based Sockets Interface", christianvogt

[PR] Godfrey, P.G., Shenker, S., Stoica, I., "Pathlet Routing", Sigcomm

22.
Acknowledgements

The author would like to acknowledge Aki Anttila, Antti Jarvenpaa and Robin Whittle for giving helpful feedback on earlier versions of this document. The active participants of the Routing Research Group [RRG] mailing list are also acknowledged. The participants have provided ideas, proposals and discussions that have influenced the architecture of the hIPv4 framework.

Appendix A. Future IPv4 address allocation policies

In this section we will discuss and study how the hIPv4 framework could influence the IPv4 address allocation policies to ensure that the new framework will enable some re-usage of IPv4 address blocks. It is the Regional Internet Registries (RIRs) that shall define the final policies.

When the hIPv4 framework is fully implemented, every ALOC realm can have a full IPv4 address space - except the GLB - to allocate ELOC blocks from. There are some implications though. In order for an enterprise to achieve site mobility, i.e. to change service provider without changing its ELOC scheme, the enterprise should implement an Autonomous System (AS) solution with an ALOC prefix at the attachment point to the service provider. Larger enterprises do have the resources to implement AS border routing; most of the large enterprises have already implemented multi-homing solutions. The small and midsize enterprises (SMEs) may not have the resources to implement AS border routing, or the implementation introduces unnecessary costs for the SME. Also, if every enterprise needs to have an ALOC prefix, it will have an impact on the RIB in the DFZ: the RIB will be populated with a huge number of ALOC prefixes.

It is clear that a compromise is needed. An SME is usually single-homed and should be able to reserve a PI ELOC block from the RIR without being forced to create an ALOC realm, i.e. to implement an LSR solution and AS border routing.
The PI ELOC block is no longer globally unique; the SME can only reserve the PI ELOC block for the region where it is active or has its attachment point to the Internet. The attachment point rarely changes to another country; therefore it is sufficient that the PI ELOC block is regionally unique. When the enterprise replaces its Internet service provider, the enterprise does not have to change its ELOC scheme - only the local ALOC prefix at the endpoints is changed. The internal traffic of an enterprise does not use the ALOC prefix; internal routing is based on the IP header, and thus the internal routing and addressing architectures are preserved.

Mergers and acquisitions of enterprises can cause ELOC conflicts, because the PI ELOC block is hereafter only regionally unique. If an enterprise in region A acquires an enterprise in region B, there is a slight chance that the two enterprises have overlapping ELOC spaces. If ELOC spaces overlap, the private node ALOC functionality can be implemented - if all affected endpoints support the hIPv4 framework.

Finally, residential users will receive only PA locators. When a residential user changes service provider, the residential user has to replace the locators. A PA ELOC block is no longer globally unique; every Internet service provider can use the PA ELOC blocks in their ALOC realms - the PA locators become a kind of private locators for the service providers.

The hIPv4 framework will provide re-usage of IPv4 address blocks; the current globally unique reservation of IPv4 address blocks shall be replaced by a regional allocation policy.

Appendix B.
Multi-homing becomes multi-pathing

When the transition to the hIPv4 framework is fully completed, the RIB of an ISP that has created an ALOC realm will have the following entries:

   o  the PA ELOC blocks of directly attached customers (e.g. residential and enterprises)

   o  the PI ELOC blocks of directly attached customers

   o  the globally unique ALOC prefixes, received from other service providers

The ISP will not carry in its RIB any PA or PI ELOC blocks from other service providers. In order to route and forward packets between ISPs, only ALOC information of other ISPs is needed.

Then the question is how to keep the growth of ALOC prefixes reasonable - if the enterprise is using PI addresses, has an AS number and implements BGP, why not apply for an ALOC prefix? Classical multi-homing has the biggest impact on the growth of the RIB in the DFZ - replacing a /20 IPv4 prefix with a /32 ALOC prefix will not reduce the size of the RIB in the DFZ. Most likely the only way to prevent this from happening is to have a yearly cost for the allocation of an ALOC prefix - except if you are a service provider that is providing access and/or transit traffic for your customers. It is justified to charge non-service providers for the allocation of an ALOC prefix, because an enterprise that uses an ALOC prefix reserves a FIB entry throughout the DFZ - and the ALOC FIB entry needs power, space, hardware and cooling on all the routers in the DFZ.

Implementing this kind of ALOC allocation policy will reduce the RIB size in the DFZ quite well; multi-homing will no longer increase the RIB size of the DFZ. But this policy will have some impact on the resilience behavior: by compressing routing information we will lose visibility in the network.
In today's multi-homing solutions the network always knows where the remote endpoint resides; in case of a link or network failure a backup path is calculated and an alternative path is found - all routers in the DFZ are aware of the change in the topology. This functionality has off-loaded the workload of the endpoints; they only need to find the closest ingress router and the network will deliver the packets to the egress router, regardless of (almost) any failure that happens in the network. And with the growth of multi-homed networks the routers in the DFZ have been forced to carry a greater workload, perhaps close to their limits - the workload between the network and the endpoints is not in balance. The conclusion is that the endpoints should take more responsibility for their sessions and thereby off-load the network. How? Let us walk through an example:

A remote enterprise has been given a PI block 192.168.1.0/24 (ELOC) that is announced to the upstream service providers either via static routing or via BGP. The upstream service providers provide the ALOC information for the enterprise, i.e. 10.1.1.1 and 10.2.2.2. A remote endpoint has been installed and has been given ELOC 192.168.1.1 - the ELOC is a locator defining where the remote endpoint is attached to the remote network. The remote endpoint has been assigned ALOCs 10.1.1.1 and 10.2.2.2 - the ALOCs have no forwarding functionality within the remote network - an ALOC is a locator defining the attachment point of the remote network to the Internet.

The initiator (local endpoint), which has ELOC 172.16.1.1 and ALOC prefixes 10.3.3.3 and 10.4.4.4, has established a session by using source ALOC 10.3.3.3 to the responder (remote endpoint) at ELOC 192.168.1.1 and ALOC 10.1.1.1. I.e. both networks 192.168.1.0/24 and 172.16.1.0/24 are multi-homed.
ALOCs are not available in the current IP stack's API, but both ELOCs are seen as the local and remote IP addresses in the API, so the application will communicate between IP addresses 172.16.1.1 and 192.168.1.1 - if ALOC values are included, the session is established between 10.3.3.3:172.16.1.1 and 10.1.1.1:192.168.1.1.

Next a network failure occurs: the link between the responder's border router (BR-R1) and the service provider that owns ALOC 10.1.1.1 goes down. The border router of the initiator (BR-I3) will not be aware of the situation, because only ALOC information is exchanged between service providers and ELOC information is compressed to stay within ALOC realms. But BR-R1 will notice the link failure; BR-R1 could rewrite the ALOC field in the locator header for this session, i.e. from 10.1.1.1 to 10.2.2.2, and send the packets to the second service provider via BR-R2. The session between the initiator 10.3.3.3:172.16.1.1 and the responder 10.2.2.2:192.168.1.1 remains intact because the legacy five-tuple at the IP stack API does not change - only the ALOC value of the responder has changed and this information is not shown to the application. An assumption here is that the hIPv4 stack accepts changes of ALOC values on the fly (more about this later).

If the network link between BR-I3 and the ISP providing ALOC 10.3.3.3 fails, BR-I3 could rewrite the ALOC value in the locator header and route the packets via BR-I4 - and the session stays up.

If there is a failure somewhere in the network, the border routers might receive an ICMP destination unreachable message (if not blocked by security functionality) and thus try to switch the session over to the other ISP by replacing the ALOC values in the hIPv4 header. Or the endpoints themselves might try to switch to the other ALOCs after a certain time-out in the session.
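The failover in the walk-through above can be illustrated with a minimal sketch. It is not part of the framework definition: the class name, the function name and the port numbers (40000 and 80) are assumptions made only for illustration. The sketch shows that swapping the responder's ALOC from 10.1.1.1 to 10.2.2.2 leaves the legacy five-tuple seen by the application unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HIPv4Session:
    """Hypothetical view of an hIPv4 session as seen by the initiator."""
    local_aloc: str
    local_eloc: str
    local_port: int      # port numbers are assumed; the text does not give any
    remote_aloc: str
    remote_eloc: str
    remote_port: int
    protocol: str = "tcp"

    def legacy_five_tuple(self):
        # What the application sees through the current socket API:
        # ELOCs and ports only, never the ALOC values.
        return (self.protocol, self.local_eloc, self.local_port,
                self.remote_eloc, self.remote_port)


def fail_over_remote_aloc(session, failed_aloc, backup_aloc):
    """Swap the remote ALOC if it is the one that failed (as BR-R1 might do)."""
    if session.remote_aloc != failed_aloc:
        return session
    return replace(session, remote_aloc=backup_aloc)


before = HIPv4Session("10.3.3.3", "172.16.1.1", 40000,
                      "10.1.1.1", "192.168.1.1", 80)
after = fail_over_remote_aloc(before, "10.1.1.1", "10.2.2.2")

# The ALOC changed, but the application-visible tuple did not.
print(after.remote_aloc)
print(before.legacy_five_tuple() == after.legacy_five_tuple())
```

Keeping the ALOC outside the tuple that identifies the session at the API is what allows the path to change without the application noticing.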
In all session transition cases the legacy five-tuple at the API remains intact.

If border routers or one of the endpoints changes the ALOC value without a negotiation with the remote endpoint, security issues arise. Can an endpoint trust the remote endpoint when ALOC values are changed on the fly - is it still the same remote endpoint or has the session been hijacked by a bogus endpoint? The obvious answer is that an identification mechanism is needed to ensure that, after a change in the path or a change of the attachment point of an endpoint, the endpoints are still the same. An identifier needs to be exchanged during the transition of the session.

Two types of identifiers have been discussed on the [RRG] mailing list: session identifiers and host identifiers. The host identifier has the characteristics of a Public Key Infrastructure (PKI) certificate solution. PKI solutions have been developed and deployed, thus it is recommended that PKI solutions be used when an endpoint needs to be authenticated. When the ALOC value changes, the PKI solution might need to re-authenticate the endpoints; it is up to the security experts to evaluate the risks and threats.

When the security requirements are lower, e.g. when browsing a web site, a less complicated identification mechanism is preferable - it should be less complex to deploy and maintain. A session identifier will provide a low-level security mechanism, offering some protection against hijacking of the session, and will also provide mobility. [SCTP] uses the verification tag to identify the association; [MPTCP] incorporates a token functionality for the same purpose - both can be considered to fulfill the characteristics of a session identifier. If the application requires protection against man-in-the-middle attacks, the user should apply the Transport Layer Security [TLS] Protocol for the session. Both SCTP and MPTCP are also multipath capable.
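A session identifier check of the kind discussed above could look roughly like the following sketch. It is an assumption-laden illustration, not the SCTP verification tag or the MPTCP token procedure: the class, its method names and the 4-byte identifier length are hypothetical. The point is only that a changed remote ALOC is accepted when the packet carries the identifier agreed at session setup, and rejected otherwise.

```python
import hmac
import secrets


class SessionTable:
    """Hypothetical per-endpoint table keyed by the legacy five-tuple."""

    def __init__(self):
        # five-tuple -> [session identifier, current remote ALOC]
        self._sessions = {}

    def open(self, five_tuple, remote_aloc):
        # Identifier length (4 bytes) is an arbitrary choice for this sketch.
        token = secrets.token_bytes(4)
        self._sessions[five_tuple] = [token, remote_aloc]
        return token

    def remote_aloc(self, five_tuple):
        return self._sessions[five_tuple][1]

    def accept(self, five_tuple, observed_aloc, presented_token):
        """Accept a packet, updating the remote ALOC only if the token matches."""
        entry = self._sessions.get(five_tuple)
        if entry is None:
            return False
        # Constant-time comparison to avoid leaking the identifier.
        if not hmac.compare_digest(entry[0], presented_token):
            return False
        entry[1] = observed_aloc
        return True


key = ("tcp", "172.16.1.1", 40000, "192.168.1.1", 80)  # ports are assumed
table = SessionTable()
token = table.open(key, "10.1.1.1")

print(table.accept(key, "10.2.2.2", token))   # legitimate ALOC change
print(table.accept(key, "10.9.9.9", b"bad"))  # wrong identifier is rejected
print(table.remote_aloc(key))
```

Such a check only offers the low-level protection described in the text; it does not replace TLS or a PKI-based host identifier where stronger guarantees are required.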
Implementing multipath-capable transport protocols in a multi-homed environment will provide new capabilities such as:

   o  concurrent redundant paths to the other endpoint via different ISPs

   o  true dynamic load-balancing; the endpoints do not participate in any routing protocols or update rendezvous solutions due to network link or node failures

   o  only a single NIC on the endpoints is required

   o  in case of a border router or ISP failure, the multipath transport protocol will provide resilience

By adding more intelligence at the endpoints, i.e. multipath-enabled transport protocols, the network is off-loaded and can take less responsibility for providing visibility of destination prefixes in the Internet - i.e. prefix compression in the DFZ can be applied and only the attachment points of a local network need to be announced in the DFZ. And the IP address space no longer needs to be globally unique; it is sufficient that only a part is globally unique, while the rest is only regionally unique as discussed in Appendix A.
The outcome is that the current multi-homing solution can migrate towards a multi-pathing environment that will have the following characteristics:

   o  an AS number is not mandatory

   o  BGP is not mandatory at the enterprise's border routers; static routing with Bidirectional Forwarding Detection [BFD] is an option

   o  allocation of a global ALOC for the enterprise is not mandatory; upstream ISPs provide the global ALOC prefixes for the enterprise

   o  MPTCP provides dynamic load-balancing without using routing protocols; several paths can be used simultaneously and thus resilience is achieved

   o  low growth of RIB entries in the DFZ

   o  when static routing is used between the enterprise and the ISP:

      o  the RIB size at the enterprise's border routers does not depend on the size of the RIB in the DFZ nor in adjacent ISPs

      o  the enterprise's border router cannot cause BGP churn in the DFZ or in the adjacent ISPs' RIBs

   o  when dynamic routing is used between the enterprise and the ISP:

      o  the RIB size at the enterprise's border routers depends on the size of the RIB in the DFZ and adjacent ISPs

      o  the enterprise's border router can cause BGP churn for the adjacent ISPs but not in the DFZ

   o  the border router should be less expensive than in today's multi-homing solutions

Appendix C. Mobile site crossing a RIR border

Discussions regarding Network Address Translation (NAT) have been taking place on the [RRG] mailing list. The outcome of the discussions is that NAT has become a de facto part of the current Internet architecture - NAT has been so widely deployed that it can no longer be dismissed as a temporary solution, and thus NAT needs to be taken into account in the research work on a new routing architecture.
Though the hIPv4 framework has the capabilities to reduce the usage of NAT, hIPv4 will not make NAT totally obsolete. In the future there will still be use cases where NAT might be required, e.g. a mobile vehicle (aircraft, train, ferry, etc.) that crosses RIR boundaries and carries a local network.

If the RIRs set up a locator allocation policy as discussed in Appendix A, there are no longer globally unique locators, except one block (GLB) that is reserved to create the foundation of the DFZ. Locators from the GLB block cannot be used for networks on mobile vehicles, nor might PI ELOC blocks be used if the vehicle crosses a RIR boundary. Enterprises could reserve a PI ELOC block in every region and that way create a globally unique locator block; again, this scenario depends on the forthcoming RIR policies. Thus, most likely, a private locator block [RFC1918] needs to be assigned to a LAN-enabled vehicle that crosses regional borders.

With this requirement in mind, mechanisms to ease the inbound NAT traversal challenges - e.g. sessions initiated from the Internet to an endpoint that uses a private locator [RFC1918] and is attached to a private network - are needed, i.e. the hIPv4 framework must provide a scalable bidirectional session model for NAT. Therefore, a private locator referral (PLR) mechanism has been added to the hIPv4 framework. The PLR mechanism is a local, static, global-to-private locator mapping relationship in a middlebox sitting on the border between a private network and the Internet. The mapping relationship can be published to the general public via DNS or only published discreetly to partners, e.g. for business-to-business sessions. When DNS is used to publish the PLR, a new type of DNS record is required.
When an endpoint receives the value of the new DNS record, it shall copy the value into the PLR field of the locator header for the appropriate session - the A-record will contain the public locator of the middlebox. The middlebox, which sits in front of the remote endpoint, must have a mapping scheme, i.e. a table of private locator referral values that are associated with the appropriate private locators of the endpoints inside the private network. Alternatively, since the PLR field is 32 bits wide, the private locator itself can be published as such; in that case no local mapping scheme is required on the middlebox, and the private locator is carried within the PLR field during the session.

The middlebox must also be multipath capable, i.e. use a multipath transport protocol to carry out the transition of the session from one ALOC realm to another ALOC realm. The responder onboard the mobile site does not necessarily need to make use of a multipath-enabled transport protocol; the middlebox will act as a multipath proxy in front of the responder. Also the initiator does not need to make use of a multipath-enabled transport protocol - if the DNS node is not on the mobile site and the middlebox can cache DNS messages on behalf of the initiator. This might become complicated, thus it is recommended that the initiator make use of multipath-enabled transport protocols.

During the transition the ELOC values for a session will not change; as discussed in Appendix B, only the ALOC value changes. Neither the initiator nor the responder at the mobile site needs to set up new subflows during the transition phase; the middlebox needs to set up the subflows since it will discover when a new attachment point to the Internet is available - unless the middlebox informs the initiator and responder of the new attachment point, for which a new protocol or an extension to ICMP is needed.
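The two ways a middlebox could resolve the PLR field, as described in this appendix, can be sketched as follows. This is only an illustration under assumptions: the function names and the example mapping are hypothetical, and the check against the [RFC1918] blocks is an added safeguard that the text does not mandate. With a mapping table the PLR value is a lookup key; without one the 32-bit PLR value is read directly as the private locator.

```python
import ipaddress
from typing import Dict, Optional

# The three private address blocks defined in RFC 1918.
RFC1918_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(addr: ipaddress.IPv4Address) -> bool:
    return any(addr in net for net in RFC1918_NETWORKS)


def resolve_plr(plr_field: int,
                mapping: Optional[Dict[int, ipaddress.IPv4Address]] = None
                ) -> ipaddress.IPv4Address:
    """Return the private locator for a 32-bit PLR value (hypothetical helper).

    With a mapping table, the PLR value is a referral key.
    Without one, the PLR value is interpreted as the private locator itself.
    """
    if mapping is not None:
        if plr_field not in mapping:
            raise KeyError(f"no private locator mapped to PLR {plr_field}")
        private = mapping[plr_field]
    else:
        private = ipaddress.IPv4Address(plr_field)
    if not is_rfc1918(private):
        # Assumed safeguard: refuse to forward to a non-private locator.
        raise ValueError(f"{private} is not an RFC 1918 locator")
    return private


# Mode 1: static referral table on the middlebox (values are made up).
table = {7: ipaddress.IPv4Address("10.0.0.5")}
print(resolve_plr(7, table))

# Mode 2: the private locator is carried directly in the PLR field.
plr = int(ipaddress.IPv4Address("192.168.1.10"))
print(resolve_plr(plr))
```

The direct mode avoids state on the middlebox at the cost of exposing the private locator in DNS; the table mode keeps the internal addressing hidden.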
Appendix D. Transition Arguments

The media has announced several times the meltdown of the Internet and the depletion of IPv4 addresses - but the potential chaos has been postponed several times and the general public has lost interest in these announcements. Perhaps other approaches are worth studying: instead, try to find other valuable arguments that the general public could be interested in, such as:

   o  Not all endpoints need to be upgraded, only those endpoints that are directly attached to the Internet. These kinds of endpoints are portable laptops, smart mobile phones, proxies, and DMZ/frontend endpoints. But the most critical endpoints, the backend endpoints where enterprises keep their most critical business applications, do not need to be upgraded; the backend endpoints should not be reachable at all from the Internet - only from the Intranet - and this functionality can be achieved with the hIPv4 framework, since it is backwards compatible with the current IPv4 stack.

   o  Mobility: it is estimated that the demand for applications that perform well over wireless access networks will increase. The introduction of MPTCP opens up the possibility to create new solutions and applications that are optimized for mobility. The hIPv4 framework requires an upgrade of the endpoints' stack; if possible the hIPv4 stack should also contain MPTCP features. Applications designed for mobility could bring competitive benefits for enterprises.

   o  The hardware of the intermediate routers in the network does not need to be upgraded; the current forwarding plane can still be used and the hIPv4 packet is capable of traversing most of the current NAT implementations. The benefit is that the current network equipment can be preserved at the service providers, enterprises and residences. That means that the carbon footprint is a lot lower compared to other solutions.
Many enterprises have green programs and many residential users are concerned about global warming.

   o  The migration from IPv4 to IPv6 (as currently defined) will increase the RIB and FIB throughout the DFZ; whether it will require a new upgrade of the forwarding plane, as discussed in the IAB report, is unclear. Most likely an upgrade is needed: the outcome of deploying IPv4 and IPv6 concurrently is that the routers need larger memories for the RIB and FIB, since every globally unique prefix is installed in the routers that participate in the DFZ. An enterprise that reserves one or several RIB/FIB entries on every router in the DFZ thereby increases the power consumption of the Internet and thus its carbon footprint. Many enterprises are committed to green programs - if hIPv4 is deployed, the power consumption of the Internet will not grow as much as in an IPv4 to IPv6 transition scenario.

   o  Another issue, if the migration from IPv4 to IPv6 (as currently defined) occurs, is that the routers in the DFZ most likely need to be upgraded to more expensive routers, as discussed in the IAB report. In the wealthy part of the world, where Internet penetration is already high, the service providers can more easily pass the costs of the upgrade along to their subscribers - with a "wealthy/high penetration" ratio the cost will not grow so much that subscribers would abandon the Internet. But in the less wealthy part of the world, where penetration is usually lower, the cost of the upgrade cannot be covered as easily - a "less wealthy/low penetration" ratio could dramatically increase the cost that needs to be passed along to the subscribers. Thus fewer subscribers could afford to get connected to the Internet.
For global enterprises and for enterprises in the less wealthy part of the world, this scenario could mean fewer potential customers, and there could be situations where the nomads of the enterprises cannot get connected to the Internet. This is also not fair; every human being should have a fair chance to enjoy the Internet experience - and the wealthy part of the world should take this right into consideration. Many enterprises are committed to Corporate Social Responsibility programs.

Not only technical and economic arguments can be found; there are also other arguments that the general public is interested in and concerned about, such as that the Internet becomes greener and more affordable for everyone compared to the current forecast of the evolution of the Internet. These non-technical values need to be communicated to the general public. You could ask them: do you care about the Planet, People and Internet? If you do, please upgrade the stack on your Internet-enabled device.

Appendix E. Integration with CES architectures

Because the hIPv4 framework requires changes to the endpoints' stack, it will take some time before the migration from the current IPv4 framework to an hIPv4-enabled routing architecture is fully completed. If an hIPv4 proxy solution could be used in front of legacy IPv4 endpoints, the threshold for early adopters to start migrating towards the hIPv4 framework would be lower and the migration phase would most likely also be much shorter. Thus it should be investigated whether the hIPv4 framework can be integrated with Core-Edge Separation [CES] architectures. In a CES architecture the endpoints do not need to be modified - the main design goal of a CES solution is to minimize the PI-address entries in the DFZ and to preserve the current stack at the endpoints.
But a CES solution requires a new mapping system and will also introduce a caching mechanism in the map-and-encapsulate network nodes. There has been much debate about the scalability of a mapping system and of the caching mechanism on the [RRG] list - today it is unclear how well both solutions will scale - and research on both topics is still in progress.

Since the CES architectures divide the address space into two new categories - one that is installed in the RIB of the DFZ and another that is installed in the local networks - there are to some degree similarities between CES architectures and the hIPv4 framework.

In order to describe how these two architectures might be integrated, some terminology definitions are needed:

CES-node: A network node installed in front of a local network that must have the following characteristics:

   o  map-and-encapsulate ingress functionality

   o  map-and-encapsulate egress functionality

   o  incorporate the hIPv4 stack

   o  routing functionality, [RFC1812]

   o  be able to apply policy-based routing on the ALOC field in the locator header

Note that the CES-node does not include the [MPTCP] extension; it would most likely put too much burden on the CES-node to signal and maintain MPTCP subflows for the cached hIPv4 entries.

Consumer site: A site that is not publishing any services towards the Internet, i.e. there are no entries in DNS for this site. This site is used by local endpoints to establish outbound connectivity, i.e. endpoints initiate sessions from the site towards content sites. Usually this kind of site is found at small enterprises and residences. PA-addresses are usually assigned to this type of site.

Content site: A site that is publishing services towards the Internet; usually these services have DNS entries. This site is used by local endpoints to establish both inbound and outbound connectivity.
The large enterprises are using PI-addresses and the midsize/small enterprises are using either PI- or PA-address space.

The CES architectures aim to reduce the PI-address entries in the DFZ; thus map-and-encapsulate egress functionality shall be installed in front of the content sites. (Map-and-encapsulate ingress functionality is required at the Internet Service Providers (ISPs), but for the hIPv4-CES integration study the ingress functionality at the ISPs is not of interest; LSR functionality and provider map-and-encapsulate ingress functionality might, however, reside in the same node.) It is likely that the node containing map-and-encapsulate egress functionality will also contain map-and-encapsulate ingress functionality; it is most likely a router, so to become a CES-node the node just needs to support the hIPv4 stack and be able to apply policy-based routing on the ALOC field of the locator header.

It is possible that the Large Content Providers (LCPs) are not willing to install map-and-encapsulate functionality in front of their sites - if the caching mechanism is not fully reliable or if the mapping lookup delay has an impact on their customers' end-user experience, then most likely the LCPs will not adopt the CES architecture.

In order to convince an LCP to adopt the CES architecture, it should provide a mechanism to mitigate the caching and mapping lookup delay risks. One method is to push the CES architecture to the edge - the closer to the edge new functionality is added, the better it will scale. That is, if the endpoint stack is upgraded, the caching mechanism is maintained by the endpoint itself. The mapping mechanism can be removed if the CES architecture's addressing scheme is
With this approach the LCP might install a CES-node in front of their sites; also some endpoints within the content site might be upgraded with the hIPv4 stack. If the LCP faces issues with the caching or mapping mechanisms, the provider can ask its customers to upgrade their endpoints' stack to ensure a proper service level. At the same time the LCP promotes the migration from the current routing architecture to a new routing architecture, not for the sake of the routing architecture but to ensure a proper service level - one can say that a business model will promote the migration to a new routing architecture.

The hIPv4 framework proposes that the IPv4 addresses (ELOCs) should no longer be globally unique; once the transition is completed a more regional allocation can be deployed. But this is only possible once all endpoints (that are establishing sessions to other ALOC realms) have migrated to support the hIPv4 framework. Here the CES architecture can speed up the re-usage of IPv4 addresses, i.e. once an IPv4 address block has become an ELOC block it can be re-used within the other RIR regions - without the requirement that all endpoints in the Internet must first be upgraded.

As said earlier, the CES architecture aims to remove PI-addresses from the DFZ; thus the content sites are more or less the primary target for the roll-out of a CES solution. At large content sites a CES-node will most likely be installed - upgrading all endpoints (that are providing services towards the Internet) at a large content site will take time, and it might be that the endpoints at the content site are upgraded only within their normal life-cycle process. But if the content site is small, the administrator either installs a CES-node or upgrades the endpoints' stack - a choice that will be influenced by availability, reliability and economic feasibility.
Once the content sites have been upgraded, the PI-address entries have been removed from the DFZ. Most likely some endpoints at the consumer sites have also been upgraded to support the hIPv4 stack - especially if there have been issues with the caches or mapping delays that have influenced the service levels at the LCPs.

Then, how to keep track of the upgrade of the content sites - have they been migrated or not? If the content sites or content endpoints have been migrated, the DNS records should have either a CES-node entry or an ALOC entry for each A-record. When the penetration of CES solutions at content sites (tracked through CES-node/ALOC records in DNS) is high enough, the ISP can start to promote the hIPv4 stack upgrade at the consumer sites. Once a PA-address block has been migrated, it can be released from global allocation to a regional allocation.

Why would an ISP then push its customers to migrate to hIPv4 stacks? It is due to the business model - it will be more expensive to stay in the current architecture. The depletion of IPv4 addresses will either cause more NAT in the service provider's network - operational expenditures will increase because the network will become more complex - or the ISP would have to force its customers to migrate to IPv6 - but the ISP could then lose customers to other ISPs that are offering IPv4 services. When PA-addresses have been migrated to the hIPv4 framework, the ISP will have a more independent routing domain (ALOC realm) with only ALOC prefixes from other ISPs and ELOC prefixes from directly attached customers - BGP churn from other ISPs is no longer received, the number of alternative paths is reduced, and the ISP can better control the growth of the RIB in its ALOC realm. The operational and capital expenditures should be lower than in the current routing architecture.
To summarize, the content providers might find the CES+hIPv4 solution attractive: it will remove the forthcoming IPv4 address depletion constraints without forcing the consumer to switch to IPv6, thus the content providers can continue to grow (reach more consumers). The ISPs might also find this solution attractive; it should reduce the capital and operational expenditures in the long term. Both the content providers and the ISPs provide the foundation of the Internet. If both adopt this architecture the consumers have to adopt it as well, and both kinds of providers might find business models to "guide" the consumers towards the new routing architecture.

Then how will this affect the consumer and content sites? The residential users will need to upgrade their endpoints, but it doesn't really matter what IP protocol version they use; it is the availability and affordability of the Internet that matters the most. Enterprises will be affected a little more. The edge devices at the enterprise's local network need to be upgraded: edge nodes such as AS border routers, forwarding proxies, security nodes, remote access service nodes, application delivery controllers, DNS, DHCP and public nodes. By installing a CES-node in front of them, however, the upgrade process is postponed; the legacy nodes can instead be upgraded during their normal life-cycle process. The internal infrastructure is preserved as such, internal applications can still use IPv4, and all investment in IPv4 skills is preserved.

Walkthrough of use cases:

1. A legacy endpoint at a content site establishes a session to a content site with a hIPv4 upgraded endpoint

When the legacy endpoint resolves the DNS entry for the remote endpoint (a hIPv4 upgraded endpoint) the legacy endpoint will receive an ALOC record in the DNS response.
The legacy endpoint will ignore the ALOC record; only the A-record is used to establish the session. Next, the legacy endpoint initializes the session and a packet is sent towards the map-and-encapsulate ingress node, which needs to do a lookup in the CES mapping system (the assumption here is that no cache entry exists for the remote endpoint). The mapping system returns either a CES-node prefix or an ALOC prefix for the lookup; since the requested remote endpoint has been upgraded, the mapping system returns an ALOC prefix. The CES-node will not use the CES encapsulation scheme for this session; instead the hIPv4 header scheme shall be used and a /32 entry will be created in the cache. A /32 entry must be created because it is possible that not all endpoints at the remote site are upgraded to support the hIPv4 framework. The /32 cache entry can be replaced with a shorter prefix in the cache if all endpoints are upgraded at the remote site. To indicate this situation a sub-field shall be added to the ALOC record in the mapping system.

The CES-node must execute the following steps for the egress packets:

   a. verify the IP and transport header checksums

   b. create the locator header and copy the destination address to the ELOC field of the locator header

   c. replace the destination address in the IP header with the ALOC value given in the cache

   d. insert the local CES-node value in the ALOC field of the locator header

   e. copy the transport protocol value of the IP header to the protocol field of the locator header and set the hIPv4 protocol value in the protocol field of the IP header

   f. set the desired parameters in the A-, P-, S-, VLB-, L-, and R-fields of the locator header

   g. decrease the TTL value by one

   h. calculate the IP, locator and transport header checksums; the transport header calculation does not include the locator header fields. When completed the packet is transmitted

   i. the insertion of the locator header might cause the packet to exceed the MTU; if the MTU is exceeded, the CES-node should inform the source endpoint of the situation with an ICMP message and should apply fragmentation to the hIPv4 packet

2. A hIPv4 upgraded endpoint at a consumer/content site establishes a session to a content site with a CES-node in front of a legacy endpoint

The hIPv4 upgraded endpoint receives in the DNS response either an ALOC record or a CES-node record for the resolved destination. From the requesting hIPv4 endpoint's point of view it doesn't matter whether the new record prefix is used to locate LSR-nodes or CES-nodes in the Internet; the CES-node will act as a hIPv4 proxy in front of the remote legacy endpoint. Thus the hIPv4 endpoint assembles a hIPv4 packet to initialize the session. When the packet arrives at the CES-node, the CES-node must execute the following:

   a. verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header

   b. verify the IP, locator and transport header checksums; the transport header verification does not include the locator header fields

   c. replace the protocol field value of the IP header with the protocol field value of the locator header

   d. replace the source address in the IP header with the ELOC value of the locator header

   e. remove the locator header

   f. create a cache entry (unless an entry already exists) for returning packets; a /32 entry is required. To optimize the usage of cache entries, the CES-node might ask the CES mapping node whether all endpoints at the remote site are upgraded; if they are, a shorter prefix can be used in the cache

   g. decrease the TTL value by one

   h. calculate the IP and transport header checksums

   i. forward the packet based on the destination address of the IP header

3. A hIPv4 enabled endpoint with a regionally unique ELOC at a consumer site establishes a session to a consumer site with a legacy endpoint

In this use case the sessions will fail unless some mechanism is invented and implemented at the ISPs' map-and-encapsulate nodes. The sessions will work inside an ALOC realm since the legacy IPv4 framework is still valid; sessions between ALOC realms will fail. Some applications establish sessions between consumer sites; the most common are gaming and peer-to-peer applications. These communities have historically been at the forefront of adopting new technologies. It is expected that the affected communities will either develop workarounds to solve this issue or simply ask their members to upgrade their stacks.

4. A legacy endpoint at a consumer/content site establishes a session to a content site with a CES-node in front of a legacy endpoint

Assumed to be described in CES architecture drafts

5. A hIPv4 enabled endpoint at a consumer/content site establishes a session to a content site with a hIPv4 enabled endpoint

See section 8

Author's Address

Patrick Frejborg
Email: pfrejborg@gmail.com
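The CES-node header handling listed in use cases 1 and 2 above can be sketched in Python as follows. This is a simplified model under stated assumptions, not an implementation of the framework: the class and function names are hypothetical, the hIPv4 protocol value is a placeholder (253, one of the numbers reserved for experimentation), addresses are plain strings, and checksums, the A-, P-, S-, VLB-, L- and R-fields, ICMP signalling and fragmentation are left out. The return-path cache entry of use case 2, step f is also omitted, because the sketch does not model which field would supply the ALOC for that entry.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Dict, Optional

HIPV4_PROTO = 253  # placeholder only; no hIPv4 protocol value is assumed assigned


@dataclass(frozen=True)
class LocatorHeader:
    aloc: str      # ALOC field
    eloc: str      # ELOC field
    protocol: int  # transport protocol carried over from the IP header


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: int
    ttl: int
    locator: Optional[LocatorHeader] = None


# Cache of remote prefixes (a /32 per endpoint, or shorter) to ALOC values.
Cache = Dict[ipaddress.IPv4Network, str]


def cache_lookup(cache: Cache, address: str) -> Optional[str]:
    """Longest-prefix match of an address against the cached prefixes."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in cache if ip in net]
    if not matches:
        return None
    return cache[max(matches, key=lambda net: net.prefixlen)]


def ces_egress(pkt: Packet, cache: Cache, local_ces: str) -> Packet:
    """Use case 1, steps b-e and g: legacy packet to hIPv4 packet."""
    remote_aloc = cache_lookup(cache, pkt.dst)
    if remote_aloc is None:
        raise LookupError("no cache entry; a mapping system lookup is needed")
    # b, d, e: the locator header takes the old destination as ELOC, the
    # local CES-node value as ALOC, and the transport protocol number.
    locator = LocatorHeader(aloc=local_ces, eloc=pkt.dst, protocol=pkt.protocol)
    # c, e, g: rewrite the destination and protocol, decrease the TTL by one.
    return replace(pkt, dst=remote_aloc, protocol=HIPV4_PROTO,
                   ttl=pkt.ttl - 1, locator=locator)


def ces_ingress(pkt: Packet) -> Packet:
    """Use case 2, steps a, c-e and g: hIPv4 packet to legacy packet."""
    if pkt.protocol != HIPV4_PROTO or pkt.locator is None:
        raise ValueError("not a hIPv4 packet")  # step a
    loc = pkt.locator
    # c, d, e, g: restore the protocol, set the source to the ELOC value,
    # drop the locator header, decrease the TTL by one.
    return replace(pkt, src=loc.eloc, protocol=loc.protocol,
                   ttl=pkt.ttl - 1, locator=None)


# Example with documentation addresses and a single /32 cache entry.
cache = {ipaddress.ip_network("192.0.2.10/32"): "198.51.100.1"}
legacy = Packet(src="10.1.1.1", dst="192.0.2.10", protocol=6, ttl=64)
sent = ces_egress(legacy, cache, local_ces="203.0.113.1")
```

The cache is keyed by prefix so that a /32 entry can later be replaced with a shorter prefix once all endpoints at the remote site are known to be upgraded; the longest-prefix lookup keeps working in either case.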