INTERNET-DRAFT                                             Matt Crawford
                                                                Fermilab
                                                          Allison Mankin
                                                                     ISI
                                                           Thomas Narten
                                                                     IBM
                                                    John W. Stewart, III
                                                                     ISI
                                                             Lixia Zhang
                                                                    UCLA
                                                           July 30, 1997

Separating Identifiers and Locators in Addresses: An Analysis of the GSE Proposal for IPv6

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Distribution of this memo is unlimited.

This Internet Draft expires January 30, 1998.

Abstract

On February 27-28, 1997, the IPng Working Group held an interim meeting in Palo Alto, California to consider adopting Mike O'Dell's 'GSE - An Alternate Addressing Architecture for IPv6' proposal [GSE]. In GSE, 16-byte IPv6 addresses are split into three portions: a globally unique End System Designator (ESD), a Site Topology
draft-ietf-ipngwg-esd-analysis-01.txt [Page 1]
INTERNET-DRAFT July 30, 1997
Partition (STP) and a Routing Goop (RG) portion. The STP corresponds (roughly) to a site's subnet portion of an IPv4 address, whereas the RG identifies the attachment point to the public Internet. Routers use the RG+STP portions of addresses (called 'Routing Stuff' in this document) to route packets to the link to which the destination is directly attached; the ESD is used to deliver the packet across the last hop link.
An important idea in GSE is that nodes within a site do not know the RG portion of their addresses. A border router at the site's Internet connect point would dynamically replace the RG part of source addresses of all outgoing IP datagrams and the RG part of destination addresses on incoming traffic.

This document provides a detailed analysis of the GSE plan. Much of the analysis presented here is an expansion of official meeting minutes, though it also includes issues uncovered by the authors in the process of fully fleshing out the analysis. In summary, the working group eventually decided that the full addresses of nodes within a site should not be hidden from those nodes, so as a result it is not necessary for routers to rewrite the Routing Goop portion of addresses. However, other parts of the GSE plan were adopted (e.g., having 64-bit interface identifiers with an option for specifying them as globally unique and easing the renumbering of the high-order portion of addresses within DNS).

In addition to analyzing the GSE proposal in particular, the document also studies the general issue of separating network layer addresses into two separate values satisfying location and identification purposes, respectively.

Contents

Status of this Memo
1. Introduction
2. Addressing and Routing in IPv4
   2.1. The Need for Aggregation
   2.2. The Pre-CIDR Internet
   2.3. CIDR and Provider-Based Addressing
   2.4. Multi-Homing and Aggregation
3. GSE Background
   3.1. Motivation For GSE
   3.2. GSE Address Format
   3.3. Routing Stuff (RG and STP)
   3.4. End-System Designator
   3.5. Address Rewriting by Border Routers
   3.6. Renumbering and Rehoming Mid-Level ISPs
   3.7. Support for Multi-Homed Sites
   3.8. Explicit Non-Goals for GSE
4. Analysis of GSE's Advantages and Disadvantages
   4.1. End System Designator
      4.1.1. Uniqueness Enforcement in the IPv4 Internet
      4.1.2. Overloading Addresses: Network Layer Issues
      4.1.3. Overloading Addresses: Transport Layer Issues
      4.1.4. Potential Benefits of Globally Unique ESDs
      4.1.5. ESD: Network Layer Issues
      4.1.6. ESD: Transport Layer Issues
      4.1.7. On The Uniqueness Of ESDs
      4.1.8. DNS PTR Queries
      4.1.9. Reverse Mapping of ESDs
      4.1.10. Reverse Mapping of Complete GSE Addresses
      4.1.11. The ICMP "Who Are You" Message
   4.2. Renumbering and Domain Name System (DNS) Issues
      4.2.1. How Frequently Can We Renumber?
      4.2.2. Efficient DNS support for Site Renumbering
      4.2.3. Two-Faced DNS
      4.2.4. Bootstrapping Issues
      4.2.5. Renumbering and Reverse DNS Lookups
   4.3. Address Rewriting Routers
      4.3.1. Load Balancing
      4.3.2. End-To-End Argument: Don't Hide RG from Hosts
   4.4. Multi-Homing
5. Results
6. Security Considerations
7. Acknowledgments
8. References
9. Authors' Addresses

1. Introduction

In October of 1996, Mike O'Dell published an Internet-Draft (dubbed "8+8") that proposed significant changes to the IPv6 addressing architecture. The 8+8 proposal was the topic of considerable discussion at the December 1996 IETF meeting in San Jose. Because the proposal offered both potential benefits (e.g., enhanced routing scalability) and risks (e.g., changes to the basic IPv6 architecture), the IPng Working Group held an interim meeting on February 27-28, 1997 to consider adopting the 8+8 proposal. The meeting, which over 45 people attended, was held at Sun Microsystems' PAL1 facility in Palo Alto, CA.

Shortly before the interim meeting, an updated version of the Internet-Draft was produced, in which the name of the proposal was changed from "8+8" to "GSE," to identify the three separate components of the address: Global, Site and End-System Designator. This last version of the GSE proposal was published as an Informational RFC [GSE] for historical purposes.

The purpose of the meeting was to evaluate the GSE proposal and decide whether to adopt it in whole or in part or to reject it. The well-attended meeting generated high caliber, focused technical discussions on the issues involved, with participation by almost all of the attendees.

By the middle of the second day there was unanimous agreement by the attendees that the GSE proposal as written presented too many risks and should not be adopted as the basis for IPv6. However, the attendees also concluded that some of the issues discussed in the GSE proposal were equally applicable to the current IPv6 provider-based addressing plan and had enough benefit to warrant further consideration apart from the GSE address format. These changes include:

1) Making changes to the IPv6 provider-based addressing document to facilitate increased aggregation.
2) Creating hard boundaries in IPv6 addresses to clearly distinguish between the portions used for identifying hosts and for routing.

3) Having an option to indicate that the low-order 8 bytes of an IPv6 address are a globally unique End System Designator (ESD). This change has potential benefits to future transport protocols (e.g., TCPng).

4) Making a clear distinction between the "locator" part of an address and the "identifier" part of the address. The former is used to route a packet to its end-point, the latter is used to identify an end-point, independent of the path used to deliver the packet.

5) Making changes to the way AAAA records are stored within the DNS, so that renumbering a site (e.g., when a site changes ISPs) requires few changes to the DNS database in order to effectively change all of a site's address AAAA RRs.

While this document does contain an analysis of the specific mechanisms of the GSE proposal, much of the document's analysis applies to any proposal in which the identifying and locating properties of an address (which are combined in IPv4) are split apart into separable pieces.

2. Addressing and Routing in IPv4

Before dealing with details of GSE, we present some background about how routing and addressing work in "classical IP" (i.e., IPv4). We present this background because the GSE proposal proposes a fairly major change to the base model. In order to properly evaluate the benefits of GSE, one must understand what problems in IPv4 it claims to improve or fix.

The structure and semantics of a network layer protocol's addresses are absolutely core to that protocol. Addressing substantially impacts the way packets are routed, the ability of a protocol to scale and the kinds of functionality higher layer protocols can provide. Indeed, addressing is intertwined with both routing and transport layer issues; a change in any one of these can impact another.
Issues of administration and operation (e.g., address allocation and required renumbering), while not part of the pure exercise of engineering a network layer protocol, turn out to be critical to the scalability of that protocol in a global and commercial network.

The interaction between addressing, routing and especially aggregation is particularly relevant to this document, so some time will be spent describing it.

Addresses in IPv4 serve two purposes:

1) Unique identification of an interface. An IP address by itself identifies which interface a packet should be delivered to.

2) Location information of that interface. Routers extract location information from a packet's destination address in order to route it towards its ultimate destination. That is, addresses identify "where" the intended recipient is located within the Internet topology.

For scalability, the location information contained in addresses must be aggregatable. In practice, this means nodes topologically close to each other (e.g., connected to the same link, residing at the same site, or customers of the same ISP) must use addresses that share a common prefix.

What is important to note is that these identification and location requirements have been met through the use of the same value, namely the IP address. As will be noted repeatedly in this document, the "over-loading" of IPv4 addresses with multiple semantics has some undesirable implications. For example, the embedding of IPv4 addresses within transport protocol addresses that identify the end-point of a connection couples those transport protocols with routing. This entanglement is inconsistent with a strictly layered model in which routing would be a completely independent function of the network layer and not directly impact the transport layer.

Combining locator and identifier functions also has the practical impact of complicating the support for mobility.
In a mobile environment, the location of an end-station may change even though its identity stays the same; ideally, transport connections should be able to survive such changes. In IPv4, however, one cannot change the locator without also changing the identifier. Consequently, conventional wisdom for some time has been that having separate values for location and identification could be of significant benefit. The GSE proposal attempts to make such a separation.

This document frequently uses mobility as an example to demonstrate the pros and cons of separating the identifier from the locator. However, the reader should note the fundamental equivalence between the problems faced by mobile hosts and the problem faced by sites that change providers yet don't want to be required to renumber their network. When a site changes providers, it moves (topologically) in much the same way a mobile node does when it moves from one place to another. Consequently, techniques that help (or hinder) mobility are often relevant to the issue of site renumbering.

2.1. The Need for Aggregation

IPv4 has seen a number of different addressing schemes. Since the original specification, the two major additions have been subnetting and classless routing. The motivation for adding subnetting was to allow a collection of networks located at one site to be viewed from afar as being just one IP network (i.e., to aggregate all of the individual networks into one bigger network). The practical benefit of subnetting was that all of a site's hosts, even if scattered among tens or hundreds of LANs, could be represented via a single routing table entry in routers located far from the site. In contrast, prior to subnetting, a site with ten LANs would advertise ten separate network entries, and all routers would have to maintain ten separate entries, even though they contained redundant information.
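As a rough illustration of the subnetting argument above, the following Python sketch uses the standard ipaddress module to check that ten LAN subnets fall under one site-wide network, so that a distant router needs only one entry for the site. The network 128.128.0.0/16 and the ten /24 subnets are assumptions chosen for illustration only, and the variable names are hypothetical.

```python
import ipaddress

# Hypothetical site: one network number, divided internally into ten LANs.
site = ipaddress.ip_network("128.128.0.0/16")
lans = [ipaddress.ip_network(f"128.128.{i}.0/24") for i in range(10)]

# If every LAN were advertised separately, distant routers would need
# one entry per LAN.
entries_if_advertised_separately = len(lans)

# With subnetting, all LANs are covered by the single site network, so
# distant routers need only that one entry.
all_covered = all(lan.subnet_of(site) for lan in lans)
entries_with_subnetting = 1

print(entries_if_advertised_separately, entries_with_subnetting, all_covered)
```

The subnet structure remains visible only to routers inside the site; routers far away see just the covering network.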
The benefits of aggregation should be clear. The amount of work involved in computing forwarding tables from routing tables is dependent in part on the number of network routes (i.e., destinations) to which best paths are computed. If each site has 10 internal networks, and each of those networks is individually advertised to the global routing subsystem, the complexity of computing forwarding tables can easily be an order of magnitude greater than if each site advertised just a single entry that covered all of the addresses used within the site.

2.2. The Pre-CIDR Internet

In the early days of the Internet, the Internet's topology and its addressing were treated as orthogonal. Specifically, when a site wanted to connect to the Internet, it approached a centralized address allocation authority to obtain an address and then approached a provider about procuring connectivity. This procedure for address allocation resulted in a system where the addresses used by customers of the same provider bore little relation to the addresses used by other customers of that provider. In other words, though the topology of the Internet was mostly hierarchical (i.e., customers connected to only one provider and the same path was used to reach all customers of the same provider), the addressing was not, and little aggregation of routes took place. An example of such a topology and addressing scheme is shown in Figure 1.

          +----------------+
          |                |------- Customer1 (192.2.2.0)
          |                |------- Customer2 (128.128.0.0)
          |   Provider A   |------- Customer3 (18.0.0.0)
          |                |------- Customer4 (193.3.3.0)
          |                |------- Customer5 (194.4.4.0)
          +----------------+
                  |
                  |
                  |
                  |
          +----------------+
          |   Provider B   |
          +----------------+

                      Figure 1

Figure 1 shows Provider A having 5 customers, each with their own independently obtained network addresses. Providers A and B connect to each other.
In order for Provider B to be able to send traffic to Customers1-5, Provider A must announce each of the 5 networks to Provider B. That is, the routers within Provider B must have explicit routing entries for each of Provider A's customers, 5 separate routes in Figure 1.

Experience has shown that this approach scales very poorly. In the Default-Free Zone (DFZ) of the Public Internet, where routers must maintain routing entries for all reachable destinations, the cost of computing forwarding tables quickly becomes unacceptably large. A large part of the cost is related to the seemingly redundant computations that must be made for each individual network, even though the reality is that many reside in the same topological location (e.g., the same provider). Looking at Figure 1, the problem is that Provider B performs 5 separate calculations to construct the routing tables needed to reach each of A's customers.

2.3. CIDR and Provider-Based Addressing

One of the reasons Classless Inter-Domain Routing (CIDR) and its associated provider-assigned address allocation policy were introduced was to help reduce the size of and cost of computing forwarding tables. CIDR reduces the cost of computing forwarding tables by aggressively aggregating addresses. Aggregating addresses means structuring them in such a way that the location of the nodes having those addresses can be represented by a single routing entry. In CIDR, this means that addresses share a common prefix. The common prefix provides location information for all addresses sharing that same prefix.

In CIDR, sites that want to connect to the Internet approach a provider to procure both connectivity and a network address; individual providers have a large block of address space covered by one prefix and assign pieces of their space to customers. Consequently, customers of the same provider have addresses that share the same prefix.
Note that CIDR started the use of the term "prefix" to refer to a Classless network.

The combination of CIDR and provider-based addressing results in the ability for a provider to address many hundreds of sites while introducing just *one* network address into the global routing system, i.e., aggregating all of its customers' addresses under one prefix. An example of such a topology and addressing scheme is shown in Figure 2.

          +----------------+
          |                |------- Customer1 (204.1.0.0/19)
          |                |------- Customer2 (204.1.32.0/23)
          |   Provider A   |------- Customer3 (204.1.34.0/24)
          |                |------- Customer4 (204.1.35.0/24)
          |                |------- Customer5 (204.1.36.0/23)
          +----------------+
                  |
                  |  A announces
                  |  204.1/16 to B
                  |
          +----------------+
          |   Provider B   |
          +----------------+

                      Figure 2

In Figure 2, Provider A has been assigned the classless block, or "aggregate," 204.1.0.0/16 (i.e., a network prefix with 16 bits for the network part and 16 bits for local use). Provider A has 5 customers, each of which has been assigned a prefix subordinate to the aggregate. In order for Provider B to be able to reach Customers1-5, Provider A need only announce a single prefix, 204.1.0.0/16, because that prefix covers all of its customers. The benefit for Provider B is that its routers need only a single routing table entry to reach all of Provider A's customers.

Note the difference between the cases described in Figures 1 and 2. The important difference in the two Figures is that the latter example uses fewer slots in the routing table to reach the same number of destinations.

CIDR was a critical step for the Internet: in the early 1990s the size of default-free routing tables required to support the Classful Internet was almost more than the commercially-available hardware and software of the day could handle. The introduction of BGP4's classless routing and provider-based address allocation policies brought immediate relief.
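The aggregation shown in Figure 2 can be checked mechanically. The sketch below is only an illustration built on Python's standard ipaddress module: the prefixes come from Figure 2, while the helper name reachable, the table variable and the sample host addresses are assumptions introduced for the example.

```python
import ipaddress

# Prefixes taken from Figure 2.
aggregate = ipaddress.ip_network("204.1.0.0/16")
customers = {
    "Customer1": ipaddress.ip_network("204.1.0.0/19"),
    "Customer2": ipaddress.ip_network("204.1.32.0/23"),
    "Customer3": ipaddress.ip_network("204.1.34.0/24"),
    "Customer4": ipaddress.ip_network("204.1.35.0/24"),
    "Customer5": ipaddress.ip_network("204.1.36.0/23"),
}

def reachable(table, address):
    """Return True if some prefix in the table covers the address."""
    addr = ipaddress.ip_address(address)
    return any(addr in prefix for prefix in table)

# Provider B holds only the single aggregate announced by Provider A.
provider_b_table = [aggregate]

# Every customer prefix is subordinate to the aggregate.
covered = all(net.subnet_of(aggregate) for net in customers.values())

# One sample host per customer (first address after the network address,
# picked arbitrarily) is reachable through that single entry.
samples = {name: str(net.network_address + 1) for name, net in customers.items()}
all_reachable = all(reachable(provider_b_table, a) for a in samples.values())

print(len(provider_b_table), covered, all_reachable)
```

In the Figure 1 arrangement, the same five customers would each need their own entry in Provider B's table.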
Having said that, however, there are some weaknesses of the system. First, the Internet addressing model shifted from one of "address owning" to "address lending." In pre-CIDR days sites acquired addresses from a central authority independent of who their network provider was, and a site could assume it "owned" the address it was given. Owning addresses meant that once one had been given a set of network addresses, one could always use them and assume that no matter where a site connected to the Internet, the prefix for that network could be injected into the public routing system. Today, however, it is simply no longer possible for each individual site to have its own private prefix injected into the DFZ; there would simply be too many of them. Consequently, if a site decides to change providers, then it needs to number itself out of space given to it by the new provider and give its old address back to the old provider. To understand this, consider if, from Figure 2, Customer3 changes its provider from Provider A to Provider C, but does not renumber. The picture would be as follows:

                   +----------------+
                   |                |---- Customer1 (204.1.0.0/19)
                   |                |---- Customer2 (204.1.32.0/23)
                   |   Provider A   |
   +---------------|                |---- Customer4 (204.1.35.0/24)
   |  A announces  |                |---- Customer5 (204.1.36.0/23)
   |  204.1/16 to B+----------------+
   |                       |
   |               +----------------+
   |               |   Provider B   |
   |               +----------------+
   |                       |
   |                       |  C announces
   |                       |  204.1.34/24
   |                       |  to B
   |               +----------------+
   +---------------|   Provider C   |---- Customer3 (204.1.34.0/24)
                   +----------------+

                      Figure 3

In Figure 3, Providers A, B and C are each directly connected to the other two. In order for Provider B to reach Customers 1, 2, 4 and 5, Provider A still only announces the 204.1.0.0/16 aggregate. However, in order for Provider B to reach Customer 3, Provider C must announce the prefix 204.1.34.0/24.
Prefix 204.1.34.0/24 is called a "more-specific" of 204.1.0.0/16; another term used is that Customer3 and Provider C have "punched a hole in" Provider A's block. The result of this is that from Provider B's view, the address space underneath 204.1.0.0/16 is no longer cleanly aggregated into a single prefix and instead the aggregation has been broken because the addressing is inconsistent with the topology; in order to maintain reachability to Customer3, Provider B must carry two prefixes where it used to have to carry only one.

The example in Figure 3 explains why sites must renumber if existing levels of aggregation are to be maintained. While it is certainly clear that one or two "exceptions" to the ideal case can be tolerated, the reality in today's Internet is that there are thousands of providers, many with thousands of individual customers. It is generally accepted that some renumbering of sites is essential for maintaining sufficient aggregation.

The empirical cost of renumbering a site in order to maintain aggregation has been the subject of much discussion. The practical reality, however, is that forcing all sites to renumber is difficult given the size and wealth of companies that now depend on the Internet for running their business. Thus, although the technical community came to consensus that address lending was necessary in order for the Internet to continue to operate and grow, the reality has been that some of CIDR's benefits have been lost because sites refuse to renumber.

One unfortunate characteristic of CIDR at an architectural level is that the pieces of the infrastructure which benefit from the aggregation (i.e., the providers whose major headache is managing routing table growth in the DFZ) are not the pieces that incur the cost (i.e., the end site).
The logical corollary of this statement is that the pieces of the infrastructure which do incur cost to achieve aggregation (e.g., sites which renumber when they change providers) don't directly see the benefit. (The word "directly" is used here because one could claim that the continued operation of the Internet is a benefit, though it is an indirect benefit and requires selflessness on the part of the site in order to recognize it.)

2.4. Multi-Homing and Aggregation

As sites become more dependent on the Internet, they have begun to install additional connections to the Internet to improve robustness and performance. Such sites are called "multi-homed." Unfortunately, when a site connects to the Internet at multiple places, the impact on routing can be much like a site that switches providers but refuses to renumber.

In the pre-CIDR days, multi-homed sites were typically known by only one network prefix. When that site's providers announced the site's network into the global routing system, a "shortest path" type of routing would occur so that pieces of the Internet closest to the first provider would use the first provider while other pieces of the Internet might use the second provider. This allowed sites to use the routing system itself to load balance traffic across their multiple connections. This type of multi-homing assumes that a site's prefix can be propagated throughout the DFZ, an assumption that is no longer universally true.

With CIDR, issues of addressing and aggregation complicate matters significantly. At the highest level, there are three possible ways to deal with multi-homed sites. The first approach is for multi-homed sites to receive address space directly from a registry, independent of their providers.
The problem with this approach is that, because the address space is obtained independent of either provider, it is not aggregatable and therefore has a negative impact on the scaling of global routing.

The second approach is for a multi-homed site to receive an allocation from one of its providers and just use that single prefix. The site would advertise its prefix to all of the providers to which it connects. There are two problems with this approach. First, although the prefix is aggregatable by the provider which made the allocation, it is not aggregatable by the other providers. To the other providers, the site's prefix poses the same problem as a provider-independent address would. This has a negative impact on the scaling of global routing.

Second, due to CIDR's longest-match routing rules, it turns out that the site's prefix is not always aggregatable in practice by the provider that made the allocation. Consider Figure 4. Provider C has two paths for reaching Customer 1. Provider A advertises 204.1/16, which includes Customer 1. But Provider C will also receive an advertisement for prefix 204.1.0/19 from Provider B, and because the prefix match through B is longer, C will choose that path. In order for Provider C to be able to choose between the two paths, Provider A would also have to advertise the longer prefix for 204.1.0/19 in addition to the shorter 204.1/16. At this point, from the routing perspective, the situation is very similar to the general problem posed by the use of provider-independent addresses.

It should be noted that the above example simplifies a very complex issue. For example, consider the example in Figure 4 again. Provider A could choose *not* to propagate a route entry for the longer 204.1.0/19 prefix, advertising only the shorter 204.1/16. In such cases, Provider C would always select Provider B.
Internally, Provider A would continue to route traffic from its other customers to Customer 1 directly. If Provider A had a large enough customer base, effective load sharing would be achieved.

                       +------------+   +------------+
                  _____| Provider A |---| Provider C |
                 /     +------------+   +------------+
                /         204.1/16             /
               /                              /
Customer 1 ---                               /  B advertises 204.1.0/19 to C
204.1.0.0/19 |                              /
             |                +------------+
             -----------------| Provider B |
                              +------------+

                      Figure 4

The third approach is for a multi-homed site to receive an allocation from each of its providers. This approach has advantages from the perspective of route scaling because both allocations are aggregatable. Unfortunately, the approach doesn't necessarily meet the demands of the multi-homed site. A site that has a prefix from each of its providers has a number of choices about how to use that address space. Possibilities include:

1) The site can number a distinct set of hosts out of each of the prefixes. Consider a configuration where a site is connected to ISP-A and ISP-B. If the link to ISP-A goes down, then unless the ISP-A prefix is announced to ISP-B (which breaks aggregation), the hosts numbered out of the ISP-A prefix would be unreachable.

2) The site could assign each host multiple addresses (i.e., one address for each ISP connection). There are two problems with this. First, it accelerates the consumption of the address space. Second, when the connection to ISP-A goes down, addresses numbered out of ISP-A's space become unreachable. Remote peers would have to have sufficient intelligence to use the second address. For example, when initiating a connection to a host, the DNS would return multiple candidate addresses. Clients would need to try them all before concluding that a destination is unreachable (something not all hosts currently do).
In addition, a site's hosts would need a significant amount of intelligence for choosing the source addresses they use. A host shouldn't choose a source address corresponding to addresses that are not reachable from the Public Internet. At present, hosts do not have such sophistication.

In summary, how best to achieve multi-homing with IPv4 in the face of CIDR is an unsolved problem. There is a delicate balance between the scalability of routing versus the site's requirements of robustness and load-sharing. At this point in time, no solution has been discovered that satisfies the competing requirements of route scaling and robustness/performance. It is worth noting, however, that some people are beginning to study the issue more closely and propose novel ideas [BATES].

3. GSE Background

This section provides background information about GSE with the intent of making this document stand-alone with respect to the GSE "specification." Additional details on GSE can be found in [GSE]. We begin by reviewing the motivation for GSE. Next we review the salient technical details, and we conclude by listing the explicit non-goals of the GSE proposal.

3.1. Motivation For GSE

The primary motivation for GSE is the fact that the chief IPv6 global unicast address structure, provider-based [RFC 2073], is fundamentally the same as IPv4 with CIDR and provider-based aggregation. Provider-based addressing requires that sites renumber when they switch providers, so that sites are always aggregated within their provider's prefix. In practice, the cost of renumbering (which can only grow as a site grows in size and becomes more dependent on the Internet for day-to-day business) is high enough that an increasing number of sites refuse to renumber.
This cost is particularly relevant in cases where end-users are asked to renumber because an upstream provider has changed its transit provider (i.e., the end site is asked to renumber for reasons outside of its control and for which it sees no direct benefit). Consequently, the GSE draft asserts that IPv4 with CIDR has not achieved the aggressive aggregation required for the route computation functions of the default-free zone of the Internet to scale for IPv4, and that the larger addresses of IPv6 simply exacerbate the problem.

The GSE proposal does not propose to eliminate the need for renumbering. Indeed, it asserts that end sites will have to be renumbered more frequently in order to continue scaling the Internet. However, GSE proposes to make the cost of such a renumbering so small that sites could be renumbered at essentially any time with only minor disruption to the site.

Finally, GSE deals significantly with sites that have multiple Internet connections. In some addressing schemes (e.g., CIDR), this "multi-homing" can create exceptions to the aggregation and result in poor scaling. That is, the public routing infrastructure needs to carry multiple distinct routes for the multi-homed site, one for each independent path. GSE recognizes the "special work done by the global Internet infrastructure on behalf of multi-homed sites," [GSE] and proposes a way for multi-homed sites to gain some benefit without impacting global scaling. This includes a specific mechanism that providers could use to support multi-homed sites, presumably at a cost that the Site would consider when deciding whether or not to become multi-homed.

3.2. GSE Address Format

The key departure of GSE from classical IP addressing (both v4 and v6) is that rather than over-loading addresses with both locator and identifier purposes, it splits the address into two elements: the high-order 8 bytes for routing (called "Routing Stuff" throughout the rest of this document) and the low-order 8 bytes for unique identification of an end-point. The structure of GSE addresses is:

 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  Routing Goop   | STP | End System Designator |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     6+ bytes     ~2 bytes       8 bytes

                      Figure 5

3.3. Routing Stuff (RG and STP)

The Routing Goop (RG) identifies the place in the Public Internet topology where a Site connects and is used to route datagrams to the Site. RG is structured as follows:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| xxx |     13 Bits of LSID     |     Upper 16 bits of Goop     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 3               4
 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Bottom 18 bits of Routing Goop   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 6

The RG describes the location of a Site's connection by identifying smaller and smaller regions of topology until finally it identifies a single link to which the site attaches.

Before interpreting the bits in the RG, it is important to understand that routing with GSE depends on decomposing the Internet's topology into a specific graph. At the highest level, the topology is broken into Large Structures (LSs). An LS is basically a region that can aggregate significant amounts of topology. Examples of potential LSs are large providers and exchange points.
Within an LS the topology is further divided into another graph of structures, with each LS dividing itself however it sees fit. This division of the topology into smaller and smaller structures can recurse for a number of levels, where the trade-off is "between the flat-routing complexity within a region and minimizing total depth of the substructure." [ESD]

Having described the decomposition process, we can now examine the bits in the RG. After the 3-bit prefix identifying the address as GSE, the next 13 bits identify the LS. By limiting the field to 13 bits, a ceiling is defined on the complexity of the top-most routing level. In the next 34 bits, a series of subordinate structures is identified until finally the leaf subordinate structure is identified, at which point the remaining bits identify the individual link within that leaf structure.

The remaining 14 bits of the Routing Stuff comprise the STP and are used for routing structure within a Site, similar to subnetting with IPv4, though these bits are *not* part of the Routing Goop. The distinction between Routing Stuff and Routing Goop is that RG controls routing in the Public Internet, while Routing Stuff includes the RG plus the Site Topology Partition (STP). The STP is used for routing structure within a Site.

The GSE proposal formalizes the ideas of sites and of public versus private topology. In the first case, a Site is a set of hosts, routers and media which have one or more connections to the Internet. A Site can have an arbitrarily complicated topology, but all of that complexity is hidden from everyone outside of the Site. A Site only carries packets which originated from, or are destined to, that Site; in other words, a Site cannot be a transit network. A Site is private topology, while the transit networks form the public topology.
A datagram is routed through public topology using just the RG, but within the destination Site routing is based on the Site Topology Partition (STP) field.

3.4. End-System Designator

The End-System Designator (ESD) is an unstructured 8-byte field that uniquely distinguishes an interface from all others. The most important feature of the ESD is that it alone identifies an end point; the Routing Stuff portion of an address, although used to help deliver a packet to its destination, is not used to actually identify an end point. End-points of communication care about the ESD; as examples, TCP peers could be identified by the source and destination ESDs alone (together with port numbers), checksums would exclude the RG (the sender doesn't know its RG, so can't include it in the checksum), and on receipt of a datagram only the ESD would be used in testing whether a packet is intended for local delivery.

The leading contender for the role of a 64-bit globally unique ESD is the recently defined "EUI-64" identifier [EUI64]. These identifiers consist of a 24-bit "company_id" concatenated with a 40-bit "extension." (Company_id is just a new name for the Organizationally Unique Identifier (OUI) that forms the first half of an 802 MAC address.) Manufacturers are expected to assign locally unique values to the extension field, guaranteeing global uniqueness for the complete 64-bit identifier.

A range of the EUI-64 space is reserved to cover pre-existing 48-bit MAC addresses, and a defined mapping ensures that an ESD derived from a MAC address will not duplicate the ESD of a device that has a built-in EUI-64.

In some cases, interfaces may not have access to an appropriate MAC address or EUI-64 identifier. A globally unique ESD must then be obtained through some alternate mechanism. Several possible mechanisms can be imagined (e.g., the IANA could hand out identifiers from a company_id that it has been allocated), but we do not explore them in detail here.
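
The field layout described in Sections 3.2 through 3.4 can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the field widths (3 + 13 + 34 bits of RG, 14 bits of STP, 64 bits of ESD) come from the text above, while the names GseAddress, split_gse, join_gse and eui64 are hypothetical and do not appear in the GSE draft. The eui64 helper shows only the concatenation of company_id and extension; it does not attempt the mapping from 48-bit MAC addresses.

```python
import ipaddress
from typing import NamedTuple

# Field widths as described in the text; all names are illustrative.
LS_BITS, GOOP_BITS, STP_BITS, ESD_BITS = 13, 34, 14, 64


class GseAddress(NamedTuple):
    prefix: int  # 3-bit format prefix
    ls_id: int   # 13-bit Large Structure identifier
    goop: int    # remaining 34 bits of Routing Goop
    stp: int     # 14-bit Site Topology Partition
    esd: int     # 64-bit End System Designator


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def split_gse(addr: str) -> GseAddress:
    """Split a 16-byte address into its GSE fields, low-order bits first."""
    value = int(ipaddress.IPv6Address(addr))
    esd = value & _mask(ESD_BITS)
    value >>= ESD_BITS
    stp = value & _mask(STP_BITS)
    value >>= STP_BITS
    goop = value & _mask(GOOP_BITS)
    value >>= GOOP_BITS
    ls_id = value & _mask(LS_BITS)
    prefix = value >> LS_BITS
    return GseAddress(prefix, ls_id, goop, stp, esd)


def join_gse(a: GseAddress) -> str:
    """Inverse of split_gse: pack the fields back into one address."""
    value = a.prefix
    for field, bits in ((a.ls_id, LS_BITS), (a.goop, GOOP_BITS),
                        (a.stp, STP_BITS), (a.esd, ESD_BITS)):
        value = (value << bits) | field
    return str(ipaddress.IPv6Address(value))


def eui64(company_id: int, extension: int) -> int:
    """Concatenate a 24-bit company_id with a 40-bit extension."""
    if not 0 <= company_id < (1 << 24):
        raise ValueError("company_id must fit in 24 bits")
    if not 0 <= extension < (1 << 40):
        raise ValueError("extension must fit in 40 bits")
    return (company_id << 40) | extension
```

The Routing Stuff is the upper 64 bits (prefix, ls_id, goop and stp together); the Routing Goop is the same span without the 14 STP bits.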
3.5. Address Rewriting by Border Routers

GSE Site border routers rewrite addresses of the packets they forward across the Site/Public Topology boundary. Within a Site, nodes need not know the RG associated with their addresses. They simply use a designated "Site-Local RG" value for internal addresses. When a packet is forwarded to the Public Topology, the border router replaces the Site-Local RG portion of the packet's source address with an appropriate value. Likewise, when a packet from the Public Topology is forwarded into a Site, the border router replaces the RG part of the destination address with the designated Site-Local RG.

For simplicity, the following discussion uses the singular term RG as if a site could have only one RG value (i.e., one connection to the Public Internet). Of course, a site could have multiple Internet connections and consequently multiple RGs.

Having border routers rewrite addresses obviates the need to renumber devices within sites because of changing providers --- GSE's approach isn't so much to ease renumbering as to make it transparent to end sites. To achieve transparency, the RG by which a Site is known is hidden (i.e., kept secret) from hosts or routers within that Site. Instead, the RG for the Site would be known only by the exit router, either through static configuration or through a dynamic protocol with an upstream provider.

Because end-hosts don't know their RG, they don't know their entire 16-byte public address, so they can't specify the full address in the source fields of packets they originate. Consequently, when a datagram leaves a Site, the egress border router fills in the high-order portion of the source address with the appropriate RG.

The point of keeping the RG hidden from nodes within the core of a Site is to ensure the changeability of this value without impacting the Site itself.
It is expected that the RG will need to change relatively frequently (e.g., several times a year) in order to support scalable aggregation as the topology of the Public Internet changes. A change to a Site's RG would only require a change at the Site's egress point (or points, in the case of a multi-homed Site); and it is quite possible that this change would be accomplished through a dynamic protocol with the upstream provider.

Hiding a Site's RG from its internal nodes does not, however, mean that changes to RG have no impact on end sites. Since the full 16-byte address of a node isn't a stable value (the RG portion can change), a stored address may contain an invalid RG and be unusable if it isn't "refreshed" through some other means. For example, opening a TCP connection, writing the address of the peer to a file and then later trying to reestablish a connection to that peer is likely to fail. For intra-Site communication, however, it is expected that only the Site-Local RG would be used (and stored), which would continue to work for intra-Site communication regardless of changes to the Site's external RG. This has the benefit of shielding a site's internal traffic from the effects of renumbering changes outside of the site.

In addition to rewriting source addresses upon leaving a Site, destination addresses are rewritten upon entering a Site. To understand the motivation behind this, consider a Site with connections to three Internet providers. Because each of those connections has its own RG, each destination within the Site would be known by three different 16-byte addresses. As a result, intra-Site routers would have to carry a routing table three times larger than expected. Instead, GSE proposes replacing the RG in inbound packets with the special "Site-Local RG" value to reduce intra-Site routing tables to the minimum necessary.
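
The rewriting performed at the Site boundary can be sketched as follows. This is a minimal illustration, not a router implementation: addresses are treated as 128-bit integers, the RG is assumed to occupy the top 50 bits (the 3 + 13 + 34 bit layout of Section 3.3), and the SITE_LOCAL_RG constant is a made-up placeholder, since no actual value for the Site-Local RG is given here. The function names are likewise hypothetical.

```python
# RG occupies the top 50 bits of the 128-bit address; the low 78 bits
# (14-bit STP plus 64-bit ESD) are never touched by the border router.
RG_BITS = 50
RG_SHIFT = 128 - RG_BITS
RG_MASK = ((1 << RG_BITS) - 1) << RG_SHIFT
LOW_MASK = (1 << RG_SHIFT) - 1

# Hypothetical placeholder; the real Site-Local RG value is not specified here.
SITE_LOCAL_RG = 0x3FFFFFFFFFFFF


def rg_of(addr: int) -> int:
    """Return the 50-bit Routing Goop of an address."""
    return (addr & RG_MASK) >> RG_SHIFT


def replace_rg(addr: int, rg: int) -> int:
    """Return addr with its RG replaced, leaving STP and ESD intact."""
    return (addr & LOW_MASK) | (rg << RG_SHIFT)


def egress(src: int, dst: int, public_rg: int) -> tuple:
    """Packet leaving the Site: fill in the RG of the source address only."""
    return replace_rg(src, public_rg), dst


def ingress(src: int, dst: int) -> tuple:
    """Packet entering the Site: hide the RG of the destination address."""
    return src, replace_rg(dst, SITE_LOCAL_RG)
```

In this sketch a renumbering event changes only the public_rg value handed to egress(); nothing stored by nodes inside the Site is affected.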
In summary, when a node initiates a flow to a node in another Site, the initiating node knows the full 16-byte address for the destination through some mechanism like a DNS query. The initiating node places the full 16-byte address in the destination address field of the datagram, and that field stays intact through the first Site and through all of the Public Topology. When the datagram reaches the exit border router, the router replaces the RG of the packet's source address. When the datagram arrives at the entry router at the destination Site, the router replaces the RG portion of the destination address with the distinguished "Site-Local RG" value. When the destination host needs to send return traffic, that host knows the full 16-byte address for the destination because it appeared in the source address field of the arriving packet.

3.6. Renumbering and Rehoming Mid-Level ISPs

One of the most difficult-to-solve components of the renumbering problem is that of renumbering mid-level service providers. Specifically, if SmallISP1 changes its transit provider from BigISP1 to BigISP2 (in the CIDR model), then all of SmallISP1's customers would have to renumber into address space covered by an aggregate of BigISP2 (if the overall size of routing tables is to stay the same).

GSE deals with this problem by handling the RG in DNS with indirection. Specifically, a Site's DNS server specifies the RG portion of its addresses by referencing the *name* of its immediate provider, which is a resolvable DNS name (this obviously implies a new Resource Record type). That provider may define some of the low-order bits of the RG and then reference its immediate provider. This chain of reference allows mid-level service providers to change transit providers, and the customers of that mid-level will simply "inherit" the change in RG.

3.7. Support for Multi-Homed Sites

GSE defines a specific mechanism for providers to use to support multi-homed customers that gives those customers more reliability than singly-homed sites, but without a negative impact on the scaling of global routing. This mechanism is not specific to GSE and could be applied to any multi-homing scenario where a site is known by multiple prefixes (including provider-based addressing). Assume the following topology:

   Provider1      Provider2
   +------+       +------+
   |      |       |      |
   | PBR1 |       | PBR2 |
   +----x-+       +-x----+
        |           |
    RG1 |           | RG2
        |           |
     +--x-----------x--+
     | SBR1       SBR2 |
     |                 |
     +-----------------+
            Site

          Figure 7

PBR1 is Provider1's border router while PBR2 is Provider2's border router. SBR1 is the Site's border router that connects to Provider1 while SBR2 is the Site's border router that connects to Provider2. Imagine, for example, that the line between Provider1 and the Site goes down. Any already existing flows that use a destination address including RG1 would stop working. In addition, any DNS queries that return addresses including RG1 would not yield viable addresses. If PBR1 and PBR2 knew about each other, however, then in this case PBR1 could tunnel packets destined for RG1-prefixed addresses to PBR2, thus keeping the communication working. (Note that true tunneling, i.e., re-encapsulation, is necessary since routers between PBR1 and PBR2 would forward RG1 addresses towards PBR1.)

3.8. Explicit Non-Goals for GSE

It is worth noting explicitly that GSE does not attempt to address the following issues:

1) Survival of TCP connections through renumbering events. If a Site is renumbered, TCP connections using a previous address will continue to work only as long as the previous address still works (i.e., while it is still "valid" using RFC 1971 terminology). No attempt is made to have existing connections switch to the new address.
2) It is not known how mobility can be made to work under GSE.

3) It is not known how multicast can be made to work under GSE.

4) The performance impact of having routers rewrite portions of the source and destination address in packet headers requires further study.

That GSE doesn't address the above does not mean they cannot be solved. Rather, the issues haven't been studied in sufficient depth.

4. Analysis of GSE's Advantages and Disadvantages

This section contains the bulk of the GSE analysis and the analysis of the general locator/identifier split.

4.1. End System Designator

4.1.1. Uniqueness Enforcement in the IPv4 Internet

As described earlier, in the IPv4 Public Internet, IP addresses contain two pieces of information: a unique identifier and a locator. Embedding location information within an address has the side-effect of helping ensure that all addresses are globally unique. If interfaces on two different nodes are assigned the same unicast address, the routing subsystem will (generally) deliver packets to only one of those nodes. The other node will quickly realize that something is wrong (since communication using the duplicate address fails) and take corrective action (e.g., obtain a proper address). This is important for two reasons. It helps detect misconfigurations (use of the wrong address prevents communication from taking place), and helps thwart intruders.

In IPv4, communication usually fails quickly when addresses are not unique. There are two cases to consider, depending on whether the two interfaces assigned duplicate addresses are attached to the same or to different links.

When two interfaces on the same link use the same address, a node (host or router) sending traffic to the duplicate address will in practice send all packets to one of the nodes.
On Ethernets, for example, the sender will use ARP (or Neighbor Discovery in IPv6) to determine the link layer address corresponding to the destination address. When multiple ARP replies for the target IP address are received, the most recently received response replaces whatever is already in the cache. Consequently, the destinations a node using a duplicate IP address can communicate with depend on what its neighboring nodes have in their ARP caches. In most cases, such communication failures become apparent relatively quickly, since it is unlikely that communication can proceed correctly on both nodes. It is also the case that a number of ARP implementations (e.g., BSD-derived implementations) log warning messages when an ARP request is received from a node using the same address as the machine receiving the ARP request.

When two interfaces on different links use the same address, the routing subsystem will generally deliver packets to only one of the nodes because only one of the links has the right "prefix" or "subnet part" corresponding to the IP address. Consequently, the node using the address on the "wrong" link will generally never receive any packets sent to it and will be unable to communicate with anyone. For obvious reasons, this condition is usually detected quickly.

An important observation is that, with classical IP, when different nodes mistakenly assign the same IP address to different interfaces, problems become apparent relatively quickly because communication with several (if not all) destinations fails. In contrast, failure scenarios differ when globally unique ESDs are assumed, but two nodes mistakenly select the same one.

Embedding location information within an address also provides some, though not much, protection from forged addresses.
Although it is trivial to forge a source address in today's Internet, the routing subsystem will in most cases forward any return traffic sent to that address to its proper destination --- not to an arbitrary node masquerading as someone else. To masquerade as someone else requires subverting the routing subsystem, placing the intruder somewhere on the normal routing path between the masqueraded host and its peer, etc.

4.1.2. Overloading Addresses: Network Layer Issues

At the network layer, a node compares the destination address of received packets against the addresses of its attached interfaces. Only if the addresses of received packets match are packets handed up to higher layer protocols. In IPv4, the entire address must match. Otherwise, the packet is assumed to be intended for some other node and forwarded on (if received by a router) or silently discarded (if received by a host). This has subtle but significant implications:

1) If a receiving host has multiple interfaces, it has multiple IP addresses. When a packet addressed to a multi-homed host is received on an interface other than the one to which the packet is addressed, the host may reject (i.e., silently discard) the packet, if it implements the "Strong ES Model" defined in [RFC1122].

2) In recent IPv4 stacks, an interface may have more than one unicast IP address assigned to it. Indeed, one way to renumber an end site is to phase out an address (i.e., "deprecate" it using RFC 1971 terminology) while simultaneously phasing in a new one. Once the deprecated address becomes invalid, packets sent to the invalid address will no longer be accepted by the node, even though the packet may have intuitively reached its intended recipient.
Thus, even if a packet sent to an invalid address is somehow delivered to the intended recipient (e.g., via tunneling), the receiver would reject the packet because the address it was sent to no longer belongs to any of the node's interfaces. Consequently, any communication using the invalid address will fail (e.g., new and existing TCP connections). Anyone wishing to communicate with the node must learn and switch to the new address.

3) Because an address also indicates "where" the destination resides within the Internet, a mobile node that moves from one part of the Internet to another must obtain a new address that reflects its new location. Moreover, the routing subsystem will continue to forward packets sent to the mobile node's previous address to the node's previous point of attachment where they are likely to be discarded. That is, even if a mobile node is willing to continue accepting packets addressed to one of its previous addresses, it is unlikely that they will be received (in the absence of something like Mobile IP [RFC2002]).

4) A multi-homed host has multiple interfaces, each with its own address(es). If one of its interfaces fails, packets could, in theory, be delivered to one of the host's other interfaces. In practice, however, the routing subsystem has no way of knowing that the interface to which a packet is addressed has failed and what alternate interface addresses the packet could be delivered to. Consequently, packets sent to a failed interface of a multi-homed host won't be delivered, even though the node is reachable through alternate interfaces.
Note that the above problems fall into two general categories:

1) Today's routing subsystem is unable to automatically deliver a packet to a host's "alternate" addresses (if the host is multi-homed) or a new address (if the host moves), should there be a problem delivering a packet to the destination address listed in the packet. It is possible to imagine, however, future routing advances addressing this problem (e.g., Mobile IP).

2) Even if a packet is delivered to its intended destination, the packet may still be rejected because the packet's destination address does not match any of the addresses assigned to the destination's interfaces. This problem does not appear to be insurmountable and could be rectified (for example) by having a host remember its previous addresses.

4.1.3. Overloading Addresses: Transport Layer Issues

The problems discussed previously create particular complications at the transport level. Transport protocols such as TCP and UDP use embedded IP addresses to identify the end-points of a transport connection. Specifically, the communicating end-points of a transport connection are uniquely identified by the sender's source IP address and source port number together with the recipient's destination IP address and port number. Once a connection has been established, the IP addresses cannot change. In particular, if a mobile host moves to a new location and obtains a new address, packets intended for a TCP connection created prior to the move cannot use the new address. TCP will treat any packets sent to the new address as belonging to a different TCP connection.

It is possible to imagine changes to TCP that might allow connections to change the addresses they are using mid-connection without breaking the connection. However, some subtle issues arise:

1) Packets intended for a pre-existing connection must be demultiplexed to that connection as part of any negotiation to change the addresses that identify that transport end-point.
However, because the demultiplexing operation uses the transport addresses of the pre-existing TCP connection (which is based on the previous address), TCP packets sent to a new address won't be delivered to the desired transport end-point (which still uses the previous address). Consequently, packets would need to be sent to the previous address. However, by the time a mobile node has moved and knows its new address, packets sent to the previous address may no longer be delivered (i.e., they may not be forwarded to the mobile host's new location).

2) When a mobile host moves, it could inform its TCP peers that it has a new address. However, such a message could not be delivered to the remote TCP connection if it was sent using its new address for its source address. Just as above, such packets would not be demultiplexed to the correct TCP connection. On the other hand, it is infeasible to send packets using its previous address from its new location. Because of the danger of spoofing attacks, routers are now encouraged to actively look for, and discard traffic from, a source address that does not match known addresses for that region of the Internet [CERT]. Consequently, such packets cannot be expected to be delivered.

Although the previous discussion used mobile nodes as an example, the same problem arises in other contexts. For example, if a site is being renumbered in IPv6, it may have two addresses, a previous (i.e., deprecated) one being phased out and a new (i.e., preferred) one being phased in. At the transport level, the problem of switching addresses is similar in many respects to the mobility problem.

4.1.4. Potential Benefits of Globally Unique ESDs

Having a clear separation between the Routing Stuff and the ESD portion of an address gives protocols some additional flexibility.
At the network layer, for example, recipients can examine just the ESD portion of the destination addresses when determining whether a packet is intended for them. This means that if a packet is delivered to the correct destination node, the node will accept the packet, regardless of how the packet got there, i.e., without regard to the Routing Stuff of the address, which interface it arrived on, etc. Such packets would then be delivered and accepted by the target host.

The idea of using addresses that cleanly separate the Routing Stuff from an ESD is not new [references XXX]. However, there are several different flavors. In its pure form, a sender would only need to know the ESD of an end-point in order to send packets to it. When presented with a datagram to send, network software would be responsible for finding the Routing Stuff associated with the ESD so that the packet can be delivered. A key question is who is responsible for finding the Routing Stuff associated with a given ESD. There are a number of possibilities:

1) The network layer could be responsible for doing the mapping. The advantage of such a system is that an ESD could be stored essentially forever (e.g., in configuration files), but whenever it is actually used, network layer software would automatically perform the mapping to determine the appropriate Routing Stuff for the destination. Likewise, should an existing mapping become invalid, network layer software could dynamically determine the updated quantity. Unfortunately, building such a mapping mechanism that is scalable is a hard problem.

2) The transport layer could be responsible for doing the mapping. It could perform the mapping when a connection is first opened, periodically refreshing the binding for long-running connections. Implementing such a scheme would change the existing transport layer protocols TCP and UDP significantly.
3) Higher-layer software (e.g., the application itself) could be responsible for performing the mapping. This potentially increases the burden on application programmers significantly, especially if long-running connections are required to survive renumbering and/or deal with mobile nodes.

It should be noted that the GSE proposal does not embrace the general model. Instead, it adopts the last of these options. The network layer (and indeed the transport layer) is always presented both the Routing Stuff (RG + STP) and the ESD together in one IPv6 address. It is not the network (or transport) layer's job to determine the Routing Stuff given only the ESD or to validate that the Routing Stuff is correct. When an application has data to send, it queries the DNS to obtain the IPv6 AAAA record for a destination. The returned AAAA record contains both the Routing Stuff and the ESD of the specified destination. While such an approach eliminates the need for the lower layers to be able to map ESDs into corresponding Routing Stuff, it also means that when presented with an address containing an incorrect (i.e., no longer valid) Routing Stuff, the network is unable to deliver the packet to its correct destination. It is up to applications themselves to deal with such failures.

Note that addresses containing invalid Routing Stuff will result any time cached addresses are used after the Routing Stuff of the address becomes invalid. This may happen if addresses are stored in configuration files, or with long-running communication.

4.1.5. ESD: Network Layer Issues

Along with the flexibility offered by separating the ESD from the Routing Stuff come additional issues that must be considered at the network layer:

1) Addresses must have a locator embedded within them. It is not feasible to route packets solely on an ESD; doing so would make it impossible to aggregate routing information in a scalable way.
The GSE proposal assumes that the locator part of an address is filled with an appropriate value by higher layers (i.e., the transport or application layer).

2) If a receiver observes that recent packets are arriving with a different Routing Stuff in the source address than before, it may want to send return traffic using the new Routing Stuff. However, such information should not be accepted without appropriate authentication of the new Routing Stuff, otherwise it would be trivial to hijack existing transport connections. Always using the most recently received Routing Stuff of an address to send return traffic without appropriate authentication leads to a vulnerability that is equivalent in potential danger to "reversing and using an unauthenticated received source route."

Note also that in the GSE proposal, since a sender does not know its own RG, it is not possible for the sender to compute an Authentication Header via IPSec that covers the RG portion of an address. Thus, a recipient of a new RG would need to authenticate the received information via some alternate (undefined) mechanism.

Finally, receipt of packets from different Routing Stuff than before does not necessarily indicate a permanent change. In the GSE proposal, for example, when a Site is multi-homed, some of its packets may exit via one egress router while other packets exit via a different egress router. Even packets originated from the same source may exit through multiple egress routers. Consequently, a node may receive traffic from the same sender in which the Routing Stuff part changes on every packet.

3) In general, whenever an address is embedded within a packet (including within data), one must consider whether all the bits in the address should be used in computations, or whether just the ESD portion should be used.
Examples where such decisions would need to be made include, but are not limited to, Neighbor Discovery packets containing Neighbor Solicitations and Responses [RFC 1970], IPSec packets being demultiplexed to their appropriate Security Association, IP deciding whether to accept an IP datagram (before reaching the transport level), the reassembly of fragments, transport layer demultiplexing of received packets to end-points, etc.

4.1.6. ESD: Transport Layer Issues

Previous sections have made clear that the embedding of full IPv6 addresses (i.e., Routing Stuff) within transport connection end-point identifiers poses problems for mobility and site renumbering. This section discusses an alternate approach, in which transport end-point identifiers use ESDs rather than full addresses (with embedded Routing Stuff).

In the following discussion, it should be kept in mind that the IPng Recommendation [RFC 1752] states that a transition to IPv6 cannot also require deployment of a "TCPng." In addition, although we focus on TCP, UDP-based protocols also depend on the Routing Stuff in similar ways, e.g., starting with the UDP checksum of the peers' addresses. Indeed, we believe that TCP is the "easy" case to deal with, for two reasons. First, TCP is a stateful protocol in which both ends of the connection can negotiate with each other. Some UDP-based protocols are stateless, and remember nothing from one packet to the next. Consequently, changing UDP-based protocols may require the introduction of "session" features, perhaps as part of a common "library", for use by applications whose transport protocol is relatively stateless. Second, changes to UDP-based protocols in practice mean changing individual applications themselves, raising deployability questions.

4.1.6.1. Demultiplexing Packets to Transport Endpoints

Connections in GSE are identified by the ESDs rather than full IPv6 addresses (with embedded Routing Stuff). That is:

   unique IPv4 TCP connection:  srcaddr dstaddr srcport dstport
   unique GSE TCP connection:   srcESD  dstESD  srcport dstport

Consequently, with GSE, when demultiplexing incoming packets, TCP would ignore the Routing Stuff portions of addresses when delivering packets to their proper end-point.

Although there are potential benefits to this approach (discussed below), demultiplexing on ESDs alone without the Routing Stuff is, in fact, required with GSE. If a site is multi-homed, the packets it sends may exit different egress border routers during the lifetime of a connection. Because each border router will place its own RG into the source addresses of outgoing packets, the receiving TCP must ignore (at least) the RG portion of addresses when demultiplexing received packets. The alternative would be to make TCP less robust with respect to changes in routing, i.e., if the path changed, packets delivered correctly would be discarded by the receiving TCP rather than processed.

4.1.6.2. Pseudo-Header Checksum Calculations

Having routers rewrite the RG portion of addresses means that TCP cannot include the RG in its checksum calculation; the sender does not know its own RG. Consequently, upon receipt of a TCP segment, the receiver has no way of determining whether the RG portion of an address has been corrupted (or modified) in transit (the implications of this are discussed below).

4.1.6.3. RG Selection When Sending Packets

When a host has a packet to send, there are three cases for deciding what RG to use in the destination. If the host is performing an "active open", it queries the DNS to obtain the destination address, which contains an appropriate RG.
If the host is responding to an active open from a remote peer, the source address of packets from that peer contains usable RG. Note that assuming that the RG on an incoming TCP connection is "correct" needs qualification. It is "correct" in the sense that it corresponds to the site originating the connection. Whether the ESD paired with the RG is actually located at that site cannot be assumed. The issue of spoofing is discussed in more detail later.

The last (and most interesting) case is when RG changes mid-connection. Although the GSE proposal calls for always using the first RG learned (and never switching), we explored the possibility of switching in order to better understand the issues.

4.1.6.4. Mid-Connection RG Changes

During a connection, the RG appearing on subsequent packets is susceptible to change through renumbering events, and indeed more frequently, to change through Site-internal routing changes that cause the egress point for off-Site traffic to change. It is even possible (in the worst case) that traffic-balancing schemes could result in the use of two egress routers, with roughly every other packet exiting through a different egress router. Consequently it may be desirable to switch to the just-received RG, as the old RG may no longer be valid (e.g., a border router has failed), but care must be taken not to thrash. Moreover, simply using the most-recently-received RG makes it trivial for an intruder to hijack connections.

Because TCP under GSE demultiplexes packets using only ESDs, packets will be delivered to the correct end-point regardless of what source RG is used. However, return traffic will continue to be sent via the "old" RG, even though it may have been deprecated or become less optimal because the peer's border router has changed. It would seem highly desirable for TCP connections to be able to survive such events.
However, the completion of renumbering events (so that an earlier RG is now invalid) and certain topology changes would require TCP to switch sending to a new RG mid-connection. To explore the whole space, we considered ways of allowing this mid-connection RG change to happen. If TCP connection identifiers are based on ESDs rather than full addresses, traffic from the same ESD would be viewed as coming from the same peer, regardless of its source RG. This makes it trivial for any Internet host to impersonate another, and have such traffic be accepted by TCP. Because this vulnerability is already present in today's Internet (forging full source addresses is trivial), the mere delivery of incoming datagrams with the same ESD but a different RG does not introduce new vulnerability to TCP. In today's Internet, any node can already originate FINs/RSTs from an arbitrary source address and potentially or definitely disrupt the connection. Therefore, changing RG for acceptance, or acceptance of traffic independent of its source RG, does not appear to significantly worsen existing robustness. We also considered allowing TCP to reply to each segment using the RG of the most recently-received segment. Although this allows TCP to survive some important events (e.g., renumbering), it also makes it trivial to hijack connections, unacceptably weakening robustness compared with today's Internet. A sender simply needs to guess the sequence numbers in use by a given TCP connection [Bellovin 89] and send traffic with a bogus RG to hijack a connection to an intruder at an arbitrary location. Providing protection from hijacking implies that the RG used to send packets must be bound to a connection end-point (e.g., it is part of the connection state). Although it may be reasonable to accept incoming traffic independent of the source RG, the choice of sending RG requires more careful consideration. 
Indeed, any subsequent change in what RG is used for sending traffic must be properly authenticated using cryptographic means. In the GSE proposal, it is not clear how to authenticate such a change, since the remote peer doesn't even know what RG it is using! Consequently, the only reasonable approach in GSE is to send to the peer at the first RG used by the peer for the entire life of a connection. That is, always continue to use the first RG seen.

In summary, changing the RG dynamically in a safe way for a connection requires that an originator of traffic be able to authenticate a proposed change in the RG before sending to a particular ESD via that RG. Such a mechanism would need to be invented, as the TCP/IP suite has no obvious candidate that operates at or below the transport layer (using the DNS, an application protocol that resides above IP, would be problematic due to layering circularity considerations).

4.1.6.5. Passive Opens

One question that arises is what impact corrupted RG would have on robustness. Because the RG is not covered by any checksums, it would be difficult to detect such corruption. Moreover, once a specific RG is in use, it does not change for the duration of a connection. The interesting case occurs on the passive side of a TCP connection, where a server accepts incoming connections from remote clients. If the initial SYN from the client includes corrupted RG, the server TCP will create a TCP connection (in the SYN-RECEIVED state) and cache the corrupted RG with the connection. The second packet of the 3-way handshake, the SYN-ACK packet, would be sent to the wrong RG and consequently not reach the correct destination. Later, when the client retransmits the unacknowledged SYN, the server will continue to send the SYN-ACK using the bad RG. Eventually the client times out, and the attempt to open a TCP connection fails. Figure 8 shows the details.

       TCP A                               TCP B

   1.  CLOSED                              LISTEN
   2.  SYN-SENT    -->           -->       SYN-RECEIVED
   3.              <--           <--       SYN-RECEIVED
   4.  SYN-SENT    -->           -->       SYN-RECEIVED
   5.              <--           <--       SYN-RECEIVED
       ...
       TCP A times out

                        Figure 8

We next consider relaxing the restriction on switching RGs in an attempt to avoid the previous failure scenario. The situation is complicated by the fact that the RG on received packets may change for legitimate reasons (e.g., a multi-homed site load-shares traffic across multiple border routers). The key question is how can one determine which RG is valid and which is not. That is, for each of the RGs a sender attempts to use, how can it determine which RG worked and which did not? Solving this problem is more difficult than first appears, since one must cover the cases of delayed segments, lost segments, simultaneous opens, etc. If a SYN-ACK is retransmitted using different RGs, it is not possible to determine which of those RGs worked correctly. We conclude that the only way TCP could determine that a particular RG was used to deliver segments was if it received an ACK for a specific sequence number in which all transmissions of that sequence number used the same RG (a non-trivial addition to TCP).

We analyze multiple cases of RG changing within the time of the opening handshake. One example is diagrammed in Figure 9, and it and two others are summarized in Table 1. We observe that RG flap and large numbers of passive opens may coincide, for instance, when a power failure at a server farm affects both internal routers and servers.
   time  TCP A                  time  TCP B

   t0    -->                    t1
   t3    <--                    t1

         TCP B's SYN,ACK is delayed and crosses with retransmit
         of TCP A's SYN on which RG has changed from M to N

   t2    -->                    t3
   t4    -->                    t3    ESTABLISHED

         TCP B decides to use DST RG=M for TCP A, because it heard
         from RG=M and was ACK'd on a send to RG=M

                        Figure 9

   SYNFROM SYNACKTO ACKFROM SELECT
   ------------------------------------
   W W X W
   ------------------------------------
   W X W X W
   ------------------------------------
   W W X X Y ??

                        Table 1

At best, an RG selection algorithm for TCP would be relatively straightforward but would require new logic in implementations of TCP's opening handshake --- a significant transition issue. We are not certain that a valid algorithm is attainable, however. RG changes would have to be handled in all cases handled by the opening handshake: delayed segments, lost segments, undetected bit errors in RG, simultaneous opens, old segments and so on.

In the end, we conclude that although the corrupted SYN case of Figure 8 was a potential problem, the changes that would need to be made to TCP to robustly deal with such corruption would be significant, if tractable at all. A transition to GSE would therefore also require a significant "TCPng" transition.

Our final conclusion is that transport protocol end-points must make an early, single choice of the RG to use when sending to a peer and stick with that choice for the duration of the connection. Specifically:

1) The demultiplexing of arriving packets to their transport end points should use only the ESD, and not the Routing Stuff.

2) If the application chooses an RG for the remote peer (i.e., an active open), use the provided RG for all traffic sent to that peer, even if alternative RGs are received on subsequent incoming datagrams from the same ESD.

3) For all other cases, use the first RG received with a given ESD for all sending.

We recommend that a means be found for RGs to be checksummed if the GSE address structure is used.
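The three rules above can be illustrated with a minimal sketch. It is not taken from the GSE proposal; the names Demux, Connection, active_open and receive are hypothetical, ESDs and RGs are modelled as plain integers, and the TCP state machine (SYN handling, listening sockets, sequence numbers) is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Connection key per rule 1: ESDs and ports only, no Routing Stuff.
ConnKey = Tuple[int, int, int, int]  # (peer ESD, local ESD, peer port, local port)


@dataclass
class Connection:
    send_rg: int            # RG chosen once and used for every segment sent
    segments_seen: int = 0  # count of segments demultiplexed to this end-point


class Demux:
    """Hypothetical end-point table keyed on ESDs (illustration only)."""

    def __init__(self) -> None:
        self.table: Dict[ConnKey, Connection] = {}

    def active_open(self, local_esd: int, local_port: int,
                    peer_rg: int, peer_esd: int, peer_port: int) -> Connection:
        # Rule 2: the application supplied the RG (e.g., from the DNS);
        # it stays in force for the life of the connection.
        conn = Connection(send_rg=peer_rg)
        self.table[(peer_esd, local_esd, peer_port, local_port)] = conn
        return conn

    def receive(self, src_rg: int, src_esd: int, src_port: int,
                dst_esd: int, dst_port: int) -> Connection:
        key = (src_esd, dst_esd, src_port, dst_port)  # rule 1: src_rg not in key
        conn = self.table.get(key)
        if conn is None:
            # Rule 3: no RG chosen yet, so the first RG received is kept.
            conn = Connection(send_rg=src_rg)
            self.table[key] = conn
        # A later, different src_rg is accepted for delivery but never
        # replaces conn.send_rg, since it cannot be authenticated.
        conn.segments_seen += 1
        return conn
```

In this sketch a segment arriving with a new source RG still reaches the same end-point, while return traffic keeps using the RG selected at the start of the connection.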
Because the sending RG is fixed for the life of a connection, there does not appear to be a straightforward way to use ESDs in conjunction with mobility or site renumbering (in which existing connections survive the renumbering).

4.1.6.6. Summary: ESD and RG Not Strictly Independent

We cannot emphasize enough that the use of an ESD independent of an associated RG can be very dangerous. That is, communicating with a peer implies that one is always talking to the same peer for the duration of the communication. But as has been described in previous sections, such assurance is possible only if properly authenticated RG is used.

We conclude that the rules for transport processing when ESDs are present differ from classical IP. Specifically:

1) The demultiplexing of packets to transport connection end-points should use ESDs, but should not use the Routing Stuff part of addresses. This ensures that packets are delivered to their intended destination independent of RG.

2) Once a packet has been delivered to its transport end-point, a separate (i.e., distinct) decision should be made concerning whether and how to act upon the received packet. Such a decision would be transport-protocol specific. A protocol could choose to completely ignore the packet, it could selectively use parts of the packet (e.g., to attempt out-of-band authentication of the RG), or it could process the packet in its entirety. It must not, however, use the received RG to send subsequent return traffic without first authenticating the RG.

4.1.7. On The Uniqueness Of ESDs

The uniqueness requirements for ESDs depend on what purpose they serve. In GSE, ESDs identify end systems, requiring that they be globally unique. It does not make sense for two different end systems to use the same ESD; every end system must have its own ESD to distinguish it from other end systems.
If ESDs are only used to identify session endpoints, the situation becomes more complex. At first glance it might appear that two nodes using the same ESD cannot communicate. However, this is not necessarily the case. In the GSE proposal, for example, a node queries the DNS to obtain an IPv6 address. The returned address includes the Routing Stuff of an address (the RG+STP portions). Since the sending host transmits packets based on the entire destination IPv6 address, the sender may well forward the packet to a router that delivers the packet to its correct destination (using the information in the Routing Stuff). It is only on receipt of a packet that a node would extract the ESD portion of a datagram's destination address and ask "is this for me?"

A more problematic case occurs if two nodes using the same ESD communicate with a third party. To the third party, packets received from either machine might appear to be coming from the same machine since they are both using the same ESD. Consequently, at the transport level, if both machines choose the same source and destination port numbers (one of the ports, a server's well-known port number, will likely be the same), packets belonging to two distinct transport connections will be demultiplexed to a single transport end-point.

When packets from different sources using the same source ESD are delivered to the same transport end-point, a number of possibilities come to mind:

1) The transport end-point could accept the packet, without regard to the Routing Stuff of the source address. This may lead to a number of robustness problems, if data from two different sources mistakenly using the same ESD are delivered to the same transport or application end-point (which at best will confuse the application).
2) The transport end-point could verify that the Routing Stuff of the source address matches one of a set of expected values before processing the packet further. If the Routing Stuff doesn't match any expected value, the packet could be dropped. This would result in a connection from one host operating correctly, while a connection from another host (using the same ESD) would fail.

3) When a packet is received with an unexpected Routing Stuff, the receiver could invoke special-purpose code to deal with this case. Possible actions include attempting to verify whether the Routing Stuff is indeed correct (the saved values may have expired) or attempting to verify whether duplicate ESDs are in use (e.g., by inventing a protocol that sends packets using both Routing Stuff values and verifies that they are delivered to the same end-point).

4.1.8. DNS PTR Queries

IPv4 uses the domain "IN-ADDR.ARPA" to hold PTR Resource Records. PTR RRs allow a client to map IP addresses back into the domain name corresponding to that address. IPv4 addresses can be put into the DNS because they have hierarchical structure -- the same hierarchy used to aggregate routes.

The ability to map an IP address into its corresponding DNS name is used in several contexts:

1) Network packet tracing utilities (e.g., tcpdump) display the contents of packets. Printing out the DNS names appearing in those packets (rather than dotted IP addresses) requires access to an address-to-name mapping mechanism.

2) Some applications perform "cheap" authentication by using the DNS to map a source address of a peer into a DNS name. Then, the client queries the DNS a second time, this time asking for the address(es) corresponding to the peer's DNS name. Only if one of the addresses returned by the DNS matches the peer address of the TCP connection is the source of the TCP connection accepted as being from the indicated DNS name.
It is important to note that although two DNS queries are made during the above operation, it is the second one --- mapping the peer's DNS name back into an IP address --- that provides the authentication property. The first transaction simply obtains the peer's DNS name, but no assumption is made that the returned DNS name is correct. Thus, the first DNS query could be replaced by an alternate mechanism without weakening the already weak authentication check described above. One possible alternate mechanism, an ICMP "Who Are You" message, is described in Section 4.1.11.

3) Applications that log all incoming network connections (e.g., anonymous FTP servers) may prefer logging recognizable DNS names to addresses.

4) Network administrators examining logs or other trace data containing addresses may wish to determine the DNS name of some addresses. Note that this may occur sometime after those addresses were actually used.

Although DNS PTR records have proven useful in several contexts, there is also widespread agreement that, in practice, many IP addresses in use today are not properly registered in the IN-ADDR.ARPA namespace. Consequently, PTR queries frequently fail to return usable information. Thus, the overall utility of PTR records is questionable.

It is also worth noting that the primary reason that so few addresses are properly registered in the PTR space is the absence of incentive for doing so. With no key piece of the Internet infrastructure depending on such mappings being in place or correct, there is little practical harm in failing to keep it up-to-date.

Finally, it might appear at first glance that secure DNS [RFC2065] provides a means for cryptographically signing a PTR record and thereby providing authentication. Things are not so simple, however. The signature on a PTR record indicates that the entity owning an address has given it a DNS name.
It does not mean that the owner of the address is authorized to use that specific name. For example, anyone owning an address can set up a PTR record indicating that the address corresponds to the name "www.ietf.org". However, the name "www.ietf.org" belongs to only one entity, regardless of how many PTR records indicate otherwise.

4.1.9. Reverse Mapping of ESDs

It is reasonable to ask if it is necessary or desirable to be able to map an ESD (alone) into some other meaningful quantity, such as a fully qualified domain name. The benefits of being able to perform such a mapping are analogous to those described in the preceding section.

The primary difficulty with constructing such a mapping is that it requires that ESDs have sufficient structure to support the delegating mechanism of a distributed database such as DNS. The sorts of built-in identifiers now found in computing hardware, such as "EUI-48" and "EUI-64" addresses [IEEE802, IEEE1212], do not have the structure required for this delegation. Hence, stateless autoconfiguration [RFC1971] cannot create addresses with the necessary hierarchical property.

Another possibility would be to define ESDs with sufficient structure to permit the construction of a mapping mechanism. However, analysis performed during the IPng deliberations concluded that close to 48 bits of hierarchy were needed to identify all the possible sites 30-40 years from now. That would leave only 2 bytes for host numbering at a site, a number clearly incompatible with stateless autoconfiguration [RFC1971].

There are several arguments against having a global ESD-lookup capability. Adding sufficient structure to an 8-byte ESD would be incompatible with stateless autoconfiguration, which already uses 6 bytes for its token; two additional bytes for hierarchy are clearly insufficient. In addition, experience with the IN-ADDR.ARPA domain suggests that the required databases will be poorly maintained.
Finally, imposing a required hierarchical structure on ESDs would also introduce a new administrative burden and a new or expanded registry system to manage ESD space. While the procedures for assigning ESDs, which need only organizational and not topological significance, would be simpler than the procedures for managing IPv4 addresses (or DNS names), it is hard to imagine such a process being universally well-received or without controversy; it seems a laudable goal to avoid the problem altogether if possible.

4.1.10. Reverse Mapping of Complete GSE Addresses

Although a global-scale reverse mapping of ESDs seems infeasible, one could imagine maintaining, within a Site, a database keyed on unstructured 8-byte ESDs. However, it is a matter of debate whether such a database can be kept up-to-date at reasonable cost, without making unreasonable assumptions as to how large sites are going to grow, and how frequently ESD registrations will be made or updated. Note that the issue isn't just the physical database itself, but the operational issues involved in keeping it up-to-date. For the rest of this section, however, let us assume such a database can be built.

A mechanism supporting a lookup keyed on a flat-space ESD from an arbitrary Site requires having sufficient structure to identify the Site that needs to be queried. In practice, an ESD will almost always be used in conjunction with Routing Stuff (i.e., a full 16-byte address). Since the Routing Stuff is organized hierarchically, it becomes feasible to maintain a DNS tree that maps full GSE addresses into DNS names, in a fashion analogous to what is done with IPv4 PTR records today.

It should be noted that a GSE address lookup will work only if the Routing Stuff portion of the address is correctly entered in the DNS tree.
Because the RG portion of an address is expected to change over time, this assumption will not be valid indefinitely. As a consequence, a packet trace recorded in the past might not contain enough information to identify the off-Site sources of the packets in the present. This problem can be addressed by requiring that the database of RG delegations be maintained for some period of time after the RG is no longer usable for routing packets.

Finally, it should be noted that the problem where an address's RG "expires", with the implication that the mapping of "expired" addresses into DNS names may no longer hold, is not a problem specific to the GSE proposal. With provider-based addressing, the same issue arises when a site renumbers into a new provider prefix and releases the allocation from a previous block. The authors are aware of one such renumbering in IPv4 where a block of returned addresses was reassigned and reused within 24 hours of the renumbering.

4.1.11. The ICMP "Who Are You" Message

Although there is widespread agreement on the utility of being able to determine the DNS name of the peer one is communicating with, there is also widespread concern that repeating the experience of the "IN-ADDR.ARPA" domain is undesirable. Consequently, an old proposal to define an ICMP "Who Are You?" message was resurrected [RFC1788]. A client would send such a message to a peer, and that peer would return an ICMP message containing its DNS name.

Asking a remote host to supply its own name in no way implies that the returned information is accurate. However, having a remote peer provide a piece of information that a client can use as input to a separate authentication procedure provides a starting point for performing strong authentication. The actual strength of the authentication depends on the authentication procedure invoked, rather than the untrustable piece of information provided by a remote peer.
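As an illustration of why the source of the claimed name does not matter, the "cheap" authentication check can be sketched with the name-discovery step passed in as a parameter. This is a minimal sketch under stated assumptions: cheap_authenticate, claimed_name and forward_lookup are hypothetical names, and the two callables stand in for a PTR query (or "Who Are You" exchange) and a forward DNS lookup, respectively; no real resolver API is implied.

```python
import ipaddress
from typing import Callable, Iterable, Optional


def cheap_authenticate(
    peer_addr: str,
    claimed_name: Callable[[str], Optional[str]],
    forward_lookup: Callable[[str], Iterable[str]],
) -> bool:
    """Return True only if the peer's claimed name maps back to peer_addr."""
    # Step 1: obtain a name for the peer. The answer is untrusted, whether
    # it comes from a PTR record or from the peer itself.
    name = claimed_name(peer_addr)
    if not name:
        return False
    # Step 2: the forward lookup is what provides the (weak) authentication.
    # Addresses are parsed so that textual variants compare equal.
    peer = ipaddress.ip_address(peer_addr)
    return any(ipaddress.ip_address(a) == peer for a in forward_lookup(name))
```

A peer that claims a name it does not own fails the check, because the forward lookup for that name does not return the peer's address.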
Reconsidering the "cheap" authentication procedure described in Section 4.1.8, the ICMP "Who Are You" replaces the DNS PTR query used to obtain the DNS name of a remote peer. The second DNS query, to map the DNS name back into a set of addresses, would be performed as before. Because the latter DNS query provides the strength of the authentication, the use of an ICMP "Who Are You" message does not in any way weaken the strength of the authentication method. Indeed, it can only make it more useful in practice, because virtually all hosts can be expected to implement the "Who Are You" message.

The "Who Are You" message could contain an identifier for matching replies to requests, and perhaps a nonce value to provide resistance to spoofing.

In order to minimize the number of WRU packets on the Internet, the WRU messages should be sent by DNS servers, which would then cache the answers. This has the pleasant side-effect of reducing the impact on existing applications (i.e., they would continue to look up addresses using the same API as before). In many cases there is a natural TTL that the target node can provide in its reply: either the remaining lifetime of a DHCP lease or the remaining valid time of a prefix from which the address was derived through stateless autoconfiguration.

The "Who Are You?" (WRU) message described above is robust against renumbering, since it follows the paths of valid routable prefixes. Essentially, it uses the Internet routing system in place of the DNS delegation scheme. It is attractive in the context of GSE-style renumbering, since no host or DNS server needs to be updated after a renumbering event for WRU-based lookups to work. It has advantages outside the context of GSE as well, including a more decentralized, and hence more scalable, administration and easier upkeep than a DNS reverse-lookup zone.
It also has drawbacks: it requires the target node to be up and reachable at the time of the query and to know its fully qualified domain name. It is also not possible to resolve addresses once those addresses become unroutable. In contrast, the DNS PTR mirrors, but is independent of, the routing hierarchy. The DNS can maintain mappings long after the routing subsystem stops delivering packets to certain addresses.

The requirement that the target node be up and reachable at the time of the query makes it very uncertain that one would be able to take addresses from a packet log and translate them to correct domain names at a later date. This is a design flaw in the logging system, as it violates the architectural principle, "Avoid any design that requires addresses to be ... stored on non-volatile storage." [RFC1958] A better-designed system would look up domain names promptly from logged addresses. Indeed, one of the authors is pleased to be able to state that his site has been doing that for some years.

(Speculative note: Proxy servers to answer WRU queries are possible. If the boundary between the global and site portions of addresses is fixed and/or the boundary between the routing and the end-node portions is fixed, then one could define a well-known anycast address for proxy WRU service per site and/or per subnet. The low-order portion of this address would presumably be created from the IANA's IEEE OUI. The WRU client-side interface would have to be defined to try this address after or before sending a query to the target address itself. Nodes answering to this anycast address could reply to WRU queries using a database maintained by private means. By carrying a /128 route site-wide or in the site's provider, these servers need not even be located within the subnet or site they serve. Co-location of the proxy WRU servers with some DNS servers is a natural choice in some scenarios.)

4.2. Renumbering and Domain Name System (DNS) Issues

4.2.1. How Frequently Can We Renumber?

One premise of the GSE proposal [GSE] is that an ISP can renumber the Routing Goop portion of a Site's addresses transparently to the Site (i.e., without coordinating the change with the Site). This would make it possible for backbone providers to aggressively renumber the Routing Goop part of addresses and achieve a high degree of route aggregation. On closer examination, frequent (e.g., daily) renumbering turns out to be difficult in practice because of a circular dependency between the DNS and routing. Specifically, if a Site's Routing Stuff changes, nodes communicating with the Site need to obtain the new Routing Stuff. In the GSE proposal, one queries the DNS to obtain this information. However, in order to reach a Site's DNS servers, the pointers controlling the downward delegation of authoritative DNS servers (i.e., DNS "glue records") must use addresses (with Routing Stuff) that are reachable. That is, in order to find the address for the web server "www.foo.bar.com", DNS queries might need to be sent to a root DNS server, as well as DNS servers for "bar.com" and "foo.bar.com". Each of these servers must be reachable from the querying client. Consequently, there must be an overlap period during which both the old Routing Stuff and the new Routing Stuff can be used simultaneously. During the overlap period, DNS glue records would need to be updated to use the new addresses (including Routing Stuff). Only after all relevant DNS servers have been updated and older cached RRs containing the old addresses have timed out can the old address be deleted.

An important observation is that the above issue is not specific to GSE: the same requirement exists with today's provider-based addressing architecture. When a site is renumbered (e.g., it switches ISPs and obtains a new set of addresses from its new provider), the DNS must be updated in a similar fashion.
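The length of the overlap period can be estimated with a small worked example. The sketch below is illustrative only: the Record type, its field names and the sample TTL values are assumptions, and it models just the rule stated above, namely that an old address may be deleted only after every relevant record has been updated and every cached copy of the old record has timed out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    name: str        # owner name of the RR (illustrative)
    updated_at: int  # seconds after the start of renumbering when the RR was changed
    old_ttl: int     # TTL, in seconds, carried by the RR holding the old address


def earliest_safe_withdrawal(records: Iterable[Record]) -> int:
    """Earliest time (seconds) at which the old Routing Stuff can be withdrawn.

    A cache may have fetched the old RR just before it was updated, so the
    old address can remain in use until updated_at + old_ttl for that RR.
    """
    return max(r.updated_at + r.old_ttl for r in records)


# Hypothetical delegation chain for "www.foo.bar.com".
chain = [
    Record("bar.com glue", updated_at=0, old_ttl=172800),
    Record("foo.bar.com glue", updated_at=3600, old_ttl=86400),
    Record("www.foo.bar.com AAAA", updated_at=3600, old_ttl=3600),
]
overlap_seconds = earliest_safe_withdrawal(chain)
```

With these assumed values the overlap is bounded by the slowest record in the chain (here the two-day glue TTL), which suggests why daily renumbering would be hard to sustain unless TTLs throughout the delegation chain were shortened accordingly.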
4.2.2. Efficient DNS support for Site Renumbering

When a site renumbers to satisfy its ISP, only the site's routing prefix needs to change. That is, the prefix reflects where within the Internet the site resides. Although some sites may also change the numbering of their internal topology when switching providers, this is not a requirement. Rather, it may be a convenient time to also perform any desired internal renumbering, since in practice any address renumbering tends to cause disruptions.

In the current Internet, when a site is renumbered, the addresses of all the site's internal nodes change. This requires a potentially large update to the RR database for that site. Although Dynamic DNS [DDNS] could potentially be used, the cost is likely to be large due to the large number of individual records that would need to be updated. In addition, when DHCP and DDNS are used together [DHCP-DDNS], it may be the case that individual hosts "own" their own A or AAAA records, further complicating the question of who is able to update the contents of DNS RRs.

One change that could reduce the cost of updating the DNS when a site is renumbered is to split addresses into two distinct portions: a Routing Goop that reflects where a node attaches to the Internet and a "site internal part" that is the site-specific part of an address. During a renumbering, only the Routing Goop would change; the "site internal part" would remain fixed. Furthermore, the two parts of the address could be stored in the DNS as separate RRs. That way, renumbering a site would only require that the Routing Goop RR of a site be updated; the "site-internal part" of individual addresses would not change.

To obtain the address of a node from the DNS, a DNS query for the name would return two quantities: the "site internal part" and the DNS name of the Routing Stuff for the site.
An additional DNS query would then obtain the Routing Stuff RR of the site, and the complete address would be synthesized by concatenating the two pieces of information.

Implementing these DNS changes increases the practicality of using Dynamic DNS to update a site's DNS records as it is renumbered. Only the site's Routing Goop RRs would need updating.

Finally, it may be useful to divide a node's AAAA RR into the three logical parts of the GSE proposal, namely RG, STP and ESD. Whether it is useful to have separate RRs for the STP and ESD portions of an address, or a single RR combining both, is an issue that requires further study.

If AAAA records are composed of multiple distinct RRs, then one question is who should be responsible for synthesizing the AAAA from its components: the resolver running on the querying client's machine or the queried name server? To minimize the impact on client hosts and make it easier to deploy future changes, it is recommended that the synthesis of AAAA records from their constituent parts be done on name servers rather than in client resolvers.

4.2.3. Two-Faced DNS

The GSE proposal attempts to hide the RG part of addresses from nodes within a Site. If the nodes do not know their own RG, then they cannot store or use it in ways that cause problems should the Site be renumbered and its RG change (i.e., the cached RG becomes invalid).

A Site's DNS servers, however, will need to have more information about the RG its Site uses. Moreover, the responses a server returns will depend on who queries it. A query from a node within the Site should return an address with an RG portion equal to "Site local," whereas a query for the same name from a client located at a different Site would return the appropriate RG portion. This makes intra-site communication more resilient to failures outside of the site.
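The context-dependent answer described above, combined with the address synthesis of Section 4.2.2, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a specification: the 6-byte RG length follows the 6+2+8 layout generally attributed to the GSE draft, the "Site local" RG value is a placeholder, the example prefix is arbitrary, and the function names are hypothetical. In particular, the sketch assumes the server already knows whether the querier is inside the Site.

```python
import ipaddress

RG_LEN = 6                                      # assumed RG length in bytes
SITE_LOCAL_RG = bytes.fromhex("fec000000000")   # placeholder "Site local" RG value


def synthesize(rg: bytes, site_internal: bytes) -> ipaddress.IPv6Address:
    """Concatenate the Routing Goop with the site internal part (STP + ESD)
    to form a complete 16-byte address."""
    if len(rg) != RG_LEN or len(site_internal) != 16 - RG_LEN:
        raise ValueError("unexpected field length")
    return ipaddress.IPv6Address(rg + site_internal)


def answer(site_internal: bytes, site_rg: bytes, querier_in_site: bool) -> ipaddress.IPv6Address:
    """Return the address a two-faced server would hand out: the Site-local
    RG for queriers inside the Site, the Site's public RG for everyone else."""
    rg = SITE_LOCAL_RG if querier_in_site else site_rg
    return synthesize(rg, site_internal)


site_rg = bytes.fromhex("20010db80001")              # hypothetical public RG
host = bytes.fromhex("0005" "0200c0fffe123456")      # hypothetical STP + ESD
inside = answer(host, site_rg, querier_in_site=True)
outside = answer(host, site_rg, querier_in_site=False)
```

Both answers share the same low-order ten bytes; only the RG differs, so renumbering the Site changes a single stored value (site_rg in the sketch) rather than every host record.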
Such context-dependent DNS servers are commonly referred to as "two-faced" DNS servers. Some issues that must be considered in this context:

1) A DNS server may recursively attempt to resolve a query on behalf of a requesting client. Consequently, a DNS query might be received from a proxy rather than from the client that actually seeks the information. Because the proxy may not be located at the same Site as the originating client, a DNS server cannot reliably determine whether a DNS request is coming from the same Site or a remote Site. One solution would be to disallow recursive queries for off-Site requesters, though this raises additional questions.

2) Since cached responses are, in general, context sensitive, a name server may be unable to correctly answer a query from its cache, since the information it has is incomplete. That is, it may have loaded the information via a query from a local client, and the information has a Site-local prefix. If a subsequent request comes in from an off-Site requester, the DNS server cannot return a correct response (i.e., one containing the correct RG).

4.2.4. Bootstrapping Issues

If Routing Stuff information is distributed via the DNS, key DNS servers must always be reachable. In particular, the addresses (including Routing Stuff) of all root DNS servers are, for all practical purposes, well-known and assumed to never change. It is not uncommon for the addresses of root servers to be hard-coded into software distributions. Consequently, the Routing Stuff associated with such addresses must always be usable for reaching root servers. If it becomes necessary or desirable to change the Routing Stuff of an address at which a root DNS server resides, the routing subsystem will likely need to continue carrying "exceptions" for those addresses.
Because the total number of root DNS servers is relatively small, the routing subsystem is expected to be able to handle this requirement.

All other DNS server addresses can be changed, since their addresses are typically learned from an upper-level DNS server that has delegated a part of the name space to them. So long as the delegating server is configured with the new address, the addresses of other servers can change.

4.2.5. Renumbering and Reverse DNS Lookups

It is certain that many sites will, from time to time, undergo a renumbering event, either through the mechanisms proposed for GSE or using the facilities already specified for IPv6. It would be useful to an outside node corresponding with such a site to be able to distinguish a legitimate renumbering from an attempt to impersonate the site.

We claim that the DNS IP6.INT zone, without security extensions [RFC2065], is of no use in making this determination and that even a completely secured IP6.INT zone is of little use compared with the "forward" DNS zone.

The first half of the claim is almost self-evident. An impersonator can set up an insecure zone at some point in the IP6.INT hierarchy and load it with any desired data. This is the reason that current applications doing minimal access control follow a reverse lookup with a forward lookup.

With a secured reverse zone, the problem of verifying an apparent renumbering of a site can still be quite complex in the general case, and will certainly be outside the scope of a transport protocol, if survival of long-running sessions is contemplated. Under provider-based addressing [RFC2073], renumbering is expected to occur due to a change in network topology (e.g., a change in a provider relationship at some point in the address aggregation tree).
This alters the global prefixes in use below the point of the change, and correspondingly alters the chain of delegations of the DNS reverse-mapping tree. And, although operational experience with secure DNS is quite limited, it seems likely that there would also be a change in the chain of certifications of the signing key of the leaf zone representing the site. It is then problematic to translate established trust in the old reverse mapping zone into trust in the new zone.

Certainly it is simpler to rely on the forward zone only. The only function of the reverse zone, then, is to suggest an entry point to the forward zone's database. It is this function which we propose to achieve by means of a new ICMP message exchange.

4.3. Address Rewriting Routers

One of the most novel pieces of GSE is the rewriting of addresses as datagrams enter and leave sites. If only a small number of routers know the RG portion of the addresses, then the operational impact of renumbering a Site would be small. In fact, assuming that the critical security issues are dealt with, one could imagine a dynamic protocol that a Site uses with its upstream provider to be told what RG to use, so it might even be possible to renumber a Site transparently.

GSE's ability to ensure that the RG portion of a Site's addresses reflects the actual location of that Site within the Public Internet means that very aggressive aggregation (i.e., better route scaling) can be achieved. Both GSE and other route-scaling approaches that use provider-based addressing depend on aggressive aggregation, but while other schemes rely largely on operational policies, GSE attempts to include mechanisms in its core to ensure that aggressive aggregation happens in practice.

GSE has an advantage over other provider-based addressing schemes like IPv4's CIDR with respect to the "fair distribution of work."
CIDR addresses the scaling of routing in DFZ portions of the Internet, but the cost of carrying out the renumbering to maintain the aggregation falls on the shoulders of subscribers who are far away from the DFZ; in other words, subscribers must do the work of renumbering so that their provider (or possibly even their provider's provider) sees better aggregation. With GSE, the majority of the cost required to make the routing scale would be incurred by the parties who reap the benefits.

4.3.1. Load Balancing

While not considered a major advantage, with GSE, multi-homed sites can more easily achieve symmetry with respect to which of their links is used for a given flow. With GSE, if HostA in multi-homed Site1 initiates a flow to HostB in Site2, then when the initial packet leaves Site1 the source address will be rewritten with an RG that identifies the egress link used. As a result, when HostB needs to send return traffic, it will use the full 16-byte address from the arriving packet and this necessarily means that traffic for this flow coming into Site1 will use the same circuit that outgoing traffic for that flow took.

In contrast, if the source address (i.e., Routing Stuff) is fixed by the sending host, the same return path is used for return traffic coming back to a site, regardless of which egress router packets traverse when leaving that site.

4.3.2. End-To-End Argument: Don't Hide RG from Hosts

Despite these significant advantages, however, the overwhelming consensus was that address rewriting by routers should not be pursued as part of the current standardization effort. Although hiding RG knowledge from hosts has advantages in some scenarios, that lack of knowledge also makes it difficult to solve important problems.
For example, a host in a multi-homed site is known by multiple addresses, but without knowing its address the host can play no role in the source address selection; instead, the host relies on the routing infrastructure to magically select the right one, i.e., by selecting the egress router closest to the sender. For many sites, this is the desired behavior. For others, it is not. In those cases, the historically difficult-to-solve problem of source address selection is made more difficult by moving it from an intra-host decision to a distributed one. Now a site's internal routers would need sufficient knowledge to decide which egress router to forward traffic to, perhaps on a source-by-source (or worse) basis.

Another end-to-end problem resulting from address rewriting has to do with how transport connections should deal with the RG portion of the address in incoming packets, particularly when authenticating the RG changes. The sections on transport issues deal with the subject in much more detail.

Interesting questions arise about address rewriting when dealing with tunnels. Any node that terminates a tunnel whose other end resides in a different Site must be able to behave as a Site border router and do address rewriting. This means that the RG may need to be configured in more than just a Site's egress router, thus making renumbering more problematic.

Another problem related to both performance and "architectural cleanliness" has to do with IPv6's Routing Headers. It may be necessary for addresses other than just the simple source and destination to be rewritten. And again, this rewriting would need to be done by both egress routers and nodes which terminate tunnels that go to other sites.

4.4. Multi-Homing

Multi-Homing can mean many things.
In the context of GSE, multi-homing refers to a Site having more than one connection to the Internet and therefore being known by multiple RGs. In many ways this is close to multi-homing with IPv6 provider-based addressing. It is hard to make comparisons to IPv4 because multi-homing has traditionally been done in an ad hoc fashion.

With GSE, the ability of a Site to control the load-sharing over its multiple links is not clear, partially because there is little operational experience with multi-homed sites known by multiple prefixes (with IPv4 the site is generally only known by a single prefix). The following analysis is relevant to any scheme where an Internet-connected site is known by multiple prefixes.

For flows that the multi-homed site initiates, load-sharing is impacted by the source address used because that is the address that the remote site will use for return traffic. If we assume the model of routers rewriting source addresses, then the outgoing link selected determines the load-sharing because that also determines what RG is contained in the source address. If the routers do not rewrite source addresses, then the end-host itself will have to make the source address selection, and the optimal choice may require knowledge of the topology.

For flows initiated by someone outside of the multi-homed site, the load-sharing is dependent on the destination address specified, so the DNS has a large impact on load-sharing. There is some amount of operational experience in using DNS to control load on servers (e.g., having a Web server resolve to multiple addresses), though that is load-sharing of a different resource and at a different scope and scale.
It is also worth noting that the selection of the optimal outgoing link may well depend on the destination, which has particularly interesting implications for whether the DNS must understand topology (and brings up the question of whether the DNS servers or the resolvers are responsible for knowing the topology).

One advantage that GSE has for multi-homed sites is symmetry. Because the source address is selected based on the outgoing link, and that source address is what determines the return path, flows initiated by the Site will be symmetric with respect to which of the Site's links is used.

The multi-homing mechanism described in Section 3.7 has some weaknesses and complexities. First, the mechanism only supports healing a failed link and not a failed router; in other words, referencing Figure 7 from Section 3.7, if PBR1 were not up at all, then it could not tunnel the packets anywhere. One could imagine ways of distributing PBR1's knowledge of PBR2 to other routers within Provider1 to add more reliability, though this makes the problem distributed rather than point-to-point and therefore more difficult.

Second, in the general case, static identification of PBR2 to PBR1, and vice-versa, is not adequate. Imagine, for example, that the link to PBR1 is much faster than the link to PBR2. In this case, it is possible that packets whose destination addresses contain RG1 might normally transit PBR2 without going directly to the Site. So there seems to be a need for a dynamic protocol between PBR1 and PBR2 to notify when PBR2, for example, should forward RG1-prefaced destinations directly to the Site as opposed to forwarding them towards PBR1.

Another note about multi-homing is the potential impact of internal topology changes in the face of address rewriting.
Using the previously referenced diagram, if a flow from a host within the Site is leaving via SBR1, but then something happens such that SBR2 becomes the host's closest exit point, then the remote end-point of the flow will begin seeing a different RG. Reasons such as this are why the repercussions on the transport layer are so important (e.g., whether or not transport peers pay attention to the RG).

5. Results

This section summarizes the results of the GSE deliberations on the IPv6 process.

1) Make changes to the IPv6 provider-based addressing document to facilitate aggressive aggregation that is also operationally realistic.

2) Create hard boundaries in IPv6 addresses to clearly distinguish between the portions used to identify hosts, for routing within a site, and for routing within the Public Internet.

3) Allow an option for the low-order 8 bytes of IPv6 addresses to be designated as a globally unique End System Designator (ESD). This change has potential benefits to future transport protocols (e.g., TCPng).

4) Make a clear distinction between the "locator" part of an address and the "identifier" part of the address. The former is used to route a packet to its end-point, the latter is used to identify an end-point, independent of the path used to deliver the packet. Although this is a potentially revolutionary change to the IPv6 addressing model, existing transport protocols such as TCP and UDP will not take advantage of the split. Future transport protocols (e.g., TCPng), however, may.

5) Make changes to the way AAAA records are stored within the DNS, so that renumbering a site (e.g., when a site changes ISPs) requires few changes to the DNS database in order to effectively change all of a site's AAAA RRs.

6) Don't hide a node's full address from that node. In a scheme where all nodes know their full address, address rewriting should not be necessary.
7) Consider multi-homing and its effect on aggregation and route scaling from the beginning. Have a goal of architecting a way to do multi-homing that is both scalable and operationally practical, and consider related issues such as load-sharing.

8) Consider the issue of subnetting. For example, how are point-to-point links numbered? With IPv4, current practice is to number point-to-point links out of "/30" subnets. However, do network masks longer than 64 bits make sense with the concept of the low-order 8 bytes being a globally unique ESD? If not, then is it acceptable to either leave point-to-point links unnumbered or to use an entire subnet for each point-to-point link? Will there need to be an exception for IPv6 host routes (i.e., /128s) as a work-around for the bootstrapping issue of addressing root DNS servers? If /128s are allowed, but not masks between /65 and /127, inclusive, then a possible way to number point-to-point links within a backbone is to dedicate a single subnet to them and route them as /128s.

9) Search for ways to minimize the impact that renumbering has on intra-site communication. Renumbering operations that change only the RG portion of addresses should not impact existing intra-site communication. One possible approach is to encourage the use of site-local addresses for all intra-site communication.

6. Security Considerations

The primary security consideration with GSE or, more generally, a network layer with addresses split into locator and identifier parts, is that of one node impersonating another by copying the identification without the location.

7. Acknowledgments

Thanks go to Steve Deering and Bob Hinden (the Chairs of the IPng Working Group) as well as Sun Microsystems (the host for the PAL1 meeting) for the planning and execution of the interim meeting. Thanks also go to Mike O'Dell for writing the 8+8 and GSE drafts.
By publishing these documents and speaking on their behalf, Mike was the catalyst for some very valuable discussions that are expected to result in improved IPv6 addressing. Special thanks to the attendees of the meeting who carried on the high caliber discussions which were the source for this document.

8. References

[BATES] Scalable support for multi-homed multi-provider connectivity, Internet Draft, Tony Bates & Yakov Rekhter, draft-bates-multihoming-01.txt.

[Bellovin 89] "Security Problems in the TCP/IP Protocol Suite", Bellovin, Steve, Computer Communications Review, Vol. 19, No. 2, pp32-48, April 1989.

[CERT] CERT(sm) Advisory CA-96.21 (ftp://info.cert.org/pub/cert_advisories)

[DANVERS] Minutes of the IPNG working Group, April 1995. ftp://ftp.ietf.cnri.reston.va.us/ietf-online-proceedings/95apr/area.and.wg.reports/ipng/ipngwg/ipngwg-minutes-95apr.txt.

[DHCP-DDNS] Interaction between DHCP and DNS, Internet Draft, Yakov Rekhter, draft-ietf-dhc-dhcp-dns-04.txt.

[DDNS] "Dynamic Updates in the Domain Name System (DNS UPDATE)", Paul Vixie (Editor), draft-ietf-dnsind-dynDNS-11.txt, November, 1996.

[EUI64] 64-Bit Global Identifier Format Tutorial. http://standards.ieee.org/db/oui/tutorials/EUI64.html. Note: "EUI-64" is claimed as a trademark by an organization which also forbids reference to itself in association with that term in a standards document which is not their own, unless they have approved that reference. However, since this document is not standards-track, it seems safe to name that organization: the IEEE.

[GSE] "GSE - An Alternate Addressing Architecture for IPv6", Mike O'Dell, draft-ietf-ipngwg-gseaddr-00.txt.

[IEEE802] IEEE Std 802-1990, Local and Metropolitan Area Networks: IEEE Standard Overview and Architecture.

[IEEE1212] IEEE Std 1212-1994, Information technology--Microprocessor systems: Control and Status Registers (CSR) Architecture for microcomputer buses.
[RFC1122] "Requirements for Internet hosts - communication layers", R. Braden, 10/01/1989.

[RFC1715] The H Ratio for Address Assignment Efficiency. C. Huitema.

[RFC1726] Technical Criteria for Choosing IP:The Next Generation (IPng). F. Kastenholz, C. Partridge.

[RFC1752] "The Recommendation for the IP Next Generation Protocol," S. Bradner, A. Mankin, 01/18/1995.

[RFC1788] "ICMP Domain Name Messages", W. Simpson, 04/14/1995

[RFC1958] Architectural Principles of the Internet. B. Carpenter.

[RFC1971] IPv6 Stateless Address Autoconfiguration. S. Thomson, T. Narten.

[RFC2002] "IP Mobility Support", 10/22/1996, C. Perkins.

[RFC2008] "Implications of Various Address Allocation Policies for Internet Routing", Y. Rekhter, T. Li.

[RFC2065] Domain Name System Security Extensions. D. Eastlake, C. Kaufman.

[RFC2073] An IPv6 Provider-Based Unicast Address Format. Y. Rekhter, P. Lothberg, R. Hinden, S. Deering, J. Postel

9. Authors' Addresses

Matt Crawford
Fermilab MS 368
PO Box 500
Batavia, IL 60510 USA
Phone: 708-840-3461
EMail: crawdad@fnal.gov

John Stewart
USC/ISI
4350 North Fairfax Drive
Suite 620
Arlington, VA 22203 USA
Phone: 703-807-0132
EMail: jstewart@isi.edu

Allison Mankin
USC/ISI
4350 North Fairfax Drive
Suite 620
Arlington, VA 22203 USA
EMail: mankin@isi.edu
Phone: 703-807-0132

Lixia Zhang
UCLA Computer Science Department
4531G Boelter Hall
Los Angeles, CA 90095-1596 USA
Phone: 310-825-2695
EMail: lixia@cs.ucla.edu

Thomas Narten
IBM Corporation
3039 Cornwallis Ave.
PO Box 12195 - F11/502
Research Triangle Park, NC 27709-2195
Phone: 919-254-7798
EMail: narten@raleigh.ibm.com