Network Working Group                                         O. Maennel
Internet-Draft                                          T-Labs/TU-Berlin
Intended status: Standards Track                                 R. Bush
Expires: May 8, 2009                           Internet Initiative Japan
                                                            L. Cittadini
                                                    Universita' Roma Tre
                                                             S. Bellovin
                                                     Columbia University
                                                        November 4, 2008


    The A+P Approach to the Broadband Provider IPv4 Address Shortage
                          draft-ymbk-aplusp-01

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. This document may not be modified, and derivative works of it may not be created.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 8, 2009.

Maennel, et al.            Expires May 8, 2009                  [Page 1]
Internet-Draft         A+P Addressing Extension            November 2008

Abstract

We are facing the exhaustion of the IANA IPv4 free IP address pool. Unfortunately, IPv6 is not yet deployed widely enough to fully replace IPv4, and it is unrealistic to expect that this is going to change before we run out of IPv4 addresses. Letting hosts seamlessly communicate in an IPv4-world without assigning a unique globally routable IPv4 address to each of them is a challenging problem, for which many solutions have been proposed.
Some prominent ones involve carrier-grade-NATs (CGN), which have been shown to provide an inadequate experience to IPv4 users and enshrine a walled garden in the core of the provider. Instead, we propose using specialized NATs at the customer premises equipment (CPE) edge which treat some of the port number bits as part of an extended IPv4 address.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

1.  Introduction
  1.1.  Why Carrier-Grade-NATs are Harmful
  1.2.  Security of CGNs
2.  Proposed Solution
  2.1.  Terminology
  2.2.  Design of the A+P Gateway Device
  2.3.  Reasons for allowing multiple A+P gateways in sequence
  2.4.  Changes Required to the Network
    2.4.1.  Changes Required to CPE
    2.4.2.  Changes to Customer-Provided NAT (CN)
    2.4.3.  Changes to Provider-Edge Routers (PE)
    2.4.4.  Changes to Provider Border Routers (BR)
    2.4.5.  Changes to Network Core Routers
3.  Implementation
  3.1.  A+P dual-stack
  3.2.  IPv6 and mixed V4-V6 traffic
  3.3.  Handling ICMP
  3.4.  Handling IP fragments
  3.5.  The incremental path to A+P
4.  Benefits and limitations of A+P
5.  IANA Considerations
6.  Security Considerations
7.  Acknowledgments
8.  References
  8.1.  Normative References
  8.2.  Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1.  Introduction

Many large Internet Service Providers (ISPs) face the problem that their networks' customer edges are so large that even if they only give the "front" of each customer premises equipment (CPE) a single IPv4 address, they need two to five /8s of IPv4 space. The looming exhaustion of the free IANA IPv4 pool makes it highly unlikely that they would be allocated that much public IPv4 address space. Therefore ISPs have to devise something more ingenious.

Deploying NATs is a direct consequence of the design of a new protocol (IPv6) which is incompatible on the wire; there is not the slightest compatibility mode. Although undesirable, NATs are inevitable.

Some broadband providers are testing an approach called Carrier Grade NAT (CGN). It is essentially a number of IPv4 NATs in the core of their networks, combined with various tunneling and translation techniques. If the CPE is dual stack, traffic where both source and destination are IPv6 would not have to be NATted, but IPv4 would be heavily NATted.

We can contrast this to, for example, NAT-PT [RFC2766] [RFC4966] on the CPE, which would probably scale to the needs of even a large non-consumer backbone. But, as we noted above, very large broadband consumer providers would need far too much IPv4 space for the NAT-PT front ends for their large consumer networks.

Our main concern is that the imminent IPv4 address exhaustion is tempting operators to deploy technology which is damaging to the Internet as a whole.

1.1.  Why Carrier-Grade-NATs are Harmful

We have taken up a desperate search for alternatives. The reasons are simple:

"Carrier grade" is a euphemism for centralized. More semantics move to the core of the network. This is bad in and of itself. Net-heads call it "telco-think" because it is the telco model of smarts in the core as opposed to the Internet model of a simple, just-forward-packets core, with smart edges. It also places the provider in the position of a walled garden, where the user is trapped behind unchangeable applications and policies, the opposite of the "end-to-end" model of the Internet.

With the smarts at the edges, e.g. NAT-PT, one can easily field new protocols between consenting end-points by "just" tweaking the NATs at the corresponding CPE, even adding application layer gateways (ALGs) if they are needed. CGNs, however, do not build the walled garden at the edges; they build it by restricting the core.

With NAT in the core, if a customer wants a new application protocol which requires cooperation from the NAT, he gets to beg help from the broadband providers' engineers and lawyers, and all other users of carrier grade NATs. This is the ultimate horror the NAT-haters fear, and, in this case, they are not all that wrong.

One broadband provider has recently received a lot of bad press for just this, though we know that the engineers are very far from those responsible. In effect, all new application protocols have to go through the carrier-loving lawyers to be allowed to be handled by the NATs in their core.

Today's NATs are typically mitigated by ALGs over which the customer has some degree of control, e.g. port forwarding or UPnP.
However, this is not expected to work anymore with CGNs. CGN proposals themselves admit that applications which require specific port assignment or port mapping from the NAT box are not expected to keep working [I-D.durand-softwire-dual-stack-lite]. We believe this is not an option and that end-users must have the ability to control their own ALGs.

So, if someone wants to deploy a new application, they can talk to the broadband providers' lawyers or run new disruptive technology over HTTP; we can pick our poison. And if the NAT is not where the customer can directly control it, i.e., it is anywhere back in the provider's network, then the provider controls what the user can control, i.e. it is not really under user control. We do not wish to deal with the case where the provider has to decide whether to allow Skype v42 when they themselves provide a competing VoIP product.

And remember that as IPv6 deploys, if we want to have one Internet, i.e. IPv4 nodes talking freely with IPv6 nodes, then translation must be done somewhere. The challenge is whether someone can figure out a scheme where it is done for these large networks. We believe it should be at the customer edge, not in the core.

Another issue with CGNs is scalability. ISPs face a tension: placing CGNs deep within their network aggregates as many customers as possible, but too much aggregation creates a massive state problem. To reduce the state, the placement ends up somewhere closer to the edge, where the benefits are somewhat limited. It is not clear how a CGN should maintain per-session state in a scalable manner. This is particularly relevant given that each customer is very likely to open many TCP connections in parallel [SP-NAT]. State for improperly terminated sessions could remain stale for some time. The CGN hence trades scalability for the amount of state that needs to be kept, which makes optimally placing a CGN a hard engineering problem.

With CGNs, tracing hackers, spammers and other criminals will be impossible, unless all the connection based mapping information is recorded and stored. This would not only cause concern for law enforcement services, but also for privacy advocates. We discuss other security-related problems with CGNs in the next section.

1.2.  Security of CGNs

NATs frequently need to initiate translation for secondary port numbers. This may be a decision based on packet inspection (i.e., looking for PORT commands in FTP [RFC0959] sessions), or it may rely on explicit signaling from the end host via protocols such as UPnP. Either way, CGNs pose a security threat and/or an administrative nightmare.

The issue is proper authentication of such requests. Most UPnP devices do not implement appropriate security features. Even if they did, there would be no way to administer the security mechanism. Every end-user device would have to have a secret corresponding to some authentication field in the CGN. End users will not set these up properly; providers do not want to maintain such a database.

Decisions made based on packet inspection are just as problematic. A request from one customer could easily ask to open a port for another customer's addresses, similar to the Java-based attack described by Martin et al. in [Martin-Java].

2.  Proposed Solution

The specific problem we are facing is that available IPv4 address space is insufficient to number the IPv4-speaking customers, while IPv6 is not widely enough deployed to migrate to an IPv6-only world.

Therefore, we propose to extend the IPv4 address space by assigning to each customer a single IPv4 address which is extended by "stealing" bits from the port number in the TCP/UDP header, leaving the applications a reduced range of ports.
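As a rough illustration of the bit-stealing idea, the following Python sketch splits the 16-bit port field into an address-extension part and a restricted-port part. The names split_port, sharing_ratio and ext_bits are hypothetical and chosen only for this example; the draft does not define an API, and we assume here that the stolen bits are the high-order bits of the port.

```python
from typing import Tuple

PORT_BITS = 16  # width of the TCP/UDP port field


def split_port(port: int, ext_bits: int) -> Tuple[int, int]:
    """Split a 16-bit port into (address extension, restricted port).

    Assumption for this sketch: the top `ext_bits` bits extend the IPv4
    address and the remaining low bits are left to the applications.
    """
    if not 0 <= port < (1 << PORT_BITS):
        raise ValueError("port must fit in 16 bits")
    if not 0 <= ext_bits <= PORT_BITS:
        raise ValueError("ext_bits must be between 0 and 16")
    low_bits = PORT_BITS - ext_bits
    return port >> low_bits, port & ((1 << low_bits) - 1)


def sharing_ratio(ext_bits: int) -> Tuple[int, int]:
    """Return (customers per IPv4 address, ports per customer)."""
    return 1 << ext_bits, 1 << (PORT_BITS - ext_bits)
```

For example, sharing_ratio(4) yields 16 customers per IPv4 address with 4096 ports each, and split_port(4097, 4) places port 4097 in address extension 1.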
In the face of IPv4 address exhaustion, the need for addresses is stronger than the need to be able to address thousands of applications on a single host [SP-NAT], and broadband consumers are not anticipated to deploy a massive number of applications over IPv4 (if they did, CGNs would be even more damaging than this "bit-stealing" proposal).

Assuming we could limit the applications' port addressing to 8 (or 12) bits, we can increase the effective size of an IPv4 address by 8 (or 4) additional bits. In this scenario, 256 (or 16) customers could be multiplexed on the same IPv4 address, while allowing each of them a fixed range of 256 (or 4096) ports. We call this "extended addressing" or "A+P" (Address Plus Port) addressing. Various routing techniques can be employed to route on A+P addresses.

The main advantage of A+P is that it preserves the Internet "end-to-end" model by pushing the state onto a device at the edge of the network, which we call the A+P gateway. In a world where address translation is inevitable, due both to IPv6-IPv4 incompatibilities and lack of IPv4 space, we strive to give control over the translation process to the customer. In contrast to CGNs, the customer potentially has direct control over the A+P gateway, instead of using ad-hoc protocols to communicate with the CGN. This setting enables us to preserve some of the Internet "end-to-end" model: at least packets that leave one user's home gateway are guaranteed to be delivered without modifications to the destination home gateway.

2.1.  Terminology

In the rest of this draft, we will refer to the following network devices:

1.  Customer Premises Equipment (CPE), i.e. cable/DSL modem.

2.  Customer-Provided-NAT (CN), an A+P capable gateway which is under customer control (optional).

3.  Provider Edge Router (PE), AKA customer aggregation router

4.  Provider Border Router (BR), provider's edge to other providers

5.  Network Core Routers (Core), provider routers not PE or BR

2.2.  Design of the A+P Gateway Device

In this section, we discuss our view of the delicate design choices for the A+P gateway device.

As the customer's hosts would likely be unaware of the restricted range of ports and the extended A+P addressing scheme, translation would be done at the border between the customer and the provider. In the most common case, this is the provider-provisioned cable or DSL modem on the customer's premises, into which the customer plugs their single computer or LAN. This CPE has to be upgraded to A+P extended addressing and be informed of the port number range allocated to the customer. The latter could be done, for example, via an extension to DHCP (e.g., [I-D.boucadair-dhc-port-range], [I-D.bajko-v6ops-port-restricted-ipaddr-assign]); or via a new protocol. The CPE would also provide the A+P gateway function between the customer's LAN and the provider.

As we do not wish to modify end hosts, we expect customers to send IPv4 packets from any port(s). The A+P gateway functionality consists of translating/NATting port numbers outside the assigned port-range to ensure that they fall in the appropriate range. Packets originated from within the appropriate port-range will pass through without re-translation.

We regard several constraints as important for our design:

1)  Customer devices, such as computers and PDAs, must work without modification. A+P shall be transparent to unaware end-users. Emergence of new applications shall not be limited.

2)  Customers must have the ability to configure the A+P gateway to fit their needs (e.g., packets from one home gateway will be delivered without modifications to the destination home gateway).

3)  No state should be kept inside the ISP's network.
    This implies that the A+P gateway functionality should be provided either by the CPE or by the CN.

4)  Automatic configuration/administration must be supported. There should be no need for customers to call the ISP and tell them that they are operating their own A+P-aware devices.

5)  Multiple A+P gateways should be able to operate in sequence along one data path without interfering with each other.

6)  "Double-NAT" has to be avoided. Because of constraint 5, multiple A+P-aware devices might be present in a path, and once one has done some translation, those packets should not be re-translated.

7)  IPv6 deployment should be encouraged.

While we acknowledge that A+P works in an IPv4-only environment (in a way similar to [I-D.boucadair-port-range]), we strongly believe that IPv6 is the long-term solution to the problem, and that A+P should be considered only as a smooth transition towards an IPv6 world. We therefore assume in constraint 7 that the ISP has migrated to an IPv6-only or dual-stack core and A+P can use IPv6 as a transport inside the network. This ensures that A+P will not be a hindrance to the introduction of IPv6.

These principles lead us to the following design:

1)  The A+P gateway should automatically translate any packet that does not come from the proper port range (Constraint 1). Conversely, port numbers within the proper range should not be translated. The latter choice ensures that, in the presence of multiple A+P-aware devices, we do not re-translate. (Constraints 4, 5, and 6)

2)  The customer needs either to have an administrative login on the CPE, and/or must be able to operate their own A+P-aware device (with a "public" A+P address, NOT a private [RFC1918] address).
    (Constraints 2 and 5)

3)  Packet encapsulation should be done on the first A+P gateway if multiple gateways are present (otherwise on the CPE) and if A+P-in-v6 transport is used in the network (Constraints 3, 5, and 7). This could be automatically signaled, for example via DHCP (Constraint 4).

4)  The translation performed by a provider-supplied CPE should be as accommodating as possible. Packet snooping and/or ALGs may be added for well known protocols (e.g., FTP, SIP, Skype), but the translation for obscure/unknown protocols (e.g., gaming) might have to be manually configured by the customer. (Constraints 1, 2, and 5)

5)  If neither CPE nor CN is A+P capable, an unrestricted IPv4 address shall be provided. (This is to facilitate migration and fulfill constraints 1, 3, and 4). To avoid exploitation of this principle (e.g. a customer operating an "old"/legacy CPE just to get an unrestricted address), an ISP could choose to provide no-IPv4 (i.e., only IPv6) service. Note that this requires that the ISP learns about A+P capable devices (either CPE or CN). This implies that DHCP needs to be extended to communicate A+P capability in the request message.

We leave unspecified for now the question of how large a port number range is allocated to each customer. We anticipate that the allocation available to a customer will be determined by ISP-specific policy, perhaps as a function of the fee charged to the customer.

One possible solution could be to allocate ranges of IPs with a statically assigned port range. For example, the ISP could offer "classes of service", e.g., the first block of IPs offers a port-range of 4096 ports, the second class offers 512 ports, the third class offers 16 ports. If the customer wants more ports, the address needs to be moved into a different class. Obviously, this cannot be done without a service interruption for this particular customer (i.e., the customer has to get a new IP). However, this solves the problem of dynamic allocation for the ISP. We leave details of this issue for future work.

2.3.  Reasons for allowing multiple A+P gateways in sequence

There are many known difficulties with NATs in general. Most are related to NATs breaking the end-to-end principle of the Internet. Some applications, such as gaming or peer-to-peer, are known to have difficulties if some kind of address/port translation is used. This behavior is independent of where the translation takes place, thus privately operated NATs can be considered to be as limiting as CGNs. There is one major difference: today's work-around is that the user owns and controls the NAT and is typically able to alter some of the translation properties, for example by defining their own port forwarding rules. The main criticism of CGNs is that this "work-around" is not guaranteed to be an option for the end-user.

However, A+P could evolve beyond this limitation of NATs and actually re-establish real end-to-end connectivity between end-devices. Section 3.1 shows that what is needed to achieve this is to allow multiple A+P gateways to operate in a sequence. Hence the constraints 5 and 6. The key observation is that the A+P gateway could be given a globally routable IPv4 address, though restricted in the usable port-range.

For an end-user this could mean that enabling an A+P gateway on an end-system could be as easy as installing a "kernel-patch". In this case the end-system would be capable of establishing end-to-end connectivity with other IPv4 speakers in the Internet and it would be possible to contact the end-host on those assigned ports. This obviously poses a security threat to the end-system in a similar way as connecting a legacy host to the Internet with a publicly routable IPv4 address. (Note that such a system could be placed behind an application firewall.)
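The rule from Section 2.2 that makes gateways in sequence safe (translate only source ports outside the assigned range, pass everything else through) can be sketched as follows. This is a deliberately simplified model and not a specification: the class APGateway and its methods are hypothetical, mappings are keyed on the source port alone rather than on a full 4-tuple, and collisions between passed-through and allocated ports are ignored.

```python
from typing import Dict


class APGateway:
    """Toy A+P gateway: rewrites only out-of-range source ports."""

    def __init__(self, first_port: int, last_port: int) -> None:
        self.first_port = first_port
        self.last_port = last_port
        self.out_map: Dict[int, int] = {}  # original port -> assigned port
        self.in_map: Dict[int, int] = {}   # assigned port -> original port

    def in_range(self, port: int) -> bool:
        return self.first_port <= port <= self.last_port

    def outbound(self, src_port: int) -> int:
        """Return the source port the packet carries towards the provider."""
        if self.in_range(src_port):
            # Already conformant, e.g. translated by an earlier gateway:
            # pass through, so there is no double NAT.
            return src_port
        if src_port in self.out_map:
            return self.out_map[src_port]  # reuse the existing mapping
        for candidate in range(self.first_port, self.last_port + 1):
            if candidate not in self.in_map:
                self.out_map[src_port] = candidate
                self.in_map[candidate] = src_port
                return candidate
        raise RuntimeError("assigned port range exhausted")

    def inbound(self, dst_port: int) -> int:
        """Undo a mapping if one exists; otherwise leave the port untouched."""
        return self.in_map.get(dst_port, dst_port)
```

With an inner gateway APGateway(4096, 4351) behind an outer gateway APGateway(4096, 8191), a packet from source port 32000 is rewritten once by the inner gateway and then passes the outer one unchanged.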

Our main goal is that the A+P design should allow for two possible usage scenarios:

1)  Legacy end-devices that have not been upgraded receive an [RFC1918] private address and work as today behind a NAT (e.g., the CPE). The CPE would have to be upgraded to A+P extended addressing.

2)  Upgraded dual-stacked systems would understand A+P extended addressing and thus could provide A+P gateway functionality themselves. Therefore they would receive a globally routable (though port restricted) A+P address. For such systems, packets could be delivered unmodified end-to-end, hence overcoming some of the general limitations of today's NATs.

As Section 3.1 explains in more detail, this is achieved by A+P-in-IPv6 encapsulation, similarly to already existing approaches (such as NAT-PT). This way A+P is capable of routing on IPv6 and thus bypassing the limitations of NATs. One open issue for future work is the choice of mechanism to sub-assign multiple parallel A+P gateways. For the moment, assume that the CPE allocates sub-port ranges to subsequent end-devices. (This is very similar to the way a NAT allocates addresses from some private address space to end-hosts, with the difference that the address space to be allocated consists of globally routable IPv4 addresses which are restricted in the port-range.)

2.4.  Changes Required to the Network

2.4.1.  Changes Required to CPE

Our design, described above, requires modifications to the CPE. However, modifications are also required by current CGN designs, for example [I-D.durand-softwire-dual-stack-lite] says, "It is expected that the home gateway is either software upgradable, replaceable or provided by the service provider as part of a new contract."

An A+P gateway CPE would be configured, hopefully automatically, with

o  IPv4 and/or IPv6 addressing for the customer's LAN,

o  the IPv4 A+P extended address for the WAN side to connect to the provider, which includes the range of port numbers to use on the WAN side, and

o  an IPv6 address for the WAN side to connect to the provider, which includes "instructions" on how to encapsulate A+P packets within IPv6.

2.4.2.  Changes to Customer-Provided NAT (CN)

Alternatively, as occasionally happens today, the customer could provide its own A+P gateway and the CPE would then function as a simple cable/DSL modem. This customer A+P gateway would be configured with an IPv4 A+P extended address which is allocated to the customer (e.g., via extended DHCP).

The customer NAT is entirely optional. The customer does not have to operate such a device. If he does not, then the provider-installed CPE performs the A+P gateway function.

The CN is simply a symbol for customer control. Therefore, a mixture of CPE and CN devices is also possible, where the customer gets full control over the CPE via an administrative login. However, this could also range as far as computers with a special kernel patch to become A+P-aware. In this draft, we therefore use CPE/CN to denote an A+P-aware device that is under full customer control and has an A+P extended address.

2.4.3.  Changes to Provider-Edge Routers (PE)

Ultimately, we expect that all CPE/CNs take the functionality of the A+P gateway, as we would like to avoid state in the network. Therefore, the provider's customer aggregation router (aka PE) performs only some optional security-related functions, i.e., assuring that a CPE/CN does not send packets from ports other than the allocated range, since the replies would then go back to some other host. This is a comparable threat to IP source address spoofing.
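A minimal sketch of such a PE check, written in Python for illustration only: the function pe_permits and its parameters are hypothetical, and we assume the PE knows the address and port range assigned to the CPE/CN behind each customer-facing interface. It extends the usual source-address check with a source-port check.

```python
import ipaddress


def pe_permits(src_addr: str, src_port: int,
               assigned_addr: str, first_port: int, last_port: int) -> bool:
    """Accept a packet only if it comes from the customer's A+P address.

    Both the source IPv4 address and the source port must match what
    was assigned to the CPE/CN on this interface.
    """
    same_addr = (ipaddress.IPv4Address(src_addr)
                 == ipaddress.IPv4Address(assigned_addr))
    return same_addr and first_port <= src_port <= last_port
```

For the customer used in the examples of this draft (12.0.0.1, ports 4096 to 8191), a packet from source port 4097 is accepted, whereas one from source port 32000 is dropped because its replies would be delivered to a different customer sharing the same IPv4 address.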
Ideally, we want to enforce the analog of BCP 38 [BCP38]. This means that no packets outside of the assigned address and port number range should ever leave the PE for the network.

We acknowledge that the PE router could also provide the A+P gateway functionality if the CPE/CN is not A+P capable. Unfortunately, this comes very close to the walled garden effect that a CGN would cause. For this reason, we suggest that legacy CPEs shall be assigned either an unrestricted IPv4 address or no IPv4 service at all (by design principle 5 above). However, even if the PE provides the A+P gateway functionality, there is one important difference with respect to CGNs: customers who wish to "escape" from the walled garden can run their own upgraded CN. This way customers can become aware of which ports will be A+P NATted and which will not, so they have control over their own applications with no need to interact with the ISP (e.g., there's no need for UPnP equivalents on the PE).

2.4.4.  Changes to Provider Border Routers (BR)

Routers at the provider's edge which face other providers need to be aware of the extended A+P IPv4 addresses. They must have the ability to forward packets to the corresponding CPE based on IPv4 address and port.

We suggest that the provider network use IPv6 as the tunneling mechanism. The CPE/CN would encapsulate the A+P extended address within an IPv6 address using a well-known IPv6 prefix. Then the core would route on the IPv6 address. The border routers would recognize the well-known IPv6 prefix, de-capsulate the inner IPv4 packet, and route normally on the IPv4 address. Return or inbound packets would be encapsulated in a similar fashion and thus correctly delivered to the CPE/CN. Thus the provider's network could run IPv6 only, or any other layer 3/2.5 protocol.

2.4.5.  Changes to Network Core Routers

If transport through the provider is chosen appropriately, e.g. A+P-in-IPv6-encapsulation, then the network's core routers need no understanding of A+P extended IPv4 addressing at all.

Routing through the core without some form of tunneling would require the deployment of IPv4-A+P all the way to the PE routers. As the original problem was insufficient IPv4 space, we assume that IPv6 or other non-IPv4 tunneling will be used.

However, while we recommend IPv6, we acknowledge that A+P is the natural extension of IPv4, and should work seamlessly. In an IPv4-only (or dual-stacked) network, we propose to host only unsplit/full IPv4 addresses on the PE. In this case no modifications to the core have to be done to allow routing of /32-or-longer prefixes and forwarding will work with legacy equipment. Only the PE would have to be upgraded to A+P-awareness and then make A+P decisions.

3.  Implementation

3.1.  A+P dual-stack

There is wide consensus that the only long term solution to the IPv4 address shortage is speeding up the deployment of IPv6. Hence, we argue that the main design requirement for any short term solution is that it ease, or at least not hamper, ISP-wide IPv6 deployment.

A+P addressing enables ISPs to run an IPv6-only core with dual-stack devices at the edge. In fact, the CPE/CN and the BR are the only devices that need to support dual-stack. A+P addressing requires that those devices get an IPv6 address assigned that belongs to an ISP-wide well-known prefix (WKP), which only needs to be routable within the ISP. The CPE/CN learns both the WKP and its A+P address plus port range, and configures its WAN interface accordingly. Figure 1 shows an example of how WKP and A+P are combined to obtain an IPv6 address at the CPE/CN.

   Configuration (e.g., from DHCP):
   --------------------------------
   WKP = 4999::/64    (64 bits)
   A   = 12.0.0.1     (32 bits)
   P   = ports 4096 to 8191

   Port bits usage:
   --------------------------------
   P  = Pa + Pp  (16 bit port field in TCP header)
   Pa = address extension (4 bits)
   Pp = restricted port number (12 bits)

   from 0001000000000000 (4096)
     to 0001111111111111 (8191)
        \__/\__________/
         /            \
        /              \
   +-------------+  +-------------+
   | part of A+P |  | bits for    |
   | address     |  | port number |
   | (4 bits)    |  | (12 bits)   |
   +-------------+  +-------------+

   IPv6 prefix:
   --------------------------------
   4999:0:0:0 : 0c00:0001 : 1000 :: /100
   \________/   \___________/       \__/
      WKP        A+P address  (64+32+4 bits)

   Building an IPv6 prefix from Well Known Prefix and A+P address

                               Figure 1

This address is routed within the provider's core network. We expect that A+P-in-IPv6 addresses are highly aggregatable, so that the resulting prefixes do not contribute to large routing tables, but can be announced with very little impact on the overall routing table size in the ISP core network.

We now describe how a packet is transported from an end-user behind an A+P gateway towards the IPv4-Internet, and then the opposite direction. In the following examples, we assume that the end-user host is not A+P-aware (e.g., via kernel patches). Hence, packets have to be NATted at the CPE/CN.

The CPE/CN receives an IPv4 packet from the end-user device to a destination address V4D. If the source is in private IPv4 address space [RFC1918], the CPE/CN NATs the packet. If the packet already originated from the assigned IPv4 address, the CPE/CN only ensures that the source port falls into the allocated port range. In either case it then encapsulates the packet in an IPv6 packet where the source address is WKP+A+P, and the destination address is WKP+V4D.
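The construction in Figure 1 can be reproduced with Python's standard ipaddress module. The sketch below is illustrative only: the function names are hypothetical, and the bit layout (64-bit WKP, then the 32-bit IPv4 address, then the 16-bit port, then 16 zero bits) is our reading of the example in Figure 1 rather than a normative encoding.

```python
import ipaddress


def ap_ipv6_address(wkp: str, ipv4: str, port: int = 0) -> ipaddress.IPv6Address:
    """Embed an IPv4 address and a port below a 64-bit well-known prefix."""
    prefix = ipaddress.IPv6Network(wkp)
    if prefix.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 well-known prefix")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    a = int(ipaddress.IPv4Address(ipv4))
    value = int(prefix.network_address) | (a << 32) | (port << 16)
    return ipaddress.IPv6Address(value)


def ap_ipv6_prefix(wkp: str, ipv4: str, first_port: int,
                   ext_bits: int) -> ipaddress.IPv6Network:
    """Prefix covering one customer: WKP + A + `ext_bits` port bits."""
    base = ap_ipv6_address(wkp, ipv4, first_port)
    # Raises ValueError if first_port is not aligned to the port block.
    return ipaddress.IPv6Network((base, 64 + 32 + ext_bits))
```

With WKP 4999::/64, A = 12.0.0.1 and ports 4096 to 8191 (4 extension bits), ap_ipv6_prefix returns the /100 prefix shown in Figure 1, and the source address for port 4097 falls inside it. Calling ap_ipv6_address with the default port of 0 gives the WKP+V4D form used for the destination.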
We assume that the NAT is operating on the 4-tuple (source_IPv4, source_port, destination_IPv4, destination_port). (Using the terminology of [RFC3489], this is mostly a "symmetric" NAT; see the discussion of statelessness in the BR, below, for the exception.) The packet is then sent using standard routing in the ISP core up to the provider's BR. Note that there is no preconfigured tunnel between the CPE/CN and the BR; the packet is routed based on the destination address, rather than to a predetermined endpoint. When the BR receives the packet, it de-capsulates the IPv4 packet where the source is A and the destination is V4D. Figure 2 shows routing of outgoing packets. Observe that if the source port does not initially fall in the configured range (datagram 1), it is translated at the CPE/CN (datagram 2). Maennel, et al. Expires May 8, 2009 [Page 14] Internet-Draft A+P Addressing Extension November 2008 +-----------+ | Host | +-----+-----+ | |12.0.0.1 (ports 4096 to 8191) IPv4 datagram 1 | | | | v | +---------|---------+ |CPE/CN | | +--------|||--------+ | |||4999:0:0:0:0c00:0001:1000::/100 IPv6 datagram 2| ||| | |||<-IPv4-in-IPv6 | ||| -----|-|||------- / | ||| \ | ISP network | \ | ||| / -----|-|||------- | ||| v ||| +--------|||--------+ |BR ||| | +---------|---------+ | | IPv4 datagram 3 | | -----|--|-------- / | | \ | Internet | \ | | / -----|--|-------- | | v |128.0.0.1 +-----+-----+ | IPv4 Host | +-----------+ Figure 2: Routing of Outgoing Packets Maennel, et al. 
   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 32000                       |
   | --------------- | ------------ | --------------------------- |
   | IPv6 datagram 2 | IPv6 Dst     | 4999:0:0:0:128.0.0.1::      |
   |                 | IPv6 Src     | 4999:0:0:0:0c00:0001:1001:: |
   |                 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 4097                        |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 4097                        |
   +-----------------+--------------+-----------------------------+

                       Datagram header contents

   An incoming packet undergoes the reverse process.  When a BR
   receives an IPv4 packet on an external interface, it extracts the
   destination address and port and uses that information to build a
   WKP+A+P IPv6 destination address.  The packet is then routed in the
   ISP core to the user's A+P gateway, which is then able to
   de-capsulate the IPv4 packet or reverse the applied NAT mapping.

   Note that the packet processing at the BR is completely stateless,
   since the BR does not need to know how many bits of the port are
   "stolen" by the address.  The longest prefix rule will just deliver
   the packet to the corresponding A+P gateway.  All the state is kept
   on the CPE/CN, i.e., at the edge.

   Figure 3 shows how an incoming packet is routed.  Observe that the
   port translation at the CPE/CN (datagram 3) only happens if the
   CPE/CN has a preexisting mapping.  Otherwise, the port number is
   left untouched.

   Overall, this approach brings two major advantages over CGNs: (i)
   the provider network keeps no per-session state, which avoids the
   scalability issues of a CGN, and (ii) it allows a customer to be
   contacted on the restricted port range with no extra signaling.
   (This eliminates the filtering implicitly required for "symmetric"
   NATs in [RFC3489].)

                   +-----------+
                   |   Host    |
                   +-----+-----+
                       ^ |12.0.0.1 (ports 4096 to 8191)
       IPv4 datagram 3 | |
                       | |
                       | |
             +---------|---------+
             |CPE/CN   |         |
             +--------|||--------+
                    ^ |||4999:0:0:0:0c00:0001:1000::/100
     IPv6 datagram 2| |||
                    | |||<-IPv4-in-IPv6
                    | |||
               -----|-|||-------
             /      | |||        \
            |    ISP network      |
             \      | |||        /
               -----|-|||-------
                    | |||
                    | |||
             +--------|||--------+
             |BR      |||        |
             +---------|---------+
                     ^ |
     IPv4 datagram 1 | |
               -----|--|--------
             /      |  |        \
            |      Internet      |
             \      |  |        /
               -----|--|--------
                    |  |
                    |  |128.0.0.1
                   +-----+-----+
                   | IPv4 Host |
                   +-----------+

                Figure 3: Routing of Incoming Packets

   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 4097                        |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv6 datagram 2 | IPv6 Dst     | 4999:0:0:0:0c00:0001:1001:: |
   |                 | IPv6 Src     | 4999:0:0:0:128.0.0.1::      |
   |                 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 4097                        |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 32000                       |
   |                 | TCP Src      | 80                          |
   +-----------------+--------------+-----------------------------+

                       Datagram header contents

3.2.  IPv6 and mixed V4-V6 traffic

   Note that if IPv4/IPv6 dual stack is provided on the customer's LAN,
   IPv6 to IPv6 destinations would be transported untranslated from the
   customer's host to the provider's border with other providers.
   If the customer has an IPv6-only LAN, then the A+P gateway providing
   translation should also provide NAT-PT service so that the customer
   can communicate with the IPv4 Internet.

3.3.  Handling ICMP

   ICMP is problematic for all NATs, because it lacks port numbers.
   A+P routing exacerbates the problem.

   Most ICMP messages fall into one of two categories: error reports,
   or ECHO/ECHO reply (commonly known as "ping").  For error reports,
   the offending packet header is embedded within the ICMP packet; NAT
   devices can then rewrite that portion and route the packet to the
   actual destination host.  This functionality will remain the same
   with A+P; however, the provider's BR will need to examine the
   embedded header to extract the port number, while the A+P gateway
   will do the necessary rewriting.

   ECHO and ECHO reply are more problematic.  For ECHO, the A+P gateway
   device must rewrite the "Identifier" and perhaps "Sequence Number"
   fields in the ICMP request, treating them as if they were port
   numbers.  This way, the BR can build the correct A+P address for the
   returning ECHO replies, so they can be correctly routed back to the
   appropriate host in the same way as TCP/UDP packets.  (We leave
   pings originated from an external domain/legacy Internet towards an
   A+P device for future work.)

3.4.  Handling IP fragments

   Much like ICMP packets, IP fragmented packets are known to be hard
   to handle in any address translation mechanism [RFC3022].  In fact,
   only the first IP fragment contains the TCP (UDP) header.  This
   issue is commonly dealt with by keeping additional state at the NAT
   device to allow fragments to be mapped to the correct TCP (UDP)
   session.

   In the A+P gateway solution, fragments coming from the internal
   domain can be avoided if the core network runs IPv6 only and the PE
   ensures that no layer-3 fragmentation is performed by the customer
   equipment.
   Fragments coming from the external domain are harder to handle.
   Commercial NATs extract the port number out of the first fragment
   and keep that information to map subsequent fragments.  Moreover,
   when the first fragment is not the first one to be received at the
   NAT, the fragments received before it need to be stored until the
   port number is known [CCIE-Pro].  Note that a deployment scenario
   which intends to handle fragments must ensure that all the fragments
   of the same original IP packet arrive at the same fragment handling
   host.

   We propose to route fragments to special boxes by exploiting the
   prefix combination in a similar way to Figure 1.  The BR is able to
   detect that a packet is fragmented when it receives it; in that
   case, it uses a different well-known prefix which is intended for
   fragments only (we call it WKPF).  Hence the BR builds an IPv6
   packet where the destination address is WKPF+A and then uses normal
   routing.

   Fragments are then routed to a special box which we call "fragment
   handler" (FH).  The FH is in charge of keeping track of the port
   numbers used by each fragment.  Namely, upon receiving the first
   fragment, the FH stores a mapping <src_ip, ip_id> --> <dst_port>
   (8 bytes in total), which it uses to build the correct WKP+A+P
   address for all the fragments of the same IP packet (identified by
   the pair <src_ip, ip_id>).  After storing such a mapping, all
   subsequent fragments can be forwarded to the correct A+P destination
   address.  This way, fragment storage is only required for
   out-of-order fragments, until the fragment carrying the port number
   is received.  Since out-of-order packets are rare, the FH is not
   expected to buffer a high number of fragments.  (If it runs out of
   space, perhaps because of a resource exhaustion attack, it can
   always discard older fragments.)  Observe that a CGN also needs to
   remember the dst_ip information, since it cannot trust the dst_ip in
   the packet itself.
   In this case, each entry in the mapping takes 12 bytes instead of 8.

   Finally, handling fragments via a specific prefix gives the network
   operator the flexibility to deploy multiple FHs.  There are two
   limiting cases: on one hand, a single FH that handles all the
   fragments in the network (the FH then announces WKPF); on the other
   hand, one FH for each destination IP (the FH then announces WKPF+A).
   Again, the longest prefix matching rule gives the ISP the autonomy
   to choose any intermediate point in between.

3.5.  The incremental path to A+P

   In this section we discuss one possibility for large networks to
   deploy A+P incrementally.

   As discussed above, the A+P scheme requires changes to the CPE, the
   BR, and (optionally) the PE.  Changes to the routing system include
   the addition of the WKP and WKPF.  The upgrade of the BR, as well as
   the routing of the WKP/WKPF, has to be done before the first
   customers transition to A+P.  In addition, it is possible to provide
   the A+P gateway function at the PE routers while gradually upgrading
   the CPEs.  (We stress here once again that as soon as the PE is
   upgraded and A+P is activated, the customer must be able to operate
   its own CN, if he or she so desires.)

   One important consideration has not been described thus far: the BR
   mentioned in this document is essentially the BR of the A+P part of
   the network, and does not necessarily have to be the border router
   of the ISP.  In this sense it might be possible to upgrade a smaller
   but contiguous part of a larger network, as long as it supports
   dual-stack.  However, care needs to be taken that all routers (BR)
   that might form the boundary of the "upgraded cloud" are upgraded to
   A+P.  In this case, those routers translate "A+P packets" into
   "legacy IPv4 packets" and vice versa.  A+P clouds can be
   independently deployed within the ISP network; the only constraint
   that needs to be satisfied is that the A+P address space does not
   overlap with the IPv4 address space which still serves legacy CPEs.
   As the A+P deployment speeds up, small clouds can be easily merged
   into bigger ones, leading the way to the ultimate goal of a single,
   ISP-wide A+P cloud.  For instance, a deployment plan could be to
   install A+P clouds at some neighboring PoPs, then merge them at the
   state level, and so on.

4.  Benefits and limitations of A+P

   A+P addressing leverages internal routing in the ISP to route
   packets on extended addresses in a stateless manner.  This allows
   customers to be assigned globally routable addresses and to accept
   incoming connections on their A+P port range.  Observe that the
   statefulness of NATs hampers this desirable feature, and forces
   users to use out-of-band signaling (e.g., UPnP).

   From the perspective of the ISP, on the other hand, A+P
   statelessness usually means lower deployment costs and fewer
   scalability issues with respect to stateful approaches like NAT.
   Moreover, A+P allows ISPs to fine-tune their networks via standard
   internal routing management, without adding an extra layer of
   complexity (e.g., point-to-point tunnels).

   We now discuss the limitations of the A+P approach.  Recall that a
   transport session is identified by a 5-tuple <src_ip, src_port,
   dst_ip, dst_port, protocol>.  Hence, any mechanism that shares the
   same IP address among multiple hosts intrinsically poses limitations
   on the number of active transport sessions that a single host can
   maintain.  Observe that connections with different hosts (or even
   different applications on the same host) are only minimally
   impacted, because they can be differentiated by means of the dst_ip
   (dst_port) field.  Therefore, the only case in which address sharing
   causes trouble is multiple outbound transport sessions with the same
   remote host and the same port.  In this case only the src_port field
   can be used to differentiate; however, that field cannot be fully
   exploited, since it is also used to multiplex multiple users on the
   same IP address.
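   The trade-off between the number of customers sharing an address and
   the number of source ports left to each of them is simple
   arithmetic on the 16-bit port field.  The following Python sketch
   makes it concrete; the helper names port_split and port_range are
   hypothetical, and the sketch assumes that the Pa bits are the
   high-order bits of the port, as in Figure 1.

```python
def port_split(pa_bits: int) -> tuple[int, int]:
    """Return (customers per IPv4 address, ports per customer).

    pa_bits is the number of port bits used as address extension (Pa);
    the remaining 16 - pa_bits bits (Pp) are left to the customer.
    """
    if not 0 <= pa_bits <= 16:
        raise ValueError("pa_bits must be between 0 and 16")
    return 2 ** pa_bits, 2 ** (16 - pa_bits)


def port_range(pa_bits: int, index: int) -> tuple[int, int]:
    """Return the (first, last) port of the index-th customer range."""
    customers, size = port_split(pa_bits)
    if not 0 <= index < customers:
        raise ValueError("index outside the number of customer ranges")
    first = index * size
    return first, first + size - 1


# Figure 1 uses 4 address-extension bits; the range with index 1 is the
# one assigned in that example.
customers, ports = port_split(4)
first, last = port_range(4, 1)
```

   With 4 address-extension bits, 16 customers share one IPv4 address
   and each is left with 4096 ports; the range with index 1 spans ports
   4096 to 8191, as in Figure 1.  The ports-per-customer figure is also
   the upper bound on simultaneous sessions from one customer to the
   same remote address and port.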
   While multiple sessions with the same remote application are not a
   widespread practice, some very popular websites (e.g., Google Maps
   and iTunes) have been reported to use very large numbers of TCP/IP
   connections to maximize parallelism.  The current estimate of the
   number of parallel sessions used by those websites is circa 70
   [I-D.durand-softwire-dual-stack-lite].  In this respect, A+P with 8
   port bits would allow every host to maintain up to 256 parallel
   connections with the same remote process, while still providing 256
   times more addresses for end hosts.

   Another limitation that A+P shares with any other IP address-sharing
   mechanism is the availability of well-known ports.  In fact,
   services run by customers that share the same IP address will be
   distinguished by the port number.  As a consequence, it will be
   impossible for two customers who share the same IP address to run
   services on the same port (e.g., port 80).  Unfortunately, working
   around this limitation implies application-specific hacks (e.g.,
   HTTP and HTTPS virtual hosting), discussion of which is out of the
   scope of this document.  Of course, a provider might charge more for
   giving a customer the 'normal' port range, 0..N, thus allowing the
   customer to provide externally available services at 'normal' ports.
   Observe that some popular applications (e.g., BitTorrent) require
   the availability of well-known ports.  However, those applications
   can easily adapt to work with different ports, as users of such
   tools update them frequently (e.g., to gain new features).

5.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

6.  Security Considerations

   The primary security issue any time a NAT is mentioned is the
   implicit firewall provided by a NAT.
   Any proposal to eliminate NATs raises the specter of insecure hosts
   lying naked before a hostile Internet.  For a number of reasons, we
   do not think this is a serious issue here.  In fact, under certain
   assumptions our A+P scheme is more secure than CGN.

   A NAT owned by a customer, whether a home consumer or a large
   enterprise, is under the control of that customer.  All machines on
   the customer's side of the NAT have unfettered access to each other;
   generally, this is what is desired.  A+P NATs do not change that.

   CGNs do not change the access property, either.  However, with a CGN
   there are *many* machines on the inside of the translation, not all
   of which are in the customer's administrative domain.  Unless other
   firewall mechanisms are employed, CGNs create added risk of
   unauthorized access.  By contrast, the protection scope of an A+P
   NAT is, by definition, at the boundary to the customer network.  The
   access properties are thus precisely what traditional NATs have
   provided.

   There is one notable exception to this point.  As discussed in
   Section 3.1, inbound packets addressed to the assigned port number
   range are passed through unchanged, even if no outbound packets were
   sent to the originator.  While this allows customers to run their
   own servers on certain ports, it also allows attackers to probe
   these servers without the protection provided today by
   provider-supplied NAT boxes.  The issue is not that internal
   machines are addressable -- that is an inevitable corollary to
   servers being run -- but that it may represent a change from today's
   behavior.  Furthermore, the effect on the customer varies greatly,
   depending on what port number range they are assigned; someone who
   is assigned 0-4K derives more benefit and runs more risk than
   someone who is assigned 48K-52K, since the latter is in the
   IANA-assigned dynamic port range.
   A useful middle ground would be provision of a customer-controllable
   switch in the CPE that controls what happens to such packets.  If
   filtering is to be done, state must be kept, which might be costly;
   this suggests that perhaps it should only be done in the CPE if it
   is replacing current CPE that provides NAT functionality.  If
   customers have their own CN, they have the option of buying one with
   or without such a feature, according to their own needs.

   Note that regardless of the existence of such an option, the CPE/CN
   will need customer-controllable port number-mapping capability,
   since most customers will not be assigned a range that corresponds
   to the servers they wish to run.

   An unrelated risk is resource consumption in the FH.  As noted,
   stored fragments can be discarded as needed.  Most likely, some sort
   of fairness scheme should be employed, so that large numbers of
   fragments arriving for one customer do not impact other customers'
   ability to receive fragmented packets.

   There does not appear to be any risk if WKP or WKPF are announced
   outside of the provider's network.  Assume that an attacker knows
   them.  He or she could send directly to addresses in those ranges.
   For WKP, this is probably harmless; one could reach the same box
   more simply by just sending to its IPv4 address.  For WKPF
   addresses, this might be a way to send extra traffic to the FHs;
   there wouldn't seem to be any particular benefit to the attacker
   compared with simply sending fragments, however.

7.  Acknowledgments

   The authors wish to thank David Ward for review, endless
   constructive criticism, and interminable questions, and Cullen
   Jennings for discussion and review of fragmentation.  We would also
   like to thank the following persons for their valuable feedback on
   earlier versions of this work: Bernhard Ager, Rob Austein, Alain
   Durand, Dino Farinacci, Hamed Haddadi, Russ Housley, Wolfgang
   Muehlbauear, Steve Uhlig and Ruediger Volk.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [BCP38]    Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP
              Source Address Spoofing", BCP 38, May 2000.

   [CCIE-Pro]
              Doyle, J., "Routing TCP/IP Volume I (CCIE Professional
              Development)", 1998.

   [I-D.bajko-v6ops-port-restricted-ipaddr-assign]
              Bajko, G. and T. Savolainen, "Port Restricted IP Address
              Assignment",
              draft-bajko-v6ops-port-restricted-ipaddr-assign-01 (work
              in progress), October 2008.

   [I-D.boucadair-dhc-port-range]
              Boucadair, M., Grimault, J., Levis, P., and A.
              Villefranque, "DHCP Options for Conveying Port Mask and
              Port Range Router IP Address",
              draft-boucadair-dhc-port-range-01 (work in progress),
              October 2008.

   [I-D.boucadair-port-range]
              Boucadair, M., Grimault, J., Levis, P., and A.
              Villefranque, "Provider-Provisioned CPE: IPv4
              Connectivity Access in the context of IPv4 address
              exhaustion", draft-boucadair-port-range-00 (work in
              progress), October 2008.

   [I-D.durand-softwire-dual-stack-lite]
              Durand, A., Droms, R., Haberman, B., and J. Woodyatt,
              "Dual-stack lite broadband deployments post IPv4
              exhaustion", draft-durand-softwire-dual-stack-lite-00
              (work in progress), September 2008.

   [Martin-Java]
              Martin, D., Rajagopalan, S., and A. Rubin, "Blocking Java
              Applets at the Firewall", Proceedings of the Internet
              Society Symposium on Network and Distributed System
              Security, pp. 16-26, 1997.

   [RFC0959]  Postel, J. and J. Reynolds, "File Transfer Protocol",
              STD 9, RFC 959, October 1985.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC2766]  Tsirtsis, G. and P. Srisuresh, "Network Address
              Translation - Protocol Translation (NAT-PT)", RFC 2766,
              February 2000.

   [RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network
              Address Translator (Traditional NAT)", RFC 3022,
              January 2001.

   [RFC3489]  Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy,
              "STUN - Simple Traversal of User Datagram Protocol (UDP)
              Through Network Address Translators (NATs)", RFC 3489,
              March 2003.

   [RFC4966]  Aoun, C. and E. Davies, "Reasons to Move the Network
              Address Translator - Protocol Translator (NAT-PT) to
              Historic Status", RFC 4966, July 2007.

   [SP-NAT]   Alcock, S., Nelson, R., and D. Miles, "Characterizing the
              Network Connection Behavior of Residential Broadband
              Subscribers", draft, under-submission, 2009.

Authors' Addresses

   Olaf Maennel
   T-Labs/TU-Berlin
   Ernst-Reuter-Platz 7
   Berlin  10587
   Germany

   Phone: +491607199931
   Email: olaf@maennel.net

   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110
   US

   Phone: +1 206 780 0431 x1
   Email: randy@psg.com

   Luca Cittadini
   Universita' Roma Tre
   via della Vasca Navale, 79
   Rome,   00146
   Italy

   Phone: +39 06 5733 3215
   Email: luca.cittadini@gmail.com

   Steven M. Bellovin
   Columbia University
   1214 Amsterdam Avenue
   MC 0401
   New York, NY  10027
   US

   Phone: +1 212 939 7149
   Email: bellovin@acm.org

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.
   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.