Network Working Group                                         O. Maennel
Internet-Draft                                          T-Labs/TU-Berlin
Intended status: Standards Track                                 R. Bush
Expires: April 30, 2009                        Internet Initiative Japan
                                                            L. Cittadini
                                                    Universita' Roma Tre
                                                             S. Bellovin
                                                     Columbia University
                                                        October 27, 2008


    The A+P Approach to the Broadband Provider IPv4 Address Shortage
                          draft-ymbk-aplusp-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

This document may not be modified, and derivative works of it may not be created.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 30, 2009.

Maennel, et al.          Expires April 30, 2009                 [Page 1]
Internet-Draft          A+P Addressing Extension            October 2008

Abstract

We are facing the exhaustion of the IANA IPv4 free IP address pool. Unfortunately, IPv6 is not yet deployed widely enough to fully replace IPv4, and it is unrealistic to expect that this is going to change before we run out of IPv4 addresses. Letting hosts seamlessly communicate in an IPv4-world without assigning a unique globally routable IPv4 address to each of them is a challenging problem, for which many solutions have been proposed.
Some prominent ones involve carrier-grade NATs (CGNs), which have been shown to provide an inadequate experience to IPv4 users and enshrine a walled garden in the core of the provider. Instead, we propose using specialized NATs at the customer premises equipment (CPE) edge which treat some of the port number bits as part of an extended IPv4 address.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
     1.1.  Why Carrier-Grade-NATs are Harmful
     1.2.  Security of CGNs
   2.  Proposed Solution
     2.1.  Changes Required to the Network
       2.1.1.  Changes Required to CPE
       2.1.2.  Changes to Customer-Provided NAT
       2.1.3.  Changes to Provider-Edge Routers
       2.1.4.  Changes to Provider Border Routers
       2.1.5.  Changes to Network Core Routers
   3.  Implementation
     3.1.  A+P dual-stack
     3.2.  Design of the A+P NAT Device
     3.3.  IPv6 and mixed V4-V6 traffic
     3.4.  Handling ICMP
     3.5.  Handling IP fragments
     3.6.  The incremental path to A+P
   4.  Benefits and limitations of A+P
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

Many large Internet Service Providers (ISPs) face the problem that their networks' customer edges are so large that, even giving the 'front' of each customer premises equipment (CPE) only a single IPv4 address, they need two to five /8s of IPv4 space. The looming exhaustion of the free IANA IPv4 pool makes it highly unlikely that they would be allocated that much public IPv4 address space. Therefore ISPs have to devise something more ingenious.

Deploying NATs is a direct consequence of the design of a new protocol (IPv6) which is incompatible on the wire; there is not the slightest compatibility mode. Although undesirable, NATs are inevitable.

An approach which some broadband providers are testing is called Carrier Grade NAT (CGN). It is essentially a number of IPv4 NATs in the core of their networks, combined with various tunneling and translation techniques. If the CPE is dual stack, traffic whose source and destination are both IPv6 would not have to be NATted, but IPv4 would be heavily NATted.

We can contrast this with, for example, NAT-PT [RFC2766] [RFC4966] on the CPE, which would probably scale to the needs of even a large non-consumer backbone. But, as we noted above, very large broadband consumer providers would need far too much IPv4 space for the NAT-PT front ends of their large consumer networks.

Our main concern is that the imminent IPv4 address exhaustion is tempting operators to deploy technology which is damaging to the Internet as a whole.

1.1.  Why Carrier-Grade-NATs are Harmful

We have taken up a desperate search for alternatives. The reasons are simple:

"Carrier grade" is a euphemism for centralized. More semantics move to the core of the network. This is bad in and of itself. Net-heads call it "telco-think" because it is the telco model of smarts in the core as opposed to the Internet model of a simple, just-forward-packets core and smart edges. It also places the provider in the position of a walled garden, where the user is trapped behind unchangeable applications and policies, the opposite of the "end to end" model of the Internet.

With the smarts at the edges, e.g. NAT-PT, one can easily field new protocols between consenting end-points by just tweaking the NATs at the corresponding CPE, even adding application layer gateways (ALGs) if they are needed. However, CGNs do not build an Internet walled garden at the edges; they build it by restricting the core.

With NAT in the core, if a customer wants a new application protocol which requires cooperation from the NAT, he gets to beg help from the broadband providers' engineers and lawyers, and all other users of carrier grade NATs. This is the ultimate horror the NAT-haters fear, and, in this case, they are not all that wrong.

One broadband provider has recently received a lot of bad press for just this, though we know that the engineers are very far from those responsible. This shows that all new application protocols have to go through the carrier loving lawyers to be allowed to be handled by the NATs in their core.

Today's NATs are typically mitigated by ALGs over which the customer has some degree of control, e.g. port forwarding or UPnP. However, this is not expected to work anymore with CGNs.
CGN proposals admit that applications that require specific port assignment or port mapping from the NAT box are not expected to keep working [I-D.durand-softwire-dual-stack-lite]. We believe this is not an option and that end users must have the ability to control their own ALGs.

So, if someone wants to deploy a new application, they can talk to the broadband providers' lawyers or run new disruptive technology over HTTP; we pick our poison. And if the NAT is not where the customer can directly control it, i.e. it is anywhere back in the provider's network, then the provider controls what the user can control, i.e. it is not really under user control. We do not wish to deal with the case where the provider has to decide whether to allow Skype v42 when they themselves provide a competing VoIP product.

And remember that, as IPv6 deploys and we want to have one Internet, i.e. IPv4 nodes talking freely with IPv6 nodes, translation must be done somewhere. The challenge is whether someone can figure out a scheme where it is done for these large networks. We believe it should be at the customer edge, not in the core.

Another issue with CGNs is scalability. ISPs face a tension between placing CGNs within their network so as to aggregate as much as possible and the massive state problem that too much aggregation creates. To reduce the state, the placement ends up somewhere closer to the edge, where the benefits are somewhat limited. It is not clear how a CGN should maintain per-session state in a scalable manner. This is particularly relevant given that each customer is very likely to open many TCP connections in parallel. State for improperly terminated sessions could remain stale for some time. The CGN hence trades scalability for the amount of state that needs to be kept, and this makes optimally placing a CGN a hard engineering problem.

With CGNs, tracing hackers, spammers and other criminals will be impossible unless all the connection-based mapping information is recorded and stored. This would cause concern not only for law enforcement services, but also for privacy advocates. This brings us to the other security-related problems with CGNs, which are discussed in the next section.

1.2.  Security of CGNs

NATs frequently need to initiate translation for secondary port numbers. This may be a decision based on packet inspection (e.g., looking for PORT commands in FTP [RFC0959] sessions), or it may rely on explicit signaling from the end host via protocols such as UPnP. Either way, CGNs pose a security threat and/or an administrative nightmare.

The issue is proper authentication of such requests. Most UPnP devices do not implement appropriate security features. Even if they did, there would be no way to administer the security mechanism. Every end-user device would have to have a secret corresponding to some authentication field in the CGN. End users will not set these up properly; providers do not want to maintain such a database.

Decisions made based on packet inspection are just as problematic. A request from one customer could easily request opening a port for another customer's addresses, similar to the Java-based attack described by Martin et al. in [Martin-Java].

2.  Proposed Solution

The specific problem we are facing is that available IPv4 address space is insufficient to number the IPv4-speaking customers, while IPv6 is not widely enough deployed to migrate to an IPv6-only world.

Therefore, we propose to extend the IPv4 address space by assigning to each customer a single IPv4 address which is extended by "stealing" bits from the port number in the TCP/UDP header, leaving the applications a reduced range of ports.
In the face of IPv4 address exhaustion, the need for addresses is stronger than the need to be able to address thousands of applications on a single host [SP-NAT], and broadband consumers are not anticipated to deploy a massive number of applications over IPv4 (if they did, CGN would be even more damaging than this "bit-stealing" proposal).

Assuming we could limit the applications' port addressing to 8 (or 12) bits, we can increase the effective size of an IPv4 address by 8 (or 4) additional bits. In this scenario, 256 (or 16) customers could be multiplexed on the same IPv4 address, while allowing each of them a fixed range of 256 (or 4096) ports. We call this "extended addressing" or "A+P" (Address Plus Port) addressing.

2.1.  Changes Required to the Network

The devices involved in this approach are as follows:

   1.  Customer Premises Equipment (CPE), i.e. cable/DSL modem
   2.  Customer-Provided-NAT (CN), (optional)
   3.  Provider Edge Router (PE), AKA customer aggregation router
   4.  Provider Border Router (BR), provider's edge to other providers
   5.  Network Core Routers (Core), provider routers not PE or BR

2.1.1.  Changes Required to CPE

As the customer's hosts should be unaware of the restricted range of ports and the extended A+P addressing scheme, translation would be done at the border between the customer and the provider. In the most common case, this is the provider-provisioned cable or DSL modem on the customer's premises into which the customer plugs their single computer or a LAN.

This CPE would be aware of the A+P extended addressing. This could be done, for example, via a vendor or other extension to DHCP. The CPE would also provide the A+P NAT function between the customer's LAN and the provider.

This would require modification of current CPE.
However, current CGN approaches require modifications to the CPE as well; for example, [I-D.durand-softwire-dual-stack-lite] says, "It is expected that the home gateway is either software upgradable, replaceable or provided by the service provider as part of a new contract."

The customer premises equipment would be configured, hopefully automatically, with

   o  IPv4 and/or IPv6 addressing for the customer's LAN,
   o  the IPv4 A+P extended address for the WAN side to connect to the provider,
   o  an IPv6 address for the WAN side to connect to the provider, and
   o  the range of port numbers to use on the WAN side.

2.1.2.  Changes to Customer-Provided NAT

Alternatively, as occasionally happens today, the customer could provide its own A+P NAT and the CPE would then be configured as a simple cable/DSL modem. This customer A+P NAT would be configured with the IPv4 address and port range allocated to the customer (e.g., via extended DHCP).

The customer NAT is entirely optional. The customer does not have to operate such a device. If they do not, then the provider-installed CPE handles the mappings.

A mixture of CPE and CN device is also possible, where the customer gets full control over the CPE via an administrative login. In this draft, we write CPE/CN to denote the device the customer has control over (i.e., either a CPE with administrative control, or an "A+P-aware CN").

2.1.3.  Changes to Provider-Edge Routers

Ultimately, we expect that all CPE/CNs take on the functionality of the A+P gateway. Then the provider's customer aggregation router (aka PE) might only perform some security-related functions, i.e., ensure that a CPE/CN does not send packets from ports outside its allocated port range, since the replies would in turn go back to some other host. This threat is comparable to IP source address spoofing.
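The port-range check that the PE performs on outbound packets can be sketched as follows. This is only an illustration under stated assumptions: the `APAllocation` record and the `pe_permits` function are hypothetical names that do not appear in this draft, and the example values are the address 12.0.0.1 with ports 4096 to 8191.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class APAllocation:
    """Hypothetical record of the A+P address and port range given to one CPE/CN."""
    address: IPv4Address
    port_min: int  # first allowed source port (inclusive)
    port_max: int  # last allowed source port (inclusive)


def pe_permits(alloc: APAllocation, src_ip: IPv4Address, src_port: int) -> bool:
    """Return True if an outbound packet stays inside the customer's allocation."""
    return src_ip == alloc.address and alloc.port_min <= src_port <= alloc.port_max


alloc = APAllocation(IPv4Address("12.0.0.1"), 4096, 8191)
print(pe_permits(alloc, IPv4Address("12.0.0.1"), 4097))   # port inside the range
print(pe_permits(alloc, IPv4Address("12.0.0.1"), 32000))  # port outside the range
```

A packet that fails this check would be dropped at the PE, in the same spirit as ingress filtering against spoofed source addresses.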
During a transition phase, however, customers with legacy CPE could have the A+P gateway functionality provided by the PE. If we assume only layer 2 devices which connect directly to an interface of the PE, there should be no problems for the customer to be unaware of the restricted port range.

Unfortunately, this comes very close to the walled garden effect that a CGN would cause. However, one important difference applies: customers who wish to "escape" from the walled garden can run their own upgraded CN. This way customers become aware of which ports will be A+P NATted and which will not, so they have control over their own applications with no need to interact with the ISP (e.g., there's no need for UPnP equivalents).

2.1.4.  Changes to Provider Border Routers

Routers at the provider's edge which face other providers need to be aware of the extended A+P IPv4 addresses. They must have the ability to forward packets to the PE based on IPv4 address and port.

We suggest that the provider network use IPv6 as the tunneling mechanism. The CPE/CN or PE routers would encapsulate the A+P pseudo address within an IPv6 address using a well-known IPv6 prefix. Then the core would route on the IPv6 address. The border routers would recognize the well-known IPv6 prefix, decapsulate the inner IPv4 packet, and route normally on the IPv4 address. Thus the provider's network could run IPv6 only, or any other layer 3/2.5 protocol.

2.1.5.  Changes to Network Core Routers

If transport through the provider is chosen appropriately, e.g. IPv4-in-IPv6 encapsulation, the network's core routers need not understand A+P extended IPv4 addressing at all.

Routing through the core without some form of tunneling would require the deployment of IPv4-A+P all the way to the PE routers. As the original problem was insufficient IPv4 space, we assume that IPv6 or other non-IPv4 tunneling will be used.
However, while we recommend IPv6, we acknowledge that A+P is the natural extension of IPv4, and should work seamlessly. In an IPv4-only (or dual-stacked) network, we propose to host only unsplit/full IPv4 addresses on the PE. In this case no modifications are needed to allow routing of /32-or-longer prefixes, and forwarding will work with legacy equipment. Only the PE would have to be upgraded to A+P-awareness.

3.  Implementation

3.1.  A+P dual-stack

There's wide consensus that the only long term solution to the IPv4 address shortage is speeding the deployment of IPv6. Hence, we argue that the main design requirement for any short term solution is to ease, or at least not hamper, ISP-wide IPv6 deployment.

A+P addressing enables ISPs to run an IPv6-only core with dual-stack devices at the edge. In fact, the A+P CPE/CN and the BR are the only devices that need to support dual-stack. A+P addressing requires those devices to be assigned IPv6 addresses belonging to an ISP-wide well-known prefix (WKP), which only needs to be routable within the ISP. The CPE/CN learns both WKP and its A+P address and port range (e.g., via DHCP), and configures its WAN interface accordingly. Figure 1 shows an example of how WKP and A+P are combined to obtain an IPv6 address at the CPE/CN.

   Configuration (e.g., from DHCP):
   --------------------------------
   WKP = 4999::/64 (64 bits)
   A   = 12.0.0.1  (32 bits)
   P   = ports 4096 to 8191

   Port bits usage:
   --------------------------------
   P  = Pa + Pp (16 bit port field in TCP header)
   Pa = address extension (4 bits)
   Pp = restricted port number (12 bits)

   from 0001000000000000 (4096)
   to   0001111111111111 (8191)
        \__/\__________/
        /              \
       /                \
   +------------+   +---------------+
   | part of A+P|   | spare bits for|
   | address    |   | port number   |
   | (4 bits)   |   | (12 bits)     |
   +------------+   +---------------+

   IPv6 prefix:
   --------------------------------
   4999:0:0:0 : 0c00:0001 : 1000 :: /100
   \________/   \___________/  \__/
      WKP          A+P address
                 (64+32+4 bits)

     Building an IPv6 prefix from Well Known Prefix and A+P address

                               Figure 1

This prefix is announced by the PE in the internal routing of the provider, either IGP or iBGP depending on the provider's routing philosophy. Those prefixes are expected to be highly aggregatable, so that A+P prefixes do not result in large routing tables. It is expected that those prefixes can be announced with very little impact on the routing table size in the ISP core network.

Packet delivery works as follows. We first describe how a packet is transmitted from an A+P end-user device behind a CPE/CN towards the legacy Internet, and then the opposite direction. In the following examples, we assume that the end-user host is not A+P-aware. Hence, port numbers are A+P NATted at the CPE/CN.

The CPE/CN receives an IPv4 packet from the customer to a destination address V4D, ensures that the source port falls into the configured port range, and then encapsulates the packet in an IPv6 packet where the source address is WKP+A+P, and the destination address is WKP+V4D.
The packet is then routed using standard routing in the ISP core, up to the provider's BR. Note that there is no preconfigured tunnel between the CPE/CN and the BR, and the packet is routed based on the destination address, rather than a predetermined endpoint. When the BR receives the packet, it decapsulates the IPv4 packet where the source is A and the destination is V4D.

Figure 2 exemplifies routing of outgoing packets. Observe that the source port does not initially fall in the configured range (datagram 1), so it is translated at the CPE/CN (datagram 2).

                    +-----------+
                    |    Host   |
                    +-----+-----+
                       |  |12.0.0.1 (ports 4096 to 8191)
       IPv4 datagram 1 |  |
                       |  |
                       v  |
                +---------|---------+
                |CPE/CN   |         |
                +--------|||--------+
                       | |||4999:0:0:0:0c00:0001:1000::/100
        IPv6 datagram 2| |||
                       | |||<-IPv4-in-IPv6
                       | |||
                  -----|-|||-------
                /      | |||        \
               |     ISP network     |
                \      | |||        /
                  -----|-|||-------
                       | |||
                       v |||
                +--------|||--------+
                |BR      |||        |
                +---------|---------+
                       |  |
       IPv4 datagram 3 |  |
                  -----|--|--------
                /      |  |         \
               |      Internet       |
                \      |  |         /
                  -----|--|--------
                       |  |
                       v  |128.0.0.1
                    +-----+-----+
                    | IPv4 Host |
                    +-----------+

                 Figure 2: Routing of Outgoing Packets

   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 32000                       |
   | --------------- | ------------ | --------------------------- |
   | IPv6 Datagram 2 | IPv6 Dst     | 4999:0:0:0:128.0.0.1::      |
   |                 | IPv6 Src     | 4999:0:0:0:0c00:0001:1001:: |
   |                 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 4097                        |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 128.0.0.1                   |
   |                 | IPv4 Src     | 12.0.0.1                    |
   |                 | TCP Dst      | 80                          |
   |                 | TCP Src      | 4097                        |
   +-----------------+--------------+-----------------------------+

                       Datagram header contents

An incoming packet undergoes the reverse process. When a BR receives an IPv4 packet on an external interface, it extracts the address and port and then uses that information to build a WKP+A+P IPv6 destination address. The packet is then routed in the ISP core to the user's CPE/CN, which is then able to decapsulate the IPv4 packet where the destination is simply A. Note that the packet processing at the BR is completely stateless, since there's no need to know how many bits of the port are "stolen" by the address. The longest prefix rule will just deliver the packet to the corresponding CPE. All the state is kept at the CPE/CN, i.e. at the edge.

Figure 3 shows how an incoming packet is routed. Observe that the port translation at the CPE/CN (datagram 3) only happens if the CPE/CN has a preexisting mapping. Otherwise, the port number is left untouched.

Overall, this approach brings two major advantages over CGNs: (i) there are no scalability issues, and (ii) it allows a customer to be contacted on the restricted port range with no extra signaling.
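The address construction of Figure 1 and the stateless mapping performed by the BR can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not a specification: the function names `ap_to_ipv6` and `customer_prefix` are hypothetical, a 64-bit WKP is assumed as in Figure 1, and the low-order 16 bits of the IPv6 address are simply left at zero.

```python
from ipaddress import IPv4Address, IPv6Address, IPv6Network

# Example values taken from Figure 1; the function names are hypothetical.
WKP = IPv6Network("4999::/64")


def ap_to_ipv6(wkp: IPv6Network, addr: IPv4Address, port: int) -> IPv6Address:
    """Concatenate WKP (64 bits), the IPv4 address (32 bits) and the port (16 bits)."""
    if wkp.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit well-known prefix")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    # The remaining low-order 16 bits of the IPv6 address are left at zero.
    return IPv6Address(int(wkp.network_address) | (int(addr) << 32) | (port << 16))


def customer_prefix(wkp: IPv6Network, addr: IPv4Address,
                    first_port: int, pa_bits: int) -> IPv6Network:
    """Prefix covering WKP + A + the pa_bits port bits used as address extension."""
    base = ap_to_ipv6(wkp, addr, first_port)
    # strict=True rejects a first_port that is not aligned to the port block.
    return IPv6Network((int(base), 64 + 32 + pa_bits), strict=True)


cpe = customer_prefix(WKP, IPv4Address("12.0.0.1"), 4096, 4)
dst = ap_to_ipv6(WKP, IPv4Address("12.0.0.1"), 4097)  # built by the BR per packet
print(cpe)         # the /100 prefix routed towards the CPE/CN
print(dst in cpe)  # longest-prefix match would deliver dst to that CPE/CN
```

Note that `ap_to_ipv6` never needs to know how many port bits belong to the address; only the announced prefix length carries that information, which is why the BR can remain stateless.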
                    +-----------+
                    |    Host   |
                    +-----+-----+
                       ^  |12.0.0.1 (ports 4096 to 8191)
       IPv4 datagram 3 |  |
                       |  |
                       |  |
                +---------|---------+
                |CPE/CN   |         |
                +--------|||--------+
                       ^ |||4999:0:0:0:0c00:0001:1000::/100
        IPv6 datagram 2| |||
                       | |||<-IPv4-in-IPv6
                       | |||
                  -----|-|||-------
                /      | |||        \
               |     ISP network     |
                \      | |||        /
                  -----|-|||-------
                       | |||
                       | |||
                +--------|||--------+
                |BR      |||        |
                +---------|---------+
                       ^  |
       IPv4 datagram 1 |  |
                  -----|--|--------
                /      |  |         \
               |      Internet       |
                \      |  |         /
                  -----|--|--------
                       |  |
                       |  |128.0.0.1
                    +-----+-----+
                    | IPv4 Host |
                    +-----------+

                 Figure 3: Routing of Incoming Packets

   +-----------------+--------------+-----------------------------+
   | Datagram        | Header field | Contents                    |
   +-----------------+--------------+-----------------------------+
   | IPv4 datagram 1 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 4097                        |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv6 Datagram 2 | IPv6 Dst     | 4999:0:0:0:0c00:0001:1001:: |
   |                 | IPv6 Src     | 4999:0:0:0:128.0.0.1::      |
   |                 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 4097                        |
   |                 | TCP Src      | 80                          |
   | --------------- | ------------ | --------------------------- |
   | IPv4 datagram 3 | IPv4 Dst     | 12.0.0.1                    |
   |                 | IPv4 Src     | 128.0.0.1                   |
   |                 | TCP Dst      | 32000                       |
   |                 | TCP Src      | 80                          |
   +-----------------+--------------+-----------------------------+

                       Datagram header contents

3.2.  Design of the A+P NAT Device

There are a number of delicate design choices for the A+P NAT device. We present our preferred solution here.

Legacy hosts would send IPv4 packets from any port(s). We are not expecting to change end-hosts; therefore we require some kind of NAT. However, one of our basic assumptions is that the customer wants to be able to run their own servers and NATs.
This leads to several constraints:

   1)  We want to enforce the analog of BCP 38 [BCP38]. This means that no packets outside of the assigned address and port number range should leave the PE for the network.

   2)  We want minimal configuration. There should be no need for the customer to tell the ISP that they have purchased an A+P-grade home NAT.

   3)  We must support unmodified computers and NATs.

   4)  We want the A+P gateway (i.e., CPE) to be as accommodating as possible to strange protocols it knows nothing about. It may do its own packet snooping and/or ALGs for things it knows about (e.g., FTP, SIP, Skype), but should leave it to the CN to handle obscure/unknown protocols (e.g., gaming).

   5)  Conversely, if the customer's CN has done some translation, those packets should not be re-translated.

These principles lead us to the following design:

   1)  The PE should discard any outbound packets that do not originate from the proper A+P address. (Constraint 1)

   2)  An A+P gateway (i.e., CPE, CN, or both) should include some option in the DHCP request message, to inform the PE router of its abilities. (Constraint 2)

   3)  If no A+P signaling was done (i.e., neither CPE nor CN supports A+P), the PE router should perform NATting, including whatever ALG functions it can, or an unrestricted IPv4 address has to be provided. (Constraints 3 and 4)

   4)  The PE router should not modify any A+P packets from the proper address and port range. (Constraints 4 and 5)

Note that a customer with no CN or with a non-A+P CN may emit packets within the proper port range by accident, thus accidentally violating part of point 4 above. We solve that by DHCP-based signaling from the A+P gateway: the A+P option in the DHCP request tells the PE that a customer-provided CN will do all NATting according to this design.
In that case, the primary function of the PE router is to enforce restrictions on port numbers in outbound packets.

We leave unspecified for now the question of how large a port number range is allocated to each customer. We anticipate that the allocation available to a customer will be determined by ISP-specific policy, perhaps as a function of the fee charged to the customer.

If variable allocations are to be supported, i.e., the ability for a customer to request more port numbers (and hence more possible simultaneous connections) at one time and fewer at another, the natural way to signal this is in the DHCP A+P request option. However, there is a tradeoff between the advantages of efficiently managing the extended address space via dynamic and/or variable allocation, and the cost it brings in terms of additional complexity. A simple DHCP release/request cycle could be used, but if the proper adjacent block of port numbers was not available, this would entail tearing down existing connections or reNATting them. The disadvantages of the former are obvious; adopting the latter approach would bring back all of the disadvantages this scheme is intended to avoid.

One possible answer is to allocate ranges of IPs with a statically assigned port range. For example, the ISP could offer "classes of service", e.g., the first block of IPs offers 4096 ports, the second class offers 512 ports, the third class offers 16 ports. If the customer wants more ports, the address needs to be moved into a different class. Obviously, this does not go without a service interruption for this particular customer (i.e., the customer has to get a new IP). However, this solves the problem of dynamic allocation for the ISP. We leave details of this issue for future work.

3.3.  IPv6 and mixed V4-V6 traffic

Note that if IPv4/IPv6 dual stack is provided on the customer's LAN, IPv6 traffic to IPv6 destinations would be transported untranslated from the customer's host to the provider's border with other providers.

If the customer has an IPv6-only LAN, then the device providing A+P translation should also provide NAT-PT service so that the customer could communicate with the IPv4 Internet.

3.4.  Handling ICMP

ICMP is problematic for all NATs, because it lacks port numbers. A+P routing exacerbates the problem.

Most ICMP messages fall into one of two categories: error reports, or ECHO/ECHO reply (commonly known as "ping"). For error reports, the offending packet header is embedded within the ICMP packet; NATs can then rewrite that portion and route the packet to the actual destination host. This functionality will remain the same with A+P; however, the provider's BR will need to examine the embedded header to learn which A+P NAT is handling it, while that box will do the necessary rewriting.

ECHO and ECHO reply are more problematic. For ECHO, the border router must rewrite the "Identifier" and perhaps "Sequence Number" fields in the ICMP request, so that returning ECHO REPLY packets may be routed correctly. We suggest rewriting the information in the sequence number so that the BR can route returning ECHO replies back to the appropriate host.

3.5.  Handling IP fragments

Much like ICMP packets, fragmented IP packets are known to be hard to handle in any address translation mechanism [RFC3022]. In fact, only the first IP fragment contains the TCP (UDP) header. This issue is commonly dealt with by keeping additional state at the NAT device which allows fragments to be mapped to the correct TCP (UDP) session.

In the A+P NAT solution, fragments coming from the internal domain can be avoided if the core network runs IPv6 only and the PE ensures that no layer-3 fragmentation is performed by the customer equipment.

Fragments coming from the external domain are harder to handle. Commercial NATs extract the port number out of the first fragment and keep that information to map subsequent fragments. Moreover, when the first fragment is not the first one to be received at the NAT, the fragments received before it need to be stored until the port number is known [CCIE-Pro]. Note that a deployment scenario which intends to handle fragments must ensure that all of the fragments arrive at the same fragment handling host.

We propose to route fragments to special boxes by exploiting the prefix combination in a similar way to Figure 1. The BR is able to detect that a packet is fragmented when it receives it, so in that case it uses a different well-known prefix which is intended for fragments only (we call it WKPF). Hence, the BR builds an IPv6 packet where the destination address is WKPF+A and then uses normal routing. Fragments are then routed to a special box which we call "fragment handler" (FH).

The FH is in charge of keeping track of the port numbers used by each fragment. Namely, upon receiving the first fragment, the FH stores a mapping (src_ip, ip_id) --> (dst_port) (8 bytes in total), which it uses to build the correct WKP+A+P address for all the fragments of the same IP packet (identified by the pair (src_ip, ip_id)). After storing such a mapping, all subsequent fragments can be forwarded to the correct A+P destination address. This way, fragment storage is only required for out-of-order fragments, until the fragment carrying the port number is received. Since out-of-order packets are pretty rare, the FH is not expected to buffer a high number of fragments.
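The bookkeeping at the FH can be sketched as follows. This is an illustration only, under the assumption that a fragmented packet is identified by its source address and IP Identification field; the class and method names are hypothetical, and entry expiry, which any real FH would need, is left out.

```python
from typing import Dict, List, Optional, Tuple

FragKey = Tuple[str, int]  # assumed key: (source IPv4 address, IP Identification)


class FragmentHandler:
    """Hypothetical FH: learns the port from the first fragment, buffers early ones."""

    def __init__(self) -> None:
        self.ports: Dict[FragKey, int] = {}             # learned destination ports
        self.pending: Dict[FragKey, List[bytes]] = {}   # fragments awaiting a port

    def receive(self, src_ip: str, ip_id: int, payload: bytes,
                dst_port: Optional[int] = None) -> List[Tuple[int, bytes]]:
        """Return the (dst_port, payload) pairs that can be forwarded now.

        dst_port is set only for the fragment carrying the TCP/UDP header.
        """
        key = (src_ip, ip_id)
        if dst_port is not None:
            self.ports[key] = dst_port
        port = self.ports.get(key)
        if port is None:
            # Out-of-order fragment: hold it until the port number is known.
            self.pending.setdefault(key, []).append(payload)
            return []
        ready = [(port, p) for p in self.pending.pop(key, [])]
        ready.append((port, payload))
        return ready


fh = FragmentHandler()
print(fh.receive("128.0.0.1", 7, b"second"))                # buffered, nothing to send
print(fh.receive("128.0.0.1", 7, b"first", dst_port=4097))  # releases both fragments
```

Each forwarded pair would then be sent to the WKP+A+P address built from the learned port, so only fragments that arrive before the header-carrying one are ever stored.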
Observe that a CGN also needs to remember the dst_ip information, since it cannot trust the dst_ip in the packet itself. In this case, each entry in the mapping takes 12 bytes instead of 8.

Finally, handling fragments via a specific prefix gives the network operator the flexibility to deploy multiple FHs. There are two limit cases: on one hand, a single FH that handles all the fragments in the network (the FH then announces WKPF); on the other hand, one FH for each destination IP (the FH then announces WKPF+A). Again, the longest prefix matching rule gives the ISP the autonomy to choose any intermediate point in between.

3.6. The incremental path to A+P

In this section we discuss one possibility for large networks to incrementally deploy A+P. As discussed above, the A+P scheme requires changes to the CPE, the BR, and (optionally) the PE. Changes to the routing system include the addition of the WKP and WKPF. The upgrade of the BR, as well as the routing of the WKP/WKPF, has to be done before the first customers transition to A+P. In addition, it is possible to provide the A+P NAT function at the PE routers while gradually upgrading the CPEs. (We stress here once again that, as soon as the PE is upgraded and A+P is activated, the customer must be able to operate its own CN, if he/she so desires.)

One important consideration has not been made so far: the BR mentioned in this document is essentially the BR of the A+P part of the network, and does not necessarily have to be the border router of the ISP. In this sense it might be possible to upgrade a smaller but contiguous part of a larger network, as long as it supports dual-stack. However, care needs to be taken that all routers (BRs) that might form the boundary of the "upgraded cloud" are upgraded to A+P. In this case, those routers translate "A+P packets" into "legacy IPv4 packets" and vice versa.
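The address mapping behind such a boundary translation can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the prefix value, its /64 length, and the bit layout (WKP, then the 32-bit IPv4 address A, then the 16-bit port P, then zero padding) are hypothetical and are not taken from Figure 1; the function names are likewise invented for this example.

```python
import ipaddress
from typing import Tuple

# Hypothetical well-known prefix (WKP). Both the value (taken from the IPv6
# documentation range) and the /64 length are assumptions of this sketch.
WKP = ipaddress.IPv6Network("2001:db8:0:1::/64")


def to_aplusp(dst_ip: str, dst_port: int) -> ipaddress.IPv6Address:
    """Build a WKP+A+P destination from a legacy IPv4 address and port."""
    if not 0 <= dst_port <= 0xFFFF:
        raise ValueError("port out of range")
    a = int(ipaddress.IPv4Address(dst_ip))
    # Assumed layout: WKP (64 bits) | A (32 bits) | P (16 bits) | zeros (16 bits)
    value = int(WKP.network_address) | (a << 32) | (dst_port << 16)
    return ipaddress.IPv6Address(value)


def from_aplusp(addr: ipaddress.IPv6Address) -> Tuple[str, int]:
    """Recover the legacy IPv4 address and port from a WKP+A+P address."""
    if addr not in WKP:
        raise ValueError("address is not inside the well-known prefix")
    value = int(addr)
    a = (value >> 32) & 0xFFFFFFFF
    p = (value >> 16) & 0xFFFF
    return str(ipaddress.IPv4Address(a)), p
```

Because the mapping is a pure function of the address and port, a router at the edge of an upgraded cloud needs no per-flow state to perform it in either direction, which is the property the stateless design relies on.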
A+P clouds can be independently deployed within the ISP network: the only constraint that needs to be satisfied is that the A+P address space does not overlap with the IPv4 address space which still serves legacy CPEs. As the A+P deployment speeds up, small clouds can be easily merged into bigger ones, leading the way to the ultimate goal of a single, ISP-wide A+P cloud. For instance, a deployment plan could be to install A+P clouds at some neighboring PoPs, then merge them at the state level, and so on.

4. Benefits and limitations of A+P

A+P addressing leverages internal routing in the ISP to route packets on extended addresses in a stateless manner. This allows customers to be assigned globally routable addresses and to accept incoming connections on their A+P port range. Observe that the statefulness of NATs hampers this desirable feature, and forces users to use out-of-band signaling (e.g., UPnP). From the perspective of the ISP, on the other hand, A+P statelessness usually means lower deployment costs and fewer scalability issues than stateful approaches like NAT. Moreover, A+P allows ISPs to fine-tune their network via standard internal routing management, without adding an extra layer of complexity (e.g., point-to-point tunnels).

We now discuss the limitations of the A+P approach. Recall that a transport session is identified by a 5-tuple <src_ip, src_port, dst_ip, dst_port, protocol>. Hence, any mechanism that shares the same IP address among multiple hosts intrinsically poses limitations on the number of active transport sessions that a single host can maintain. Observe that connections with different hosts (or even different applications on the same host) are only minimally impacted, because they can be differentiated by means of the dst_ip (dst_port) field.
Therefore, the only case in which address sharing causes trouble is multiple outbound transport sessions with the same remote host and the same port. In this case only the src_port field can be used to differentiate; however, that field cannot be fully exploited, since it is also used to multiplex multiple users on the same IP address. While multiple sessions with the same remote application are not a widespread practice, some very popular websites (e.g., GoogleMaps and iTunes) have been reported to make heavy use of multiple TCP/IP connections to maximize parallelism. The current estimate of the number of parallel sessions used by those websites is circa 70 [I-D.durand-softwire-dual-stack-lite]. In this respect, A+P with 8 port bits would allow every host to maintain up to 256 parallel connections with the same remote process, while still providing 256 times more addresses for end hosts (8 of the 16 port bits extend the address, so 2^8 = 256 customers share one IPv4 address, and the remaining 8 bits leave each of them 2^8 = 256 source ports).

Another limitation that A+P shares with any other IP address sharing mechanism is the availability of well-known ports. Services run by customers that share the same IP address will be distinguished by the port number. As a consequence, it will be impossible for two customers who share the same IP address to run services on the same port (e.g., port 80). Unfortunately, working around this limitation requires application-specific hacks (e.g., HTTP and HTTPS virtual hosting), whose discussion is out of the scope of this document. Observe that some popular applications (e.g., BitTorrent) require the availability of well-known ports. However, those applications can easily adapt to work with different ports, and users of such tools update them frequently (e.g., to exploit new features).

5. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

6. Security Considerations

7.
Acknowledgements

The authors wish to thank David Ward for review, endless constructive criticism, and interminable questions, and Cullen Jennings for discussion and review of fragmentation. We would also like to thank the following persons for their valuable feedback on earlier versions of this work: Bernhard Ager, Alain Durand, Dino Farinacci, Hamed Haddadi, Russ Housley, Wolfgang Muehlbauer and Ruediger Volk.

8. References

8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

[BCP38] Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", BCP 38, May 2000.

[CCIE-Pro] Doyle, J., "Routing TCP/IP Volume I (CCIE Professional Development)", 1998.

[I-D.durand-softwire-dual-stack-lite] Durand, A., Droms, R., Haberman, B., and J. Woodyatt, "Dual-stack lite broadband deployments post IPv4 exhaustion", draft-durand-softwire-dual-stack-lite-00 (work in progress), September 2008.

[Martin-Java] Martin, D., Rajagopalan, S., and A. Rubin, "Blocking Java Applets at the Firewall", Proceedings of the Internet Society Symposium on Network and Distributed System Security, pp. 16-26, 1997.

[RFC0959] Postel, J. and J. Reynolds, "File Transfer Protocol", STD 9, RFC 959, October 1985.

[RFC2766] Tsirtsis, G. and P. Srisuresh, "Network Address Translation - Protocol Translation (NAT-PT)", RFC 2766, February 2000.

[RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001.

[RFC4966] Aoun, C. and E. Davies, "Reasons to Move the Network Address Translator - Protocol Translator (NAT-PT) to Historic Status", RFC 4966, July 2007.
[SP-NAT] Alcock, S., Nelson, R., and D. Miles, "Characterizing the Network Connection Behavior of Residential Broadband Subscribers", draft, under-submission, 2009.

Authors' Addresses

Olaf Maennel
T-Labs/TU-Berlin
Ernst-Reuter-Platz 7
Berlin 10587
Germany
Phone: +491607199931
Email: olaf@maennel.net

Randy Bush
Internet Initiative Japan
5147 Crystal Springs
Bainbridge Island, Washington 98110
US
Phone: +1 206 780 0431 x1
Email: randy@psg.com

Luca Cittadini
Universita' Roma Tre
via della Vasca Navale, 79
Rome, 00146
Italy
Phone: +39 06 5733 3215
Email: luca.cittadini@gmail.com

Steven M. Bellovin
Columbia University
1214 Amsterdam Avenue
MC 0401
New York, NY 10027
US
Phone: +1 212 939 7149
Email: bellovin@acm.org

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.