Network Working Group                                         O. Maennel
Internet-Draft                                          T-Labs/TU-Berlin
Intended status: Standards Track                                 R. Bush
Expires: April 30, 2009                        Internet Initiative Japan
                                                            L. Cittadini
                                                    Universita' Roma Tre
                                                             S. Bellovin
                                                     Columbia University
                                                        October 27, 2008


    The A+P Approach to the Broadband Provider IPv4 Address Shortage
                          draft-ymbk-aplusp-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.
   This document may not be modified, and derivative works of it may not
   be created.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 30, 2009.

Abstract

   We are facing the exhaustion of the IANA IPv4 free IP address pool.
   Unfortunately, IPv6 is not yet deployed widely enough to fully
   replace IPv4, and it is unrealistic to expect that this is going to
   change before we run out of IPv4 addresses.  Letting hosts seamlessly
   communicate in an IPv4-world without assigning a unique globally


Maennel, et al.          Expires April 30, 2009                 [Page 1]

Internet-Draft          A+P Addressing Extension            October 2008


   routable IPv4 address to each of them is a challenging problem, for
   which many solutions have been proposed.  Some prominent ones involve
   carrier-grade-NATs (CGN), which have been shown to provide an
   inadequate experience to IPv4 users and enshrine a walled garden in
   the core of the provider.  Instead, we propose using specialized NATs
   at the customer premises equipment (CPE) edge which treat some of the
   port number bits as part of an extended IPv4 address.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Why Carrier-Grade-NATs are Harmful . . . . . . . . . . . .  3
     1.2.  Security of CGNs . . . . . . . . . . . . . . . . . . . . .  5
   2.  Proposed Solution  . . . . . . . . . . . . . . . . . . . . . .  5
     2.1.  Changes Required to the Network  . . . . . . . . . . . . .  6
       2.1.1.  Changes Required to CPE  . . . . . . . . . . . . . . .  6
       2.1.2.  Changes to Customer-Provided NAT . . . . . . . . . . .  7
       2.1.3.  Changes to Provider-Edge Routers . . . . . . . . . . .  7
       2.1.4.  Changes to Provider Border Routers . . . . . . . . . .  7
       2.1.5.  Changes to Network Core Routers  . . . . . . . . . . .  8
   3.  Implementation . . . . . . . . . . . . . . . . . . . . . . . .  8
     3.1.  A+P dual-stack . . . . . . . . . . . . . . . . . . . . . .  8
     3.2.  Design of the A+P NAT Device . . . . . . . . . . . . . . . 14
     3.3.  IPv6 and mixed V4-V6 traffic . . . . . . . . . . . . . . . 16
     3.4.  Handling ICMP  . . . . . . . . . . . . . . . . . . . . . . 16
     3.5.  Handling IP fragments  . . . . . . . . . . . . . . . . . . 16
     3.6.  The incremental path to A+P  . . . . . . . . . . . . . . . 17
   4.  Benefits and limitations of A+P  . . . . . . . . . . . . . . . 18
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 19
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 19
   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 20
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 20
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
   Intellectual Property and Copyright Statements . . . . . . . . . . 22


1.  Introduction

   Many large Internet Service Providers (ISPs) face the problem that
   their networks' customer edges are so large that, even giving the
   'front' of each customer premises equipment (CPE) only a single IPv4
   address, they need two to five /8s of IPv4 space.  The looming
   exhaustion of the free IANA IPv4 pool makes it highly unlikely that
   they would be allocated that much public IPv4 address space.
   Therefore ISPs have to devise something more ingenious.  Deploying
   NATs is a direct consequence of the design of a new protocol (IPv6)
   which is incompatible on the wire; there is not the slightest
   compatibility mode.  Although undesirable, NATs are inevitable.

   An approach which some broadband providers are testing is being
   called Carrier Grade NAT (CGN).  It is essentially a number of IPv4
   NATs in the core of their networks and various tunneling and
   translation techniques.  If the CPE is dual-stack, traffic whose
   source and destination are both IPv6 would not have to be NATted,
   but IPv4 traffic would be heavily NATted.  We can contrast this to,
   for example, NAT-PT [RFC2766] [RFC4966] on the CPE, which would
   probably scale to the needs of even a large non-consumer backbone.
   But, as we noted above, very large broadband consumer providers would
   need far too much IPv4 space for the NAT-PT front ends for their
   large consumer networks.

   Our main concern is that the imminent IPv4 address exhaustion is
   tempting operators to deploy technology which is damaging to the
   Internet as a whole.

1.1.  Why Carrier-Grade-NATs are Harmful

   We have taken up a desperate search for alternatives.  The reasons
   are simple:

   "Carrier grade" is a euphemism for centralized.  More semantics move
   to the core of the network.  This is bad in and of itself.  Net-heads
   call it "telco-think" because it is the telco model of smarts in the
   core as opposed to the Internet model of a simple core that just
   forwards packets, with smart edges.  It also places the provider in
   the position of a walled garden, where the user is trapped behind
   unchangeable applications and policies, the opposite of the
   "end-to-end" model of the Internet.

   With the smarts at the edges, e.g.  NAT-PT, one can easily field new
   protocols between consenting end-points by just tweaking the NATs at
   the corresponding CPE, even adding application layer gateways (ALGs)
   if they are needed.  However, CGNs do not build an Internet walled
   garden at the edges; they build it by restricting the core.


   With NAT in the core, if a customer wants a new application protocol
   which requires cooperation from the NAT, he gets to beg help from the
   broadband providers' engineers and lawyers, and all other users of
   carrier grade NATs.  This is the ultimate horror the NAT-haters fear,
   and, in this case, they are not all that wrong.

   One broadband provider has recently received a lot of bad press for
   just this, though we know that the engineers are very far from those
   responsible.  This shows that all new application protocols have to
   go through the carrier-loving lawyers to be allowed to be handled by
   the NATs in their core.  Today's NATs are typically mitigated by
   ALGs over which the customer has some degree of control, e.g., port
   forwarding or UPnP.  However, this is not expected to work anymore
   with CGNs.  CGN proposals admit that applications that require
   specific port assignment or port mapping from the NAT box are not
   expected to keep working [I-D.durand-softwire-dual-stack-lite].  We
   believe this is not an option and that end-users must have the
   ability to control their own ALGs.  So, if someone wants to deploy a
   new application, they can talk to the broadband providers' lawyers or
   run new disruptive technology over HTTP; we pick our poison.  And if
   the NAT is not where the customer can directly control it, i.e., it
   is anywhere back in the provider's network, then the provider
   controls what the user can control, i.e., it is not really under user
   control.  We do not wish to deal with the case where the provider has
   to decide whether to allow Skype v42 when they themselves provide a
   competing VoIP product.

   And remember that, as IPv6 deploys, and we want to have one Internet,
   i.e., IPv4 nodes talking freely with IPv6 nodes, translation must be
   done somewhere.  The challenge is whether someone can figure out a
   scheme by which it can be done for these large networks.  We believe
   it should be at the customer edge, not in the core.

   Another issue with CGNs is scalability.  ISPs face a tension between
   placing CGNs deep within their network, to aggregate as much as
   possible, and the massive state problem that too much aggregation
   creates.  To reduce the state, the placement ends up somewhere closer
   to the edge, where the benefits of aggregation are somewhat limited.

   It is not clear how a CGN should maintain per-session state in a
   scalable manner.  This is particularly relevant given that each
   customer is very likely to open many TCP connections in parallel.
   State for improperly terminated sessions could remain stale for some
   time.  The CGN hence trades the degree of aggregation against the
   amount of state that needs to be kept, and this makes optimally
   placing a CGN a hard engineering problem.


   With CGNs, tracing hackers, spammers and other criminals will be
   impossible unless all the connection-based mapping information is
   recorded and stored.  This would concern not only law enforcement
   services, but also privacy advocates.  This brings us to the other
   security-related problems with CGNs, discussed in the next section.

1.2.  Security of CGNs

   NATs frequently need to initiate translation for secondary port
   numbers.  This may be a decision based on packet inspection (e.g.,
   looking for PORT commands in FTP [RFC0959] sessions), or it may rely
   on explicit signaling from the end host via protocols such as UPnP.
   Either way, CGNs pose a security threat and/or an administrative
   nightmare.

   The issue is proper authentication of such requests.  Most UPnP
   devices do not implement appropriate security features.  Even if they
   did, there would be no way to administer the security mechanism.
   Every end-user device would have to have a secret corresponding to
   some authentication field in the CGN.  End users will not set these
   up properly; providers do not want to maintain such a database.

   Decisions made based on packet inspection are just as problematic.  A
   packet from one customer could easily request opening a port for
   another customer's address, similar to the Java-based attack
   described by Martin et al. in [Martin-Java].


2.  Proposed Solution

   The specific problem we are facing is that available IPv4 address
   space is insufficient to number the IPv4-speaking customers, while
   IPv6 is not widely enough deployed to migrate to an IPv6-only world.
   Therefore, we propose to extend the IPv4 address space by assigning
   to each customer a single IPv4 address which is extended by
   "stealing" bits from the port number in the TCP/UDP header, leaving
   the applications a reduced range of ports.  In the face of IPv4
   address exhaustion, the need for addresses is stronger than the need
   to be able to address thousands of applications on a single host
   [SP-NAT], and broadband consumers are not anticipated to deploy a
   massive number of applications over IPv4 (if they did, CGN would be
   even more damaging than this "bit-stealing" proposal).  Assuming we
   could limit the applications' port addressing to 8 (or 12) bits, we
   can increase the effective size of an IPv4 address by 8 (or 4)
   additional bits.  In this scenario, 256 (or 16) customers could be
   multiplexed on the same IPv4 address, while allowing each of them a
   fixed range of 256 (or 4096) ports.  We call this "extended
   addressing" or
   "A+P" (Address Plus Port) addressing.

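   The arithmetic behind these figures can be checked with a short
   sketch.  The function name split_port_field is ours and purely
   illustrative; the only assumptions are the 16-bit TCP/UDP port field
   and that every bit not left to the applications extends the address.

```python
PORT_BITS = 16  # width of the TCP/UDP port field


def split_port_field(app_port_bits):
    """Return (customers_per_address, ports_per_customer).

    app_port_bits is the number of port bits left to the applications;
    the remaining bits of the 16-bit field extend the IPv4 address.
    """
    if not 0 <= app_port_bits <= PORT_BITS:
        raise ValueError("app_port_bits must be between 0 and 16")
    address_extension_bits = PORT_BITS - app_port_bits
    return 2 ** address_extension_bits, 2 ** app_port_bits


for bits in (8, 12):
    customers, ports = split_port_field(bits)
    print(f"{bits} port bits: {customers} customers, {ports} ports each")
```

   With 8 port bits this yields 256 customers with 256 ports each; with
   12 port bits, 16 customers with 4096 ports each.
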
2.1.  Changes Required to the Network

   The devices involved in this approach are as follows:

   1.  Customer Premises Equipment (CPE), i.e. cable/DSL modem

   2.  Customer-Provided-NAT (CN), (optional)

   3.  Provider Edge Router (PE), AKA customer aggregation router

   4.  Provider Border Router (BR), provider's edge to other providers

   5.  Network Core Routers (Core), provider routers not PE or BR

2.1.1.  Changes Required to CPE

   As the customer's hosts should be unaware of the restricted range of
   ports and the extended A+P addressing scheme, translation would be
   done at the border between the customer and the provider.  In the
   most common case, this is the provider provisioned cable or DSL modem
   on the customer's premises into which the customer plugs their single
   computer or a LAN.  This CPE would be aware of the A+P extended
   addressing, which it could learn, for example, via a vendor or other
   extension to DHCP.  The CPE would also provide the A+P NAT function
   between the customer's LAN and the provider.

   This would require modification of current CPE.  However, current CGN
   approaches require modifications to the CPE as well, for example
   [I-D.durand-softwire-dual-stack-lite] says, "It is expected that the
   home gateway is either software upgradable, replaceable or provided
   by the service provider as part of a new contract."

   The customer premises equipment would be configured, hopefully
   automatically, with

   o  IPv4 and/or IPv6 addressing for the customer's LAN,

   o  The IPv4 A+P extended address for the WAN side to connect to the
      provider,

   o  An IPv6 address for the WAN side to connect to the provider, and

   o  The range of port numbers to use on the WAN side.


2.1.2.  Changes to Customer-Provided NAT

   Alternatively, as occasionally happens today, the customer could
   provide its own A+P NAT and the CPE would then be configured as a
   simple cable/DSL modem.  This customer A+P NAT would be configured
   with the IPv4 address and port-range allocated to the customer (e.g.,
   via extended DHCP).

   The customer NAT is entirely optional.  The customer does not have to
   operate such a device.  If they do not, then the provider installed
   CPE handles the mappings.  A mixture of CPE and CN device is also
   possible, where the customer gets full control over the CPE via an
   administrative login.  In this draft, we write CPE/CN to denote the
   device the customer has control over (i.e., either a CPE with
   administrative control, or an "A+P-aware CN").

2.1.3.  Changes to Provider-Edge Routers

   Ultimately, we expect that all CPE/CNs take on the functionality of
   the A+P gateway.  Then the provider's customer aggregation router
   (aka PE) might only perform some security-related functions, e.g.,
   ensuring that a CPE/CN does not send packets from ports outside its
   allocated port range, as the replies would, in turn, go back to some
   other host.  This threat is comparable to IP source address
   spoofing.

   During a transition phase, however, customers with legacy CPE could
   have the A+P gateway-functionality provided by the PE.  If we assume
   only layer 2 devices which connect directly to an interface of the
   PE, the customer can remain unaware of the restricted port range
   without problems.  Unfortunately, this comes very close to the
   walled garden effect that a CGN would cause.

   However, one important difference applies: customers who wish to
   "escape" from the walled garden can run their own upgraded CN.  This
   way customers become aware of which ports will be A+P NATted and
   which will not, so they have control over their own applications with
   no need to interact with the ISP (e.g., there's no need for UPnP
   equivalents).

2.1.4.  Changes to Provider Border Routers

   Routers at the provider's edge which face other providers need to be
   aware of the extended A+P IPv4 addresses.  They must have the ability
   to forward packets to the PE based on IPv4 address and port.

   We suggest that the provider network use IPv6 as the tunneling
   mechanism.  The CPE/CN or PE routers would encapsulate the A+P pseudo
   address within an IPv6 address using a well-known IPv6 prefix.  Then
   the core would route on the IPv6 address.  The border routers would
   recognize the well-known IPv6 prefix, decapsulate the inner IPv4
   packet, and route normally on the IPv4 address.  Thus the provider's
   network could be IPv6-only, or use any other layer 3/2.5 protocol.

2.1.5.  Changes to Network Core Routers

   If transport through the provider is chosen appropriately, e.g.
   IPv4-in-IPv6-encapsulation, the network's core routers need not
   understand A+P extended IPv4 addressing at all.  Routing through the
   core without some form of tunneling would require the deployment of
   IPv4-A+P all the way to the PE routers.  As the original problem was
   insufficient IPv4 space, we assume that IPv6 or other non-IPv4
   tunneling will be used.

   However, while we recommend IPv6, we acknowledge that A+P is the
   natural extension of IPv4, and should work seamlessly.  In an
   IPv4-only (or dual-stacked) network, we propose to host only
   unsplit/full IPv4 addresses on the PE.  In this case no
   modifications are needed to allow routing of /32-or-longer
   prefixes, and forwarding will work with legacy equipment.  Only the
   PE would have to be upgraded to A+P-awareness.


3.  Implementation

3.1.  A+P dual-stack

   There's wide consensus that the only long term solution to the IPv4
   address shortage is speeding the deployment of IPv6.  Hence, we argue
   that the main design requirement for any short term solution is to
   ease, or at least not hamper, ISP-wide IPv6 deployment.  A+P
   addressing enables ISPs to run an IPv6-only core with dual-stack
   devices at the edge.  In fact, the A+P CPE/CN and the BR are the only
   devices that need to support dual-stack.  A+P addressing requires
   those devices to be assigned IPv6 addresses belonging to an ISP-wide
   well-known prefix (WKP), which only needs to be routable within the
   ISP.  The CPE/CN learns both WKP and its A+P address and port range
   (e.g., via DHCP), and configures its WAN interface accordingly.
   Figure 1 shows an example of how WKP and A+P are combined to obtain
   an IPv6 address at the CPE/CN.


         Configuration (e.g., from DHCP):
         --------------------------------
         WKP = 4999::/64  (64 bits)
         A = 12.0.0.1     (32 bits)
         P = ports 4096 to 8191

         Port bits usage:
         --------------------------------
         P = Pa + Pp                 (16 bit port field in TCP header)
         Pa = address extension      (4 bits)
         Pp = restricted port number (12 bits)

             from 0001000000000000 (4096) to 0001111111111111 (8191)
                  \__/\__________/
                   /        \
                  /          \
           +------------+  +---------------+
           | part of A+P|  | spare bits for|
           |  address   |  |  port number  |
           |  (4 bits)  |  |   (12 bits)   |
           +------------+  +---------------+

         IPv6 prefix:
         --------------------------------
         4999:0:0:0   : 0c00:0001 : 1000 ::  /100
         \________/     \___________/        \__/
            WKP          A+P address     (64+32+4 bits)


      Building an IPv6 prefix from Well Known Prefix and A+P address

                                 Figure 1
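
   As an illustration only, the construction in Figure 1 can be
   sketched in Python with the standard ipaddress module.  The function
   name aplusp_prefix and its parameters are hypothetical; the sketch
   assumes a 64-bit WKP and that Pa is taken from the high-order bits
   of the first port in the assigned range.

```python
import ipaddress


def aplusp_prefix(wkp, ipv4, first_port, ext_bits):
    """Build the IPv6 prefix WKP + A + Pa as laid out in Figure 1."""
    wkp_net = ipaddress.IPv6Network(wkp)
    if wkp_net.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit WKP")
    a = int(ipaddress.IPv4Address(ipv4))
    pa = first_port >> (16 - ext_bits)      # high-order "stolen" bits
    value = (int(wkp_net.network_address)
             | (a << 32)                    # A fills bits 32..63
             | (pa << (32 - ext_bits)))     # Pa follows A directly
    return ipaddress.IPv6Network((value, 64 + 32 + ext_bits))


prefix = aplusp_prefix("4999::/64", "12.0.0.1", 4096, 4)
print(prefix)
```

   For the configuration in Figure 1 the result is a /100 whose
   network address is 4999:0:0:0:0c00:0001:1000:0; Python prints it in
   compressed form.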

   This prefix is announced by the PE in the internal routing of the
   provider, either IGP or iBGP depending on the provider's routing
   philosophy.  These prefixes are expected to be highly aggregatable,
   so that they can be announced with very little impact on the routing
   table size in the ISP core network.

   Packet delivery works as follows.  We first describe how a packet is
   transmitted from an end-user device behind an A+P CPE/CN towards the
   legacy Internet, and then the opposite direction.  In the following
   examples, we assume that the end-user host is not A+P-aware.  Hence,
   port numbers are A+P NATted at the CPE/CN.  The CPE/CN receives an
   IPv4 packet from the customer to a destination address V4D, ensures
   that the source port falls into the configured port range, and then
   encapsulates the packet in an IPv6 packet where the source address is
   WKP+A+P, and the destination address is WKP+V4D.

   The packet is then routed using standard routing in the ISP core, up
   to the provider's BR.  Note that there is no preconfigured tunnel
   between the CPE/CN and the BR, and the packet is routed based on the
   destination address, rather than a predetermined endpoint.  When the
   BR receives the packet, it decapsulates the IPv4 packet, whose source
   is A and whose destination is V4D.  Figure 2 illustrates the routing
   of outgoing packets.  Observe that the source port does not initially
   fall in the configured range (datagram 1), so it is translated at the
   CPE/CN (datagram 2).


                   +-----------+
                   |    Host   |
                   +-----+-----+
                      |  |12.0.0.1 (ports 4096 to 8191)
      IPv4 datagram 1 |  |
                      |  |
                      v  |
               +---------|---------+
               |CPE/CN   |         |
               +--------|||--------+
                      | |||4999:0:0:0:0c00:0001:1000::/100
       IPv6 datagram 2| |||
                      | |||<-IPv4-in-IPv6
                      | |||
                 -----|-|||-------
               /      | |||        \
              |     ISP network     |
               \      | |||        /
                 -----|-|||-------
                      | |||
                      v |||
               +--------|||--------+
               |BR      |||        |
               +---------|---------+
                      |  |
      IPv4 datagram 3 |  |
                 -----|--|--------
               /      |  |         \
              |     Internet        |
               \      |  |         /
                 -----|--|--------
                      |  |
                      v  |128.0.0.1
                   +-----+-----+
                   | IPv4 Host |
                   +-----------+

                   Figure 2: Routing of Outgoing Packets


     +-----------------+--------------+-----------------------------+
     |        Datagram | Header field | Contents                    |
     +-----------------+--------------+-----------------------------+
     | IPv4 datagram 1 |     IPv4 Dst | 128.0.0.1                   |
     |                 |     IPv4 Src | 12.0.0.1                    |
     |                 |      TCP Dst | 80                          |
     |                 |      TCP Src | 32000                       |
     | --------------- | ------------ | --------------------------- |
     | IPv6 datagram 2 |     IPv6 Dst | 4999:0:0:0:8000:0001::      |
     |                 |     IPv6 Src | 4999:0:0:0:0c00:0001:1001:: |
     |                 |     IPv4 Dst | 128.0.0.1                   |
     |                 |     IPv4 Src | 12.0.0.1                    |
     |                 |      TCP Dst | 80                          |
     |                 |      TCP Src | 4097                        |
     | --------------- | ------------ | --------------------------- |
     | IPv4 datagram 3 |     IPv4 Dst | 128.0.0.1                   |
     |                 |     IPv4 Src | 12.0.0.1                    |
     |                 |      TCP Dst | 80                          |
     |                 |      TCP Src | 4097                        |
     +-----------------+--------------+-----------------------------+

                         Datagram header contents
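
   The outgoing steps at the CPE/CN can be sketched as follows.  This
   is a toy model under our own assumptions, not a specification: the
   names embed and CpeNat are hypothetical, ports are handed out in
   order from the assigned range, and the full 16-bit port is placed
   directly after A, as in the datagram 2 source address above.

```python
import ipaddress

WKP = int(ipaddress.IPv6Address("4999::"))  # 64-bit well-known prefix


def embed(ipv4, port=0):
    """Place an IPv4 address, then a 16-bit port, right after the WKP."""
    a = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(WKP | (a << 32) | (port << 16))


class CpeNat:
    """Toy port mapper handing out ports from the assigned range."""

    def __init__(self, first_port, last_port):
        self.free = iter(range(first_port, last_port + 1))
        self.out = {}    # host source port -> port in assigned range
        self.back = {}   # port in assigned range -> host source port

    def map_out(self, host_port):
        """Translate a host source port into the assigned range."""
        if host_port not in self.out:
            ext = next(self.free, None)
            if ext is None:
                raise RuntimeError("port range exhausted")
            self.out[host_port] = ext
            self.back[ext] = host_port
        return self.out[host_port]

    def map_in(self, ext_port):
        """Reverse lookup; unmapped ports are left untouched."""
        return self.back.get(ext_port, ext_port)


nat = CpeNat(4096, 8191)
port = nat.map_out(32000)          # first flow, so 4096 in this sketch
ipv6_src = embed("12.0.0.1", port)  # WKP+A+P
ipv6_dst = embed("128.0.0.1")       # WKP+V4D
print(port, ipv6_src, ipv6_dst)
```

   The table uses port 4097 rather than 4096; any free port inside the
   assigned range is equally valid, and embed("12.0.0.1", 4097) gives
   the datagram 2 source address shown above.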

   An incoming packet undergoes the reverse process.  When a BR receives
   an IPv4 packet on an external interface, it extracts the destination
   address and port and then uses that information to build a WKP+A+P
   IPv6 destination address.  The packet is then routed in the ISP core
   to the user's CPE/CN, which decapsulates the IPv4 packet, whose
   destination is simply A.  Note that the packet processing at the BR
   is completely stateless, since there's no need to know how many bits
   of the port are "stolen" by the address.  The longest prefix rule
   will just deliver the packet to the corresponding CPE.  All the state
   is kept at the CPE/CN, i.e., at the edge.  Figure 3 shows how an
   incoming packet is routed.  Observe that the port translation at the
   CPE/CN (datagram 3) only happens if the CPE/CN has a preexisting
   mapping.  Otherwise, the port number is left untouched.  Overall,
   this approach brings two major advantages over CGNs: (i) no
   per-connection state is kept in the provider's network, which avoids
   the CGN scalability issues, and (ii) a customer can be contacted on
   the restricted port range with no extra signaling.


                   +-----------+
                   |    Host   |
                   +-----+-----+
                      ^  |12.0.0.1 (ports 4096 to 8191)
      IPv4 datagram 3 |  |
                      |  |
                      |  |
               +---------|---------+
               |CPE/CN   |         |
               +--------|||--------+
                      ^ |||4999:0:0:0:0c00:0001:1000::/100
       IPv6 datagram 2| |||
                      | |||<-IPv4-in-IPv6
                      | |||
                 -----|-|||-------
               /      | |||        \
              |     ISP network     |
               \      | |||        /
                 -----|-|||-------
                      | |||
                      | |||
               +--------|||--------+
               |BR      |||        |
               +---------|---------+
                      ^  |
      IPv4 datagram 1 |  |
                 -----|--|--------
               /      |  |         \
              |     Internet        |
               \      |  |         /
                 -----|--|--------
                      |  |
                      |  |128.0.0.1
                   +-----+-----+
                   | IPv4 Host |
                   +-----------+

                   Figure 3: Routing of Incoming Packets


     +-----------------+--------------+-----------------------------+
     |        Datagram | Header field | Contents                    |
     +-----------------+--------------+-----------------------------+
     | IPv4 datagram 1 |     IPv4 Dst | 12.0.0.1                    |
     |                 |     IPv4 Src | 128.0.0.1                   |
     |                 |      TCP Dst | 4097                        |
     |                 |      TCP Src | 80                          |
     | --------------- | ------------ | --------------------------- |
     | IPv6 datagram 2 |     IPv6 Dst | 4999:0:0:0:0c00:0001:1001:: |
     |                 |     IPv6 Src | 4999:0:0:0:8000:0001::      |
     |                 |     IPv4 Dst | 12.0.0.1                    |
     |                 |     IPv4 Src | 128.0.0.1                   |
     |                 |      TCP Dst | 4097                        |
     |                 |      TCP Src | 80                          |
     | --------------- | ------------ | --------------------------- |
     | IPv4 datagram 3 |     IPv4 Dst | 12.0.0.1                    |
     |                 |     IPv4 Src | 128.0.0.1                   |
     |                 |      TCP Dst | 32000                       |
     |                 |      TCP Src | 80                          |
     +-----------------+--------------+-----------------------------+

                         Datagram header contents
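
   The stateless behavior of the BR on incoming packets can be
   illustrated with a small sketch.  The routing table, the next-hop
   names "cpe-a" and "cpe-b", and the function names are hypothetical.
   The point shown is that the BR always embeds the full 16-bit port,
   and longest prefix matching in the core finds the right CPE/CN
   regardless of how many port bits each customer's prefix "steals".

```python
import ipaddress

WKP = int(ipaddress.IPv6Address("4999::"))  # 64-bit well-known prefix


def br_destination(ipv4_dst, dst_port):
    """Stateless: embed the full address and the full 16-bit port."""
    a = int(ipaddress.IPv4Address(ipv4_dst))
    return ipaddress.IPv6Address(WKP | (a << 32) | (dst_port << 16))


def longest_prefix_match(dst, routes):
    """Return the next hop of the most specific route covering dst."""
    matches = [net for net in routes if dst in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical core routes: two customers sharing 12.0.0.1 with
# different splits (/100 = 4 stolen bits, /104 = 8 stolen bits).
routes = {
    ipaddress.IPv6Network("4999:0:0:0:c00:1:1000:0/100"): "cpe-a",
    ipaddress.IPv6Network("4999:0:0:0:c00:1:2000:0/104"): "cpe-b",
}

for dst_port in (4097, 8200, 80):
    dst = br_destination("12.0.0.1", dst_port)
    print(dst_port, longest_prefix_match(dst, routes))
```

   Port 4097 falls in the first customer's range (4096 to 8191), port
   8200 in the second customer's range (8192 to 8447), and port 80
   matches no route in this toy table.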

3.2.  Design of the A+P NAT Device

   There are a number of delicate design choices for the A+P NAT device.
   We present our preferred solution here.

   Legacy hosts would send IPv4 packets from any port(s).  We are not
   expecting to change end-hosts; therefore we require some kind of NAT.
   However, one of our basic assumptions is that the customer wants to
   be able to run their own servers and NATs.  This leads to several
   constraints:

   1)      We want to enforce the analog of BCP 38 [BCP38].  This means
           that no packets outside of the assigned address and port
           number range should leave the PE for the network.

   2)      We want minimal configuration.  There should be no need for
           the customer to tell the ISP that they have purchased an A+P-
           grade home NAT.

   3)      We must support unmodified computers and NATs.

   4)      We want the A+P gateway (i.e., CPE) to be as accommodating as
           possible to strange protocols it knows nothing about.  It may
           do its own packet snooping and/or ALGs for things it knows
           about (e.g., FTP, SIP, Skype), but should leave it to the CN
           to handle obscure/unknown protocols (e.g., gaming).

   5)      Conversely, if the customer's CN has done some translation,
           those packets should not be re-translated.

   These principles lead us to the following design:

   1)      The PE should discard any outbound packet that does not
           originate from the proper A+P address.  (Constraint 1)

   2)      An A+P gateway (i.e., CPE, CN, or both) should include some
           option in the DHCP request message to inform the PE router
           of its abilities.  (Constraint 2)

   3)      If no A+P signaling was done (i.e., neither CPE nor CN
           support A+P), the PE router should perform NATting, including
           whatever ALG functions it can, or an unrestricted IPv4
           address has to be provided.  (Constraints 3 and 4)

   4)      The PE router should not modify any A+P packets from the
           proper address and port range.  (Constraints 4 and 5)

   Note that a customer with no CN or with a non-A+P CN may emit packets
   within the proper port range by accident, thus accidentally violating
   part of point 4 above.  We solve that by DHCP-based signaling from
   the A+P gateway: the A+P option in the DHCP request tells the PE that
   a customer-provided CN will do all NATting according to this design.
   In that case, the primary function of the PE router is to enforce
   restrictions on port numbers in outbound packets.
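
   The decision the PE makes for each outbound packet can be sketched
   as follows.  This is only an illustration of the four design points
   above, not a specified algorithm: the Assignment structure, its field
   names, and the three action strings are assumptions introduced here,
   and the DHCP A+P option is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """A+P assignment for one customer (hypothetical structure)."""
    address: str       # shared public IPv4 address
    port_lo: int       # first port of the assigned range (inclusive)
    port_hi: int       # last port of the assigned range (inclusive)
    ap_signaled: bool  # True if the DHCP request carried the A+P option


def pe_outbound_action(a: Assignment, src_ip: str, src_port: int) -> str:
    """Return what the PE does with an outbound packet."""
    in_range = src_ip == a.address and a.port_lo <= src_port <= a.port_hi
    if a.ap_signaled:
        # The customer's gateway does all NATting; the PE only enforces
        # the address and port range and never rewrites the packet.
        return "forward" if in_range else "drop"
    # No A+P signaling: the PE NATs on the customer's behalf, even if a
    # packet happens to fall inside the range by accident.
    return "nat"
```

   With an assignment of 12.0.0.1 and ports 4096-8191, a signaled
   customer's packet from port 4097 is forwarded untouched, one from
   port 80 is dropped, and every packet of an unsignaled customer is
   NATted by the PE.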

   We leave unspecified for now the question of how large a port number
   range is allocated to each customer.  We anticipate that the
   allocation available to a customer will be determined by ISP-specific
   policy, perhaps as a function of the fee charged to the customer.  If
   variable allocations are to be supported, i.e., the ability for a
   customer to request more port numbers (and hence more possible
   simultaneous connections) at one time and fewer at another, the
   natural way to signal this is in the DHCP A+P request option.
   However, there is a tradeoff between the advantages of efficiently
   managing the extended address space via dynamic and/or variable
   allocation, and the cost it brings in terms of additional complexity.

   A simple DHCP release/request cycle could be used, but if the proper
   adjacent block of port numbers was not available, this would entail
   tearing down existing connections or reNATting them.  The
   disadvantages of the former are obvious; adopting the latter approach
   would bring back all of the disadvantages this scheme is intended to
   avoid.  One possible answer is to allocate ranges of IP addresses
   with a statically assigned port range.  For example, the ISP could
   offer "classes of service", e.g., the first block of IPs offers 4096
   ports, the second class offers 512 ports, and the third class offers
   16 ports.  If the customer wants more ports, the customer needs to be
   moved into a different class.  Obviously, this cannot be done without
   a service interruption for this particular customer (i.e., the
   customer has to get a new IP).  However, this solves the problem of
   dynamic allocation for the ISP.  We leave details of this issue for
   future work.
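
   To make the trade-off between the example classes concrete, the
   short calculation below shows how many customers can share one IPv4
   address in each class.  It assumes, purely for illustration, that
   the whole 16-bit port space of an address is divided evenly and that
   no ports are held back by the ISP; the function name is hypothetical.

```python
TOTAL_PORTS = 2 ** 16  # size of the TCP/UDP port space of one address


def customers_per_address(ports_per_customer: int) -> int:
    """Number of equal-sized port ranges that fit into one address."""
    if ports_per_customer <= 0 or TOTAL_PORTS % ports_per_customer:
        raise ValueError("range size must evenly divide the port space")
    return TOTAL_PORTS // ports_per_customer


for ports in (4096, 512, 16):
    print(ports, "ports per customer:",
          customers_per_address(ports), "customers per address")
```

   Under these assumptions, the three classes accommodate 16, 128, and
   4096 customers per address, respectively.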

3.3.  IPv6 and mixed V4-V6 traffic

   Note that if IPv4/IPv6 dual stack is provided on the customer's LAN,
   IPv6 traffic to IPv6 destinations would be transported untranslated
   from the customer's host to the provider's border with other
   providers.

   If the customer has an IPv6-only LAN, then the device providing A+P
   translation should also provide NAT-PT service so that the customer
   could communicate with the IPv4 Internet.

3.4.  Handling ICMP

   ICMP is problematic for all NATs, because it lacks port numbers.  A+P
   routing exacerbates the problem.

   Most ICMP messages fall into one of two categories: error reports, or
   ECHO/ECHO reply (commonly known as "ping").  For error reports, the
   offending packet header is embedded within the ICMP packet; NATs can
   then rewrite that portion and route the packet to the actual
   destination host.  This functionality will remain the same with A+P;
   however, the provider's BR will need to examine the embedded header
   to learn which A+P NAT is handling it, while that box will do the
   necessary rewriting.

   ECHO and ECHO reply are more problematic.  For ECHO, the border
   router must rewrite the "Identifier" and perhaps "Sequence Number"
   fields in the ICMP request, so that returning ECHO REPLY packets may
   be routed correctly.  We suggest rewriting the information in the
   sequence number so that the BR can direct returning ECHO replies to
   the appropriate host.

3.5.  Handling IP fragments

   Much like ICMP packets, fragmented IP packets are known to be hard to
   handle in any address translation mechanism [RFC3022].  In fact, only
   the first IP fragment contains the TCP (UDP) header.  This issue is
   commonly dealt with by keeping additional state at the NAT device,
   which allows fragments to be mapped to the correct TCP (UDP) session.

   In the A+P NAT solution, fragments coming from the internal domain
   can be avoided if the core network runs IPv6 only and the PE ensures
   that no layer-3 fragmentation is performed by the customer equipment.
   Fragments coming from the external domain are harder to handle.
   Commercial NATs extract the port number out of the first fragment and
   keep that information to map subsequent fragments.  Moreover, when
   the first fragment is not the first one to be received at the NAT,
   the fragments received before it need to be stored until the port
   number is known [CCIE-Pro].  Note that a deployment scenario which
   intends to handle fragments must ensure that all of the fragments
   arrive at the same fragment handling host.

   We propose to route fragments to special boxes by exploiting the
   prefix combination in a similar way to Figure 1.  The BR is able to
   detect that a packet is fragmented when it receives it, so in that
   case it uses a different well-known prefix which is intended for
   fragments only (we call it WKPF).  Hence, the BR builds an IPv6
   packet where the destination address is WKPF+A and then uses normal
   routing.  Fragments are then routed to a special box which we call
   "fragment handler" (FH).  The FH is in charge of keeping track of the
   port numbers used by each fragment.  Namely, upon receiving the first
   fragment, the FH stores a mapping <src_ip, ip_id> --> <dst_port> (8
   bytes in total), which it uses to build the correct WKP+A+P address
   for all the fragments of the same IP packet (identified by the pair
   <src_ip, ip_id>).  After storing such a mapping, all subsequent
   fragments can be forwarded to the correct A+P destination address.
   This way, fragment storage is only required for out-of-order
   fragments, until the fragment carrying the port number is received.
   Since out-of-order packets are fairly rare, the FH is not expected to
   buffer a high number of fragments.  Observe that a CGN also needs to
   remember the dst_ip information, since it cannot trust the dst_ip in
   the packet itself.  In this case, each entry in the mapping takes 12
   bytes instead of 8.
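
   The FH bookkeeping described above can be sketched as follows.  This
   is a minimal illustration under stated assumptions, not a reference
   implementation: the class and method names are hypothetical, a
   fragment is reduced to its offset and an opaque payload, the first
   fragment is recognized by offset zero, and the expiry of stale
   entries is omitted.

```python
from collections import defaultdict


class FragmentHandler:
    """Maps <src_ip, ip_id> to <dst_port> and buffers early fragments."""

    def __init__(self):
        self.port_of = {}                 # (src_ip, ip_id) -> dst_port
        self.pending = defaultdict(list)  # fragments awaiting the port

    def receive(self, src_ip, ip_id, frag_offset, payload, dst_port=None):
        """Return the (dst_port, payload) pairs that can be forwarded now.

        dst_port is only available for the first fragment (offset 0),
        because only that fragment carries the TCP (UDP) header.
        """
        key = (src_ip, ip_id)
        if frag_offset == 0:
            if dst_port is None:
                raise ValueError("first fragment must carry the port")
            self.port_of[key] = dst_port
            # Release fragments that arrived before the first one.
            buffered = self.pending.pop(key, [])
            return [(dst_port, p) for p in [payload] + buffered]
        if key in self.port_of:
            # Port already known: forward without storing anything.
            return [(self.port_of[key], payload)]
        # Out-of-order fragment: store it until the port is known.
        self.pending[key].append(payload)
        return []
```

   In this sketch a fragment is stored only if it arrives before the
   first fragment of its packet; every other fragment is forwarded
   immediately using the stored mapping.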

   Finally, handling fragments via a specific prefix gives the network
   operator the flexibility to deploy multiple FHs.  There are two
   limiting cases: on one hand, a single FH that handles all the
   fragments in the network (the FH then announces WKPF); on the other
   hand, one FH for each destination IP (the FH then announces WKPF+A).
   Again, the longest prefix matching rule gives the ISP the autonomy to
   choose any intermediate point in between.
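
   The effect of longest prefix matching on FH selection can be
   illustrated with the sketch below.  The concrete values are
   assumptions made only for this example: WKPF is taken to be a /48
   from the IPv6 documentation range, and WKPF+A is formed by appending
   the 32-bit IPv4 address directly after that prefix.  The bit layout
   actually used by a deployment may differ.

```python
import ipaddress

# Hypothetical well-known fragment prefix (documentation address space).
WKPF = ipaddress.IPv6Network("2001:db8:f::/48")


def wkpf_plus_a(ipv4: str) -> ipaddress.IPv6Network:
    """Build the WKPF+A prefix for one IPv4 address (assumed layout)."""
    a = int(ipaddress.IPv4Address(ipv4))
    shift = 128 - WKPF.prefixlen - 32
    base = int(WKPF.network_address) | (a << shift)
    return ipaddress.IPv6Network((base, WKPF.prefixlen + 32))


def select_fh(routes, ipv4_dst: str) -> str:
    """Pick the FH whose announced prefix is the longest match."""
    dest = wkpf_plus_a(ipv4_dst).network_address
    matches = [(net, fh) for net, fh in routes if dest in net]
    if not matches:
        raise LookupError("no fragment handler announces this prefix")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# One catch-all FH plus a dedicated FH for a single busy address.
routes = [(WKPF, "fh-default"),
          (wkpf_plus_a("12.0.0.1"), "fh-12.0.0.1")]
```

   Here fragments destined to 12.0.0.1 reach the dedicated FH, while
   fragments for any other shared address fall back to the FH that
   announces WKPF.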

3.6.  The incremental path to A+P

   In this section we discuss one possibility for large networks to
   incrementally deploy A+P.  As discussed above, the A+P scheme
   requires changes to the CPE, the BR, and (optionally) the PE.
   Changes to the routing system include the addition of the WKP and
   WKPF.  The upgrade of the BR, as well as the routing of the WKP/WKPF,
   has to be done before the first customers transition to A+P.  In
   addition, it is possible to provide the A+P NAT function at the PE
   routers while gradually upgrading the CPEs.  (We stress here once
   again that, as soon as the PE is upgraded and A+P is activated, the
   customer must be able to operate their own CN, if they so desire.)
   One important consideration has not been made so far: the BR
   mentioned in this document is essentially the BR of the A+P part of
   the network, and does not necessarily have to be the border router of
   the ISP.  In this sense it might be possible to upgrade a smaller,
   but contiguous part of a larger network, as long as it supports
   dual-stack.  However, care needs to be taken that all routers (BRs)
   that might form the boundary of the "upgraded cloud" are upgraded to
   A+P.  In this case, those routers translate "A+P packets" into
   "legacy IPv4 packets" and vice versa.

   A+P clouds can be independently deployed within the ISP network: the
   only constraint that needs to be satisfied is that the A+P address
   space does not overlap with the IPv4 address space which still serves
   legacy CPEs.  As the A+P deployment speeds up, small clouds can be
   easily merged into bigger ones, leading the way to the ultimate goal
   of a single, ISP-wide A+P cloud.  For instance, a deployment plan
   could be to install A+P clouds at some neighboring PoPs, then merge
   them at the state level, and so on.


4.  Benefits and limitations of A+P

   A+P addressing leverages internal routing in the ISP to route packets
   on extended addresses in a stateless manner.  This allows customers
   to be assigned globally routable addresses and to accept incoming
   connections on their A+P port range.  Observe that the statefulness
   of NATs hampers this desirable feature, and forces users to use
   out-of-band signaling (e.g., UPnP).  From the perspective of the ISP,
   on the other hand, A+P statelessness usually means lower deployment
   costs and fewer scalability issues than stateful approaches like NAT.
   Moreover, A+P allows ISPs to fine-tune their networks via standard
   internal routing management, without adding an extra layer of
   complexity (e.g., point-to-point tunnels).

   We now discuss the limitations of the A+P approach.  Recall that a
   transport session is identified by a 5-tuple

             <src_ip, src_port, dst_ip, dst_port, protocol>

   Hence, any mechanism that shares the same IP address among multiple
   hosts intrinsically poses limitations on the number of active
   transport sessions that a single host can maintain.  Observe that
   connections with different hosts (or even different applications on
   the same host) are only minimally impacted, because they can be
   differentiated by means of the dst_ip (dst_port) field.  Therefore,
   the only case in which address sharing causes trouble is multiple
   outbound transport sessions with the same remote host and the same
   port.  In this case, only the src_port field can be used to
   differentiate the sessions; however, that field cannot be fully
   exploited, since it is also used to multiplex multiple users on the
   same IP address.  While multiple sessions with the same remote
   application are not a widespread practice, some very popular websites
   (e.g., GoogleMaps and iTunes) have been reported to make massive use
   of multiple TCP/IP connections to maximize parallelism.  The current
   estimate of the number of parallel sessions used by those websites is
   circa 70 [I-D.durand-softwire-dual-stack-lite].  In this respect, A+P
   with 8 port bits would allow every host to maintain up to 256
   parallel connections with the same remote process, while still
   providing 256 times more addresses for end hosts.
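
   The limitation can be demonstrated with a small source port
   allocator.  The sketch assumes a customer that has been assigned a
   range of 256 ports (the "8 port bits" case); the class name and the
   concrete range are hypothetical.  Because src_ip is the same shared
   address for every session of the customer, it is left out of the
   key.

```python
class PortRangeAllocator:
    """Allocates source ports from one customer's A+P port range."""

    def __init__(self, port_lo: int, port_hi: int) -> None:
        self.ports = range(port_lo, port_hi + 1)
        # Set of (src_port, dst_ip, dst_port, protocol) tuples in use.
        self.in_use = set()

    def open_session(self, dst_ip: str, dst_port: int,
                     protocol: str = "tcp") -> int:
        """Pick a source port that keeps the 5-tuple unique."""
        for src_port in self.ports:
            key = (src_port, dst_ip, dst_port, protocol)
            if key not in self.in_use:
                self.in_use.add(key)
                return src_port
        raise RuntimeError("port range exhausted for this destination")


alloc = PortRangeAllocator(4096, 4351)  # 256 ports
used = [alloc.open_session("128.0.0.1", 80) for _ in range(256)]
```

   After these 256 sessions, a further session to 128.0.0.1 port 80
   fails, whereas a session to a different remote port or host can
   still reuse source port 4096, since its 5-tuple remains unique.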

   Another limitation that A+P shares with any other IP address sharing
   mechanism is the availability of well known ports.  In fact, services
   run by customers that share the same IP address will be distinguished
   by the port number.  As a consequence, it will be impossible for two
   customers who share the same IP address to run services on the same
   port (e.g., port 80).  Unfortunately, working around this limitation
   implies application-specific hacks (e.g., HTTP and HTTPS virtual
   hosting), whose discussion is out of scope for this document.
   Observe that some popular applications (e.g., BitTorrent) require the
   availability of well known ports.  However, those applications can
   easily adapt to work with different ports, and users of such tools
   update them frequently (e.g., to exploit new features).


5.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.


6.  Security Considerations


7.  Acknowledgements

   The authors wish to thank David Ward for review, endless constructive
   criticism, and interminable questions, and Cullen Jennings for
   discussion and review of fragmentation.  We would also like to thank
   the following persons for their valuable feedback on earlier versions
   of this work: Bernhard Ager, Alain Durand, Dino Farinacci, Hamed
   Haddadi, Russ Housley, Wolfgang Muehlbauer, and Ruediger Volk.


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [BCP38]    Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP Source
              Address Spoofing", BCP 38, May 2000.

   [CCIE-Pro]
              Doyle, J., "Routing TCP/IP Volume I (CCIE Professional
              Development)", 1998.

   [I-D.durand-softwire-dual-stack-lite]
              Durand, A., Droms, R., Haberman, B., and J. Woodyatt,
              "Dual-stack lite broadband deployments post IPv4
              exhaustion", draft-durand-softwire-dual-stack-lite-00
              (work in progress), September 2008.

   [Martin-Java]
              Martin, D., Rajagopalan, S., and A. Rubin, "Blocking Java
              Applets at the Firewall", Proceedings of the Internet
              Society Symposium on Network and Distributed System
              Security, pp. 16-26, 1997.

   [RFC0959]  Postel, J. and J. Reynolds, "File Transfer Protocol",
              STD 9, RFC 959, October 1985.

   [RFC2766]  Tsirtsis, G. and P. Srisuresh, "Network Address
              Translation - Protocol Translation (NAT-PT)", RFC 2766,
              February 2000.

   [RFC3022]  Srisuresh, P. and K. Egevang, "Traditional IP Network
              Address Translator (Traditional NAT)", RFC 3022,
              January 2001.

   [RFC4966]  Aoun, C. and E. Davies, "Reasons to Move the Network
              Address Translator - Protocol Translator (NAT-PT) to
              Historic Status", RFC 4966, July 2007.

   [SP-NAT]   Alcock, S., Nelson, R., and D. Miles, "Characterizing the
              Network Connection Behavior of Residential Broadband
              Subscribers", draft, under submission, 2009.


Authors' Addresses

   Olaf Maennel
   T-Labs/TU-Berlin
   Ernst-Reuter-Platz 7
   Berlin  10587
   Germany

   Phone: +491607199931
   Email: olaf@maennel.net


   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110
   US

   Phone: +1 206 780 0431 x1
   Email: randy@psg.com


   Luca Cittadini
   Universita' Roma Tre
   via della Vasca Navale, 79
   Rome,   00146
   Italy

   Phone: +39 06 5733 3215
   Email: luca.cittadini@gmail.com


   Steven M. Bellovin
   Columbia University
   1214 Amsterdam Avenue
   MC 0401
   New York, NY  10027
   US

   Phone: +1 212 939 7149
   Email: bellovin@acm.org


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

