Network Working Group                                             D. Jen
Internet-Draft                                                 M. Meisel
Intended status: Informational                                 D. Massey
Expires: January 3, 2008                                         L. Wang
                                                                B. Zhang
                                                                L. Zhang
                                                            July 2, 2007


                APT: A Practical Transit Mapping Service
                          draft-jen-apt-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 3, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The size of the global routing table is a rapidly growing problem.
   Several solutions have been proposed.  These solutions commonly
   divide the Internet into two parts, one for customers and one for
   providers, where only provider addresses are globally routable.
   Packets destined for customer addresses are tunneled through provider


Jen, et al.              Expires January 3, 2008                [Page 1]

Internet-Draft               Transit Mapping                   July 2007


   space.  For this process to work, there must be a mapping service
   that can supply an appropriate provider-edge address for any given
   customer address.  We present a design for such a mapping service.
   We adhere to a "do no harm" design philosophy: maintain all desirable
   features of the current architecture without negatively affecting its
   security or reliability.  Our design aims to minimize delay and
   prevent loss in packet encapsulation, minimize the number of new or
   modified devices, and keep the level of control traffic manageable.


Table of Contents

   1.  Requirements Notation  . . . . . . . . . . . . . . . . . . . .  3
   2.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   4.  The Mapping Service  . . . . . . . . . . . . . . . . . . . . .  5
     4.1.  A Mapping Example  . . . . . . . . . . . . . . . . . . . .  6
   5.  Multihoming Support  . . . . . . . . . . . . . . . . . . . . .  7
     5.1.  Using Alternate ETRs During Failures . . . . . . . . . . .  8
       5.1.1.  Handling TS Prefix Failure . . . . . . . . . . . . . .  9
       5.1.2.  Handling Single TS Address Failure . . . . . . . . . .  9
       5.1.3.  Handling User-to-TR Link Failure . . . . . . . . . . . 10
     5.2.  Summary of Requirements for Multihoming Support  . . . . . 11
   6.  Exchanging Mappings Between ASes . . . . . . . . . . . . . . . 11
     6.1.  In Defense of BGP  . . . . . . . . . . . . . . . . . . . . 12
   7.  Security and Robustness  . . . . . . . . . . . . . . . . . . . 13
     7.1.  Detecting Misconfigurations  . . . . . . . . . . . . . . . 13
     7.2.  ICMP Mapping Packets . . . . . . . . . . . . . . . . . . . 14
     7.3.  Other ICMP Packets . . . . . . . . . . . . . . . . . . . . 14
     7.4.  Default Mapper Scalability . . . . . . . . . . . . . . . . 15
   8.  Incremental Deployment . . . . . . . . . . . . . . . . . . . . 15
   9.  Future Work  . . . . . . . . . . . . . . . . . . . . . . . . . 16
   10. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 17
   11. Security Considerations  . . . . . . . . . . . . . . . . . . . 17
   12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
     12.1. Normative References . . . . . . . . . . . . . . . . . . . 17
     12.2. Informative References . . . . . . . . . . . . . . . . . . 17
   Appendix A.   BGP Mapping Announcement Fields  . . . . . . . . . . 18
   Appendix B.   ICMP Mapping Message Fields  . . . . . . . . . . . . 19
   Appendix C.   ICMP Border Link Failure Fields  . . . . . . . . . . 19
   Appendix D.   Hidden Backup Mappings . . . . . . . . . . . . . . . 19
   Appendix D.1. Hidden Backup Mapping Protocol . . . . . . . . . . . 20
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
   Intellectual Property and Copyright Statements . . . . . . . . . . 23


1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2.  Introduction

   The unexpected, explosive growth of the Internet is causing a greater
   and greater strain on its infrastructure.  This problem has been
   well-documented in [RAWS][AddrAlloc].  Several solutions have been
   proposed to address this problem [EFIT][CRIO][LISP], most of which
   involve separating the Internet into two parts -- one for user
   networks and one for transit providers.  Routers in transit space
   would only need to know how to route to transit prefixes, which are
   stable and conducive to topological aggregation.  When a packet is
   sent from user address A to destination user address B, A's provider-
   edge router (the ingress tunnel router, or "ITR", as defined in
   [LISP]) encapsulates the packet and sends it through transit space to
   B's provider-edge router (the egress tunnel router, or "ETR").  B's
   ETR decapsulates the packet and forwards it to the appropriate
   recipient, B.

   When encapsulating a packet, A's ITR must somehow determine B's ETR's
   transit-space address and include it in the outer header.  In
   general, any ITR must be able to map any given user-space address to
   a corresponding ETR transit-space address for proper tunneling
   through transit space.  This illustrates the need for a mapping
   service that can provide this address.  The design details of this
   mapping service will play a large part in determining the
   effectiveness of any proposed implementation of a user/transit
   provider address space separation.  The mapping service also presents
   an exciting opportunity to enhance the services currently offered by
   the Internet, which is further reason to carefully consider how this
   service should be implemented.  Should mapping information be
   distributed via a push or a pull model?  What additional information,
   if any, should be obtained along with the mapping information?  Can
   we satisfy the mapping requirement without sacrificing any services
   or packet delivery quality?

   Our answers to these questions are rooted in a "do no harm" design
   philosophy: improve routing scalability without sacrificing any
   desirable features in the current architecture or negatively
   affecting its security and reliability.  To this end, we present APT,
   A Practical Transit mapping service designed with the following goals
   in mind.


   o  Minimize delay and prevent loss in packet encapsulation.

   o  Minimize the number of devices that need to be modified to support
      our new design.

   o  Minimize the number of devices that will require additional
      resources or complexity.

   o  Keep the design modular so that the method used to propagate
      mapping information is independent from the method used to
      retrieve mapping information for tunneling.

   APT is designed for use with eFIT [efitID][EFIT], one of the major
   proposals for user/transit provider address space separation.
   However, APT should be generally applicable to other proposals of the
   same class.


3.  Terminology

   User Network (UN) - A network that pays another organization to
   deliver its packets through the Internet.  Each user network is a
   customer of some Transit Network (see definition below).  "User
   network" holds the same meaning as it does in the eFIT proposal.

   Transit Network (TN) - An AS whose business is to provide packet
   delivery services for its customers.  Transit Networks serve as
   providers for user networks.  As a rule of thumb, if the AS appears
   in the middle of any ASPATH in a BGP route today, it is considered a
   transit network.

   Transit Space (TS) - The address space used by transit networks.
   Nodes within a transit network are assigned TS addresses.  Sometimes
   the term "transit space" will refer to the non-edge area of the
   Internet where TS prefixes are routable.

   User Space (US) - The address space used by user networks.  Nodes
   within a user AS are assigned user-space addresses.  Sometimes the
   term "user space" will refer to the edges of the Internet whose
   prefixes are not routable in transit space (though packets to those
   addresses are deliverable through transit space).  We assume that TS
   and US addresses can be clearly distinguished.

   Border Link - A link that crosses the boundary between transit space
   and user space.

   Default Mapper - A new device required by our mapping service.  Each
   transit network MUST have at least one default mapper.  A default
   mapper carries a complete mapping table.  In other words, given any
   user-space address, default mappers can return the TS address of a
   provider-edge router corresponding to that address.  To support the
   growing trend towards multihoming, default mapping entries will map a
   user-space prefix to a non-empty SET of TS destinations, all of which
   have a direct connection to the destination network in user space.

   Tunnel Router (TR) - These devices will replace all current provider-
   edge routers, located at the provider end of border links.  Like ITRs
   and ETRs in LISP [LISP], TRs provide the encapsulation and
   decapsulation services required for tunneling user packets through
   transit space.  A TR has both ITR and ETR functionality, meaning that
   any TR can perform both encapsulation and decapsulation of packets.
   To properly encapsulate any given user-space packet, TRs can query
   the default mappers for mapping information.  TRs also cache commonly
   used mapping entries locally.  Note that TR cache entries are NOT
   identical to the mappings stored at default mappers (see the
   definitions of "mapping" and "mapping entry" below).  TRs are
   designed to be as simple and as fast as possible, adding only what is
   necessary for proper tunneling functionality.

   APT Node - A general term referring to any new device type introduced
   by APT.  This includes both default mappers and TRs.

   Router - These are ISP-owned non-border routers that exist today.
   Other than minor configuration changes, these routers need no
   alteration or replacement, and can be used just as they are used
   currently.

   Mapping - A mapping contains a user-space prefix and a non-empty SET
   of ETR TS addresses associated with the prefix.  Mappings also
   include related information such as the user's public key and
   priority rankings for each of the ETRs in the set.  Default mappers
   store mappings.

   Mapping Entry - A mapping entry contains a user-space prefix and any
   SINGLE ETR TS address associated with the prefix.  Any mapping entry
   is a subset of the complete mapping for its user-space prefix.  TRs
   store mapping entries along with an associated TTL.  A mapping entry
   is removed once its TTL expires.
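
   As an illustrative sketch only (it is not part of the APT
   specification), the difference between a mapping and a mapping entry
   could be modeled as follows.  The Python class and field names are
   hypothetical, and prefixes, TS addresses, and keys are represented as
   plain strings and bytes for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingEntry:
    """What a TR caches: one prefix, a SINGLE ETR, and a TTL."""
    prefix: str
    etr: str
    ttl_seconds: int


@dataclass
class Mapping:
    """What a default mapper stores: one prefix and a non-empty ETR set."""
    prefix: str
    etr_priorities: dict      # ETR TS address -> priority (lower wins)
    public_key: bytes = b""   # placeholder for the user's public key

    def __post_init__(self):
        # The draft requires the set of ETRs to be non-empty.
        if not self.etr_priorities:
            raise ValueError("a mapping needs at least one ETR")

    def entry_for(self, etr, ttl_seconds):
        """Derive the mapping entry for one ETR of this mapping."""
        if etr not in self.etr_priorities:
            raise KeyError(etr)
        return MappingEntry(self.prefix, etr, ttl_seconds)
```

   For network B in Figure 1, Mapping("B", {"ETR1": 10, "ETR2": 20})
   followed by entry_for("ETR1", 300) would yield the kind of entry a TR
   caches; the TTL of 300 seconds is an arbitrary value chosen for the
   example.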


4.  The Mapping Service

   To minimize the latency introduced by encapsulation, APT seeks to
   store mapping information as close to the ITR as possible.  However,
   the global mapping table is likely to grow very large over time.  To
   avoid undue memory requirements for ITRs while still keeping mapping
   information within reach, we introduce the concept of default
   mappers.

   A TR does not need to store the entire global mapping table.
   Instead, it queries a default mapper for mapping information and
   caches recently used mapping entries.

   Default mappers are the only devices in the network that need to
   store the complete global mapping table.  As we will see in the
   following example, TRs only use default mappers in the event of a
   cache miss.  This means that, given large enough caches at the TRs,
   network latency will not heavily depend upon default-mapper
   performance.  Additionally, we propose the use of anycast to reach
   default mappers within an AS.  Each TN AS need only have a single
   default mapper, but the use of anycast makes it easy for a TN to
   deploy more.  The result is a robust, scalable default mapping
   system.

4.1.  A Mapping Example


          _                           _
         / \                         / \
        / A \                       / B \________
        \___/                       \___/       |
          |                           |         |         User Space
 - - - - -|- - - - - - - - - - - - - -|- - - - -|- - - - - - - - - - - -
          |                           |         |        Transit Space
       .--+---.                    .--+---.     |
     _-| ITR1 |-_                _-| ETR1 |-_   |
    /  '------' .`--.        .--'. '------' .`--+--.
   |     ____   | X |--------| X |   ____   | ETR2 |
   |    | M1 |  '-;-'        '-:-'  | M2 |  '-;----'
    \   '-/\-'   /              \   '----'   /
     \___/  \___/                \__________/
  ______/    \____________________
 | User Net | TS Addr  | Priority |
 |----------|----------|----------|
 |   ...    |   ...    |    ...   |
 |----------|----------|----------|
 |    B     |   ETR1   |    10    |
 |          |   ETR2   |    20    |
 |----------|----------|----------|
 |   ...    |   ...    |    ...   |
 '--------------------------------'

   Figure 1.  This is a simple topology for demonstrative purposes.  A
   and B are user networks addressable via user-space prefixes, ITR1,
   ETR1, and ETR2 are TRs, any node labeled "X" is a router, and M1 and
   M2 are default mappers.  A portion of the mapping table for M1 is
   shown.

   In this section, we illustrate how TRs and default mappers interact
   within an AS to properly tunnel user-space packets through transit
   space.

   In Figure 1, assume a node in network A sends a packet to a user-
   space address in network B.  When this packet arrives at ITR1, ITR1
   looks up the destination user-space address in its mapping cache.  If
   a matching prefix is present in its cache, ITR1 simply encapsulates
   the packet with the corresponding TS destination address and sends it
   across transit space.  If a matching prefix is not present, ITR1 will
   send the packet through its default mapper.  It does this by
   encapsulating the packet with the anycast address for default mappers
   in its AS as the destination.

   This packet will arrive at M1, the only default mapper in ITR1's AS.
   When M1 receives the packet, it decapsulates it and examines the
   user-space destination address.  Since default mappers store the
   full, global mapping table, a default mapper will always be able to
   encapsulate the packet with a valid TS destination address.  All
   packets encapsulated by a default mapper MUST contain the default
   mapper's TS address as the source address.

   In addition to forwarding the packet to an appropriate ETR (ETR1, in
   this case), M1 also treats the incoming packet as an implicit request
   from ITR1 for mapping information.  M1 responds to ITR1 with an ICMP
   packet containing a mapping entry that maps B to ETR1.  This allows
   ITR1 to add this mapping entry to its cache so that ITR1 can tunnel
   further packets destined for B directly to ETR1.  The mapping entry
   also contains a time to live (TTL) that is set by M1.  The TTL
   ensures that ITR1 will occasionally re-request this mapping
   information from M1.  At this time, if the mapping information
   changed in any way since ITR1's prior request, M1 can respond with an
   updated mapping entry.  Without this TTL, ITR1's cached information
   may become inaccurate over time.
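
   The ITR side of this exchange can be sketched as a small cache with
   per-entry expiry.  This is a minimal illustration under stated
   assumptions, not a prescribed implementation: the names ItrCache,
   install, lookup, and outer_destination are hypothetical, TS addresses
   are plain strings, and the injected clock exists only to make the
   sketch easy to test.

```python
import ipaddress
import time


class ItrCache:
    """Hypothetical ITR cache: at most one ETR per user-space prefix."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # ip_network -> (etr_ts_address, expires_at)

    def install(self, prefix, etr, ttl_seconds):
        """Store the mapping entry returned by the default mapper."""
        net = ipaddress.ip_network(prefix)
        self._entries[net] = (etr, self._clock() + ttl_seconds)

    def lookup(self, address):
        """Return the cached ETR for address, or None on a cache miss."""
        addr = ipaddress.ip_address(address)
        now = self._clock()
        # Remove entries whose TTL has expired before matching.
        expired = [n for n, (_, exp) in self._entries.items() if exp <= now]
        for net in expired:
            del self._entries[net]
        matches = [n for n in self._entries if addr in n]
        if not matches:
            return None
        # Prefer the longest matching prefix.
        best = max(matches, key=lambda n: n.prefixlen)
        return self._entries[best][0]


def outer_destination(cache, user_dest, mapper_anycast):
    """Pick the TS destination address for the outer header."""
    etr = cache.lookup(user_dest)
    # On a miss, the packet is tunneled to the default mapper instead.
    return etr if etr is not None else mapper_anycast
```

   In this sketch an expired entry behaves exactly like a missing one,
   so the next packet for that prefix travels through the default mapper
   and implicitly refreshes the entry.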


5.  Multihoming Support

   In the example above, the observant reader may have noted that B is
   multihomed.  That is, B can be reached through both ETR1 and ETR2.
   Multihoming provides B with both enhanced reliability in case of a
   connectivity failure and the flexibility to split incoming traffic
   across different ETRs.


   In accordance with our design goals, all of the logic for selecting a
   destination for a multihomed user is contained within default
   mappers.  Default mappers will store mappings containing all of the
   ETRs for a given user-space prefix, and ITRs will only store a single
   mapping entry per user-space prefix.  When an ITR requests a mapping
   entry for a multihomed user, it is up to the default mapper to decide
   which one to return.

   Many users will want to have some control over which ETR is used for
   incoming traffic.  To allow this, we let users assign a priority
   value to each of the mapping entries for their prefixes, making it
   available to all default mappers throughout the transit space (see
   Section 6).  The number is to be treated like a ranking -- an ETR
   with a lower priority value is preferred.

   At the same time, a transit network may also have its own preference
   regarding which of the ETRs to use for a given user-space prefix.
   Default mappers can use a combination of locally configured routing
   policies and the user priority information to choose from a set of
   valid ETR addresses.  Going back to Figure 1, assume that ITR1 does
   not have a mapping entry for B in its cache.  When A sends B a
   packet, ITR1 will send the packet to M1.  If M1 has no preference
   between ETR1 and ETR2, it will examine the priority values in B's
   mapping and select ETR1, B's most preferred ETR.  M1 forwards the
   packet to ETR1 and returns the appropriate mapping entry to ITR1,
   which stores the mapping entry in its cache.

   In the case of a priority value tie, the default mapper can break the
   tie by picking the ETR to which it has the shortest path.  If some
   ETRs are tied in terms of both lowest priority value and shortest
   path, the default mapper is free to break the tie arbitrarily.  The
   address of the selected ETR will be used as the destination address
   when encapsulating the packet.
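
   The selection rule described above can be sketched as a single
   comparison on (priority value, path length).  This sketch assumes
   that the default mapper has no local policy preference, and the
   function name select_etr and the path_length table are hypothetical;
   how a default mapper learns path lengths is outside the scope of the
   example.

```python
def select_etr(priorities, path_length):
    """Choose an ETR for one user-space prefix.

    priorities:  dict of ETR TS address -> user-assigned priority value
    path_length: dict of ETR TS address -> path length from this mapper
    """
    if not priorities:
        raise ValueError("a mapping must contain at least one ETR")
    # Lowest priority value wins; ties go to the shortest path.  If both
    # are tied, min() keeps the first candidate it saw, which is one
    # acceptable way to break the tie arbitrarily.
    return min(priorities, key=lambda e: (priorities[e], path_length[e]))
```

   With the priorities from Figure 1 (ETR1 at 10, ETR2 at 20) the
   function returns ETR1 regardless of path length, since path length is
   consulted only when priority values are equal.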

   We envision that users will be able to manipulate their incoming
   traffic load by setting appropriate priority values in their mapping.
   A user who wants load balancing can assign the same priority value to
   all of his mapping entries.  A user who wants to have one TN as a
   primary provider and another only as a backup can simply assign a
   higher priority value to his ETR at his backup provider.

5.1.  Using Alternate ETRs During Failures

   When a network failure has caused an ETR to become unreachable, an
   affected multihomed user will expect his traffic to be temporarily
   routed through alternate ETRs.  There are three general types of
   failures that would require an ITR to use an alternate ETR: (1) an
   ITR may discover via BGP that it can no longer reach the TS prefix
   containing the address of the intended ETR, (2) an ITR may learn via
   ICMP Destination Unreachable packets that its intended ETR is
   unreachable, and (3) the link between a user network and its TR may
   be down, a new problem introduced by the tunneling architecture.  We
   will explain how to handle each of these failure types below, using
   Figure 1 as a reference.  We assume that, at the time of failure, all
   TN ASes are using ETR1 to reach B.

   To assist in handling these failures, we include a time until retry
   (TTR) for each mapping entry in every mapping stored in default
   mappers.  Normally, the TTR for each mapping entry is set to zero,
   indicating that it is usable.  Any mapping entry with a non-zero TTR
   value is considered invalid.  We will refer to the action of setting
   a mapping entry's TTR as "invalidating the entry."  Mapping entries
   that map to unroutable destinations are also considered invalid.  So
   long as a mapping entry is invalid, default mappers will not use this
   entry as a destination address or include it in mapping responses.
   The role of the TTR for handling failures will become clear in the
   explanations below.
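
   One possible way to model the TTR is sketched below.  The text above
   does not say how a TTR counts down, so this sketch assumes that each
   entry records the time at which it may be retried and that the TTR is
   the time remaining until then.  The class and method names are
   hypothetical, and the routability check is omitted.

```python
import time


class MapperMapping:
    """Hypothetical default-mapper state for one user-space prefix."""

    def __init__(self, priorities, clock=time.monotonic):
        self.priorities = dict(priorities)  # ETR -> priority value
        self._clock = clock
        # Assumption: an entry that was never invalidated has TTR zero.
        self._retry_at = {etr: float("-inf") for etr in self.priorities}

    def ttr(self, etr):
        """Seconds until the entry may be used again (0 means usable)."""
        return max(0.0, self._retry_at[etr] - self._clock())

    def invalidate(self, etr, ttr_seconds):
        """Set the entry's TTR; calling this again resets it."""
        self._retry_at[etr] = self._clock() + ttr_seconds

    def best_valid_etr(self):
        """Most-preferred ETR whose TTR is zero, or None if none is."""
        usable = [e for e in self.priorities if self.ttr(e) == 0.0]
        if not usable:
            return None
        return min(usable, key=lambda e: self.priorities[e])
```

   Under these assumptions, invalidating ETR1 makes best_valid_etr()
   return ETR2 until the TTR runs out, after which ETR1 is preferred
   again without any further message.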

5.1.1.  Handling TS Prefix Failure

   For failures of type (1), ITR1 has no route to ETR1.  Assume a host
   in network A attempts to send a packet to a host in network B.  If
   ITR1 does not have B's mapping in its cache, it will forward the
   packet to M1 (see Section 4.1).  If ITR1 does have B's
   mapping in its cache, it will see that it has no path to ETR1, and
   send the packet to M1 instead.  M1 will also see that it has no route
   to ETR1, and thus select the next-most-preferred ETR for B, ETR2.  If
   it has a route to ETR2, it sends the packet with ETR2 as the TS
   destination address and replies to ITR1 with the corresponding
   mapping entry.  M1 can assign a relatively short TTL to the mapping
   entry in its response.  Once this TTL expires, ITR1 will forward the
   next packet for B to the default mapper, which will respond with the
   most-preferred mapping entry that is routable at that time.  This
   allows ITRs to quickly go back to using ETR1 once it becomes routable
   again.

5.1.2.  Handling Single TS Address Failure

   In the second case, the TS prefix containing ETR1 is still routable
   from ITR1, but ETR1 is unreachable from ITR1.  Thus, ITR1 will
   receive an ICMP Destination Unreachable message in response to any
   packet sent to ETR1.  ITR1 will need to turn to its default mapper
   for an alternate TS destination address for B.  M1 will send an
   alternate valid mapping entry (if available) to ITR1.  For this to
   work, TRs MUST forward all received ICMP Destination Unreachable
   messages to their default mappers.  Default mappers MUST then
   invalidate ALL mapping entries that map to the unreachable TS
   destination address.  To allow this, default mappers will have a
   reverse-mapping table to go along with their mapping table.  These
   reverse-mapping tables map TS addresses to their corresponding user-
   space prefixes.  Now default mappers can look up the unreachable TS
   address in their reverse-mapping tables, and temporarily invalidate
   all entries that map to that TS address.
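
   A reverse-mapping table of this kind can be sketched as a second
   dictionary kept in step with the mapping table.  The sketch below is
   illustrative only: the class and method names are hypothetical, and
   the TTR is stored as a plain number of seconds with no countdown
   logic.

```python
from collections import defaultdict


class MappingTables:
    """Hypothetical forward and reverse tables of a default mapper."""

    def __init__(self):
        self.forward = {}                # prefix -> {ETR: TTR seconds}
        self.reverse = defaultdict(set)  # ETR -> set of prefixes

    def add_mapping(self, prefix, etrs):
        """Install a mapping with every entry initially usable."""
        self.forward[prefix] = {etr: 0 for etr in etrs}
        for etr in etrs:
            self.reverse[etr].add(prefix)

    def on_destination_unreachable(self, etr, ttr_seconds):
        """Invalidate ALL entries that map to the unreachable ETR.

        Returns the affected prefixes, sorted for determinism.
        """
        affected = sorted(self.reverse.get(etr, ()))
        for prefix in affected:
            self.forward[prefix][etr] = ttr_seconds
        return affected
```

   The reverse table turns the invalidation into one lookup plus one
   update per affected prefix, rather than a scan of the complete
   mapping table.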

5.1.3.  Handling User-to-TR Link Failure

   The final case involves a failure of the link connecting ETR1 to B.
   In the previous two cases, current Internet standards were in place
   to allow ITR1 to know that a failure occurred.  This case, on the
   other hand, is a new type of failure that does not exist in today's
   infrastructure.  Therefore, it will require a new type of failure
   message.  These messages will take the form of a new ICMP message
   type, which will include the user-space prefix that was not
   reachable.  TRs MUST be configured to forward all border link failure
   ICMP messages to their default mappers, in the same fashion that TRs
   forward all destination unreachable ICMP messages to their default
   mappers.

   Going back to our example, when ETR1 discovers it cannot forward the
   packet to B due to a border link failure, it will send ITR1 an ICMP
   packet of our new type stating that B's prefix is currently
   unreachable.  ITR1 will forward the border link failure ICMP message
   to its default mapper, which will invalidate that mapping entry.  If
   the mapping entry is already invalid, it will reset the entry's TTR.
   If the prefix has an alternate valid mapping entry, M1 will send this
   mapping entry to ITR1.

   Furthermore, to minimize packet losses, ETR1 should not simply drop
   the packet addressed to the unreachable user network.  Instead, ETR1
   should send this packet to M2 in hopes of finding an alternate ETR
   that can reach the user network.  However, M2 will then look up a TS
   destination address for B and choose ETR1 again.  This is
   undesirable, since we are seeking an alternative destination.
   Therefore, when encapsulating packets for forwarding, default mappers
   MUST check if the chosen TS destination address is the same as the TS
   sender address in the packet's original TS header.  If so, this
   indicates that the TS-to-user link is down at this ETR.  In such
   cases, default mappers MUST invalidate the corresponding mapping
   entry and seek an alternative.

   To complete our example, ETR1 sends an ICMP message to ITR1 and also
   sends the data packet to M2.  M2 looks up a destination TS address
   for the packet and finds ETR1.  M2 then compares this TS address with
   the TS address of the original sender of the packet, which is also
   ETR1.  Since they are the same, M2 invalidates this mapping entry and
   finds an alternate destination, ETR2.  M2 then forwards the packet to
   ETR2.
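
   The source-address check performed by M2 can be sketched as follows.
   This is a simplified illustration: the function name and parameters
   are hypothetical, candidate ETRs are assumed to be already ordered
   from most to least preferred, and invalid entries are tracked in a
   plain set rather than with TTR values.

```python
def choose_etr_at_mapper(outer_source, candidates, invalid):
    """Pick a TS destination for a packet arriving at a default mapper.

    outer_source: TS source address of the packet's original TS header
    candidates:   ETRs for the destination prefix, most preferred first
    invalid:      set of currently invalid ETRs (updated in place)
    """
    for etr in candidates:
        if etr in invalid:
            continue
        if etr == outer_source:
            # The packet came from the ETR we would have chosen, so its
            # border link must be down: invalidate it and keep looking.
            invalid.add(etr)
            continue
        return etr
    return None  # no alternate ETR is available
```

   In the example above, a packet that ETR1 hands to M2 yields ETR2 and
   leaves ETR1 marked invalid, while a packet arriving from an ITR is
   still mapped to ETR1 as long as ETR1 remains valid.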

5.2.  Summary of Requirements for Multihoming Support

   TR cache entries MUST include a TTL value, which will be provided by
   their default mapper.

   In default mappers, every TS destination address in a mapping MUST
   include a time until retry (TTR).  Usable mapping entries have a TTR
   of zero.  When a mapping entry becomes unreachable due to failures,
   the TTR MUST be set to a pre-configured value.  An alternate entry in
   the same mapping MUST be used in place of an invalid mapping entry if
   available.

   Default mappers MUST be able to invalidate all mapping entries that
   map to a particular TS destination address that has become
   unreachable.  This can be implemented using a reverse-mapping table.

   We will use a new type of ICMP message to indicate border link
   failure.

   TRs MUST forward all ICMP destination unreachable and border link
   failure messages to their default mapper.

   If an ETR cannot send a packet due to a border link failure, it MUST
   send this packet to its default mapper.  This ETR MUST use its own TS
   address as the source TS address of the packet.

   Upon receipt of any data packet, default mappers MUST check if the
   chosen TS destination address is the same as the TS source address in
   the packet's original TS header.  If so, default mappers MUST
   invalidate the corresponding mapping entry and look for an alternate
   ETR for the packet.


6.  Exchanging Mappings Between ASes

   In order for default mappers to store a full, global mapping table,
   there must be some way for them to receive mappings from other ASes.
   To avoid introducing latency or packet loss when encapsulating
   packets, the default mappers must have a full set of mappings
   available locally.  To accomplish this, we distribute mappings using
   a push method.  Default mappers MUST regularly announce the mappings
   for all of their customers to the rest of the network.

   When a default mapper receives new mappings, it stores them in its
   mapping table, replacing any existing mappings.  When a TR receives
   new mappings, it simply deletes any matching cache entries.  Any
   further communication with the formerly cached host will require the
   use of a default mapper.  This ensures that only the default mappers
   need to validate mapping announcements and enforce policy.
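
   The two reactions to a mapping announcement can be contrasted in a
   short sketch.  The function names are hypothetical, both tables are
   plain dictionaries keyed by user-space prefix, and the validation and
   policy checks that a default mapper would apply before accepting an
   announcement are not modeled.

```python
def mapper_on_announcement(mapping_table, prefix, etr_priorities):
    """Default mapper: store the mapping, replacing any existing one."""
    mapping_table[prefix] = dict(etr_priorities)


def tr_on_announcement(cache, prefix):
    """TR: only delete a matching cache entry; never install from it."""
    cache.pop(prefix, None)
```

   Because the TR only deletes, its next packet for that prefix misses
   the cache and is sent through the default mapper, which then returns
   an entry taken from the newly stored mapping.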

   Mapping messages will be flooded throughout the network via BGP.  A
   new BGP attribute will be required for this purpose.

   We have selected BGP initially in order to ease incremental
   deployment and minimize the changes required to existing routers.
   However, mapping announcements could easily be distributed via a
   different reliable broadcast protocol at a later date.  Transitioning
   mapping distribution to a different protocol will not affect any
   other aspect of APT.

6.1.  In Defense of BGP

   Despite the use of BGP, mapping announcements will not cause the same
   problems that BGP routing announcements do in the Internet today for
   the following reasons.

   First, for routing announcements, the path taken to reach each router
   is a crucial piece of information.  For mapping announcements, the
   path taken to reach each APT node is not meaningful.  This means that
   only a single copy of each mapping announcement needs to reach each
   APT node, providing an opportunity to prune duplicates, or even to
   make use of a spanning tree.  This also means that path exploration
   and its repercussions do not exist for mapping announcements.

   Second, mapping announcements only require processing at default
   mappers and, to a lesser extent, TRs.  Other routers in the network
   need only pass these announcements along to their peers.  Thus, the
   processing burden placed on other routers by excessive routing
   updates is completely avoided.

   Finally, there will be far fewer mapping announcements than there are
   routing announcements.  TNs rarely change the addresses of their
   equipment, and customers are generally under a monthly contract with
   their provider TNs.  Therefore, permanent mapping changes are
   unlikely to occur more than once per month per customer.
   Furthermore, transient failures do not cause mapping announcements in
   APT.  The most common cause of mapping announcements will be regular
   refresh announcements, which in most cases should not need to be sent
   more often than every other day.


7.  Security and Robustness

   Using BGP to distribute mapping announcements guarantees that they
   are only accepted from manually configured BGP peers.  This ensures
   that mapping announcements are no less secure than routing
   announcements today.  When applied to the eFIT architecture, however,
   the security of this scheme is greatly increased, because eFIT TS
   addresses are not addressable from user space [efitID][EFIT].  This
   strengthens the BGP trust model, since only other TS nodes are valid
   BGP peers.

   The complete separation of the eFIT TS address space provides another
   security benefit: malicious users cannot attack equipment that they
   cannot address.  End users simply cannot affect the TS nodes that
   their packets travel through within transit space.

   Despite these benefits, there are some additional issues introduced
   by APT.  Manually configured mappings provide an opportunity for
   human error, our reliance on ICMP packets provides an opportunity for
   spoofing and cache poisoning, and storing the entire global mapping
   table at default mappers poses a threat to long-term scalability.
   The remainder of this section will address each of these issues in
   turn.

7.1.  Detecting Misconfigurations

   Because only TNs will have access to transit space, false mapping
   updates are far more likely to be the result of accidental
   misconfigurations than malicious attacks.  With this in mind, we
   present a simple, extensible authentication scheme that can detect
   and, in some cases, prevent accidental misconfigurations.

   The types of misconfigurations that could potentially be harmful are
   those that result in one provider accidentally interfering with the
   mapping for another provider's customer.  This can happen whenever a
   provider accidentally announces a mapping for the wrong user-space
   prefix.  These types of accidental conflicts fall into three
   categories: (1) a provider announces a mapping for a prefix owned by
   another provider's customer, (2) a provider announces a mapping for a
   shorter user-space prefix that contains a longer user-space prefix
   owned by another provider's customer, and (3) a provider announces a
   mapping for a longer user-space prefix that is a subset of a shorter
   user-space prefix owned by another provider's customer.
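
   The three categories can be told apart by comparing prefixes alone.
   The following Python sketch uses the standard ipaddress module; the
   function name classify_conflict is hypothetical, and the sketch
   assumes the caller has already established that the two prefixes
   belong to customers of different providers.

```python
import ipaddress


def classify_conflict(announced, existing):
    """Return the conflict category (1, 2, or 3), or None if the
    announced prefix does not touch the existing one."""
    a = ipaddress.ip_network(announced)
    e = ipaddress.ip_network(existing)
    if a.version != e.version:
        return None  # IPv4 and IPv6 prefixes cannot conflict
    if a == e:
        return 1     # same prefix
    if e.subnet_of(a):
        return 2     # announced shorter prefix contains existing one
    if a.subnet_of(e):
        return 3     # announced longer prefix lies inside existing one
    return None
```

   For example, announcing 10.1.0.0/16 while 10.1.2.0/24 is already
   mapped falls into category (2), and the reverse order falls into
   category (3).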

   The first category of conflicts is the only one that we intend to
   actively prevent.  Clearly, the user that owns a particular user-
   space prefix should be the ultimate authority for his mapping
   information.  However, user networks do not announce their mappings
   to the network directly, but rather through their providers.  In
   order to ensure a mapping update for a user-space prefix is approved
   by its rightful owner, we must include some sort of user
   authorization string in each announcement.  To this end, we introduce
   a public-key field into each mapping.  This field SHOULD contain a
   cryptographically valid public key, but it will only rarely need to
   be used as such.  In the normal case, when a default mapper receives
   a new mapping announcement that would replace an existing one, it
   only needs to ensure that the public key has not changed.  (This
   scheme is similar in spirit to the way that OpenSSH uses its
   'known_hosts' file.)  However, as long as all of a UN's providers
   store the corresponding private key, the distribution of public keys
   also introduces the possibility of using cryptographic signatures for
   any number of purposes within transit space.
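
   The normal-case check can be sketched as follows in Python.  This is
   only an illustration: the class MappingTable and its methods are
   hypothetical, keys are treated as opaque byte strings, and rejecting
   the announcement outright is one possible reaction to a changed key.

```python
class MappingTable:
    """Default-mapper table that pins the first public key it sees
    for each user-space prefix (similar in spirit to known_hosts)."""

    def __init__(self):
        self._entries = {}  # prefix -> (public_key, ts_addresses)

    def apply(self, prefix, public_key, ts_addresses):
        """Install the mapping unless it would replace an existing
        one that carries a different public key."""
        current = self._entries.get(prefix)
        if current is not None and current[0] != public_key:
            return False  # key changed: likely misconfiguration
        self._entries[prefix] = (public_key, tuple(ts_addresses))
        return True

    def lookup(self, prefix):
        entry = self._entries.get(prefix)
        return None if entry is None else entry[1]
```

   No cryptographic operation is performed here; the key serves only as
   an identifier that a misconfigured provider is unlikely to reproduce.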

   For the other two categories, it is less clear that such an
   announcement is the result of a misconfiguration.  It is possible,
   for example, that the owner of a /16 user-space prefix has resold
   some of the contained /24 prefixes to other UNs.  In such cases, only
   the administrators will know if the announcement is valid.  It is for
   this reason that (in the spirit of PHAS [PHAS]) we do not attempt to
   prevent such changes, but only detect and notify interested parties.
   Since legitimate mapping changes are infrequent, notifying interested
   parties of mapping changes via e-mail is a perfectly viable option.
   These notifications could also prove useful in debugging the mapping
   service, or a particular provider's configuration.

7.2.  ICMP Mapping Packets

   ICMP mapping packets are used exclusively by default mappers to send
   mapping entries to the TRs within their AS.  Therefore, there is no
   reason that these ICMP packets should ever need to travel between
   ASes.  In order to prevent cache poisoning through spoofing, these
   ICMP packets MUST be dropped at all border routers within transit
   space.

7.3.  Other ICMP Packets

   Our mapping service also depends on two other types of ICMP packets:
   existing ICMP Destination Unreachable messages, and our new ICMP
   Border Link Failure messages.  Both of these packet types must
   traverse AS boundaries.  Again note that, under the eFIT
   architecture, these packets are already more trustworthy than ICMP
   packets in the current infrastructure -- they can only be generated
   by hosts in transit space.  However, if this level of security is
   deemed insufficient, the keys used for detecting misconfigurations
   could be used to cryptographically sign such packets, ensuring that
   they are coming from the appropriate sender.


7.4.  Default Mapper Scalability

   Theoretically, the global mapping table could grow to contain a
   separate mapping for every user-space prefix.  In the case of IPv6
   prefixes, the total number of mappings would be on the order of
   10^18, far more than we can expect to be able to store on a single
   device.  If the global mapping table were to approach such gargantuan
   proportions, a few simple changes to the default-mapper model would
   allow APT to scale gracefully.

   Instead of each default mapper storing the full, global mapping
   table, each default mapper would store only a subset of the table.
   This subset would be aggregatable by user-space prefix.  For
   addresses outside of this subset, a default mapper would store
   mappings that mapped short, artificially aggregated prefixes to the
   TS addresses of other default mappers.  Like virtual prefixes in CRIO
   [CRIO], the user-space prefixes in these mappings would not
   necessarily correspond to actual user prefixes.

   Each virtual prefix would be announced by the default mapper
   responsible for the corresponding subset of the global mapping table.
   In order to ensure complete coverage of the user address space, some
   central authority would need to assign these virtual prefixes to
   individual transit networks.
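
   A lookup at such a partial default mapper could proceed as in the
   following Python sketch.  The class PartialDefaultMapper, its linear
   scan, and the string labels used for TS addresses are all
   assumptions made for illustration; a real implementation would
   presumably use a trie or similar structure.

```python
import ipaddress


class PartialDefaultMapper:
    """Holds real mappings for one subset of user space and virtual
    prefixes that point at other default mappers for the rest."""

    def __init__(self):
        self._table = []  # (network, ts_address, is_virtual)

    def add(self, prefix, ts_address, virtual=False):
        net = ipaddress.ip_network(prefix)
        self._table.append((net, ts_address, virtual))

    def lookup(self, address):
        """Longest-prefix match.  Returns (ts_address, is_virtual),
        or None when nothing covers the address."""
        addr = ipaddress.ip_address(address)
        best = None
        for net, ts, virtual in self._table:
            if net.version != addr.version or addr not in net:
                continue
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, ts, virtual)
        if best is None:
            return None
        return best[1], best[2]
```

   A result flagged as virtual means the packet is handed to another
   default mapper, which adds the latency discussed below.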

   This scheme allows for a tradeoff between latency and default mapper
   storage requirements.  (For more discussion of the characteristics of
   such a tradeoff, see [CRIO].)  However, this scheme also requires
   some providers to become authoritative sources for mappings owned by
   other providers' customers.  Both this requirement and the need to
   involve a central authority could prove problematic for deployment.
   Therefore, we do not recommend using this scheme unless the size of
   the global mapping table demands it.


8.  Incremental Deployment

   Clearly, the deployment of APT will coincide with the deployment of
   eFIT (or a similar architecture).  Though incremental deployment of
   the eFIT architecture itself is beyond the scope of this document, we
   must at least show that APT will behave properly under partial
   deployment.

   Under the eFIT architecture, addresses outside of transit space will
   not change.  This means that user-space prefixes will initially share
   the existing IP address space.  This fact provides us with a simple
   method for delivering packets to addresses for which no mapping is
   available.  Presumably, the only such addresses will be those
   connected to providers who have not yet adopted the new architecture.
   In order to deliver such packets, APT nodes can simply return them to
   the old infrastructure, and they will be routed as they are today.
   In order to support this feature, default mappers will respond to TRs
   with an ICMP mapping packet that indicates that no entry exists for
   the given user-space prefix.  TRs will keep a negative cache entry
   for such prefixes so that they can forward such packets directly to a
   non-TS router.
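
   The forwarding decision at a TR with negative cache entries might
   look like the following Python sketch.  The names TRCache, install,
   and decide are hypothetical, cache TTLs are omitted, and the handling
   of a cache miss is left to the rest of APT.

```python
import ipaddress

NO_MAPPING = object()  # marker for a negative cache entry


class TRCache:
    """Mapping cache of a TR, including negative entries."""

    def __init__(self):
        self._entries = {}  # network -> TS address or NO_MAPPING

    def install(self, prefix, ts_address=None):
        # ts_address=None models an ICMP mapping packet reporting
        # that no entry exists for the prefix.
        net = ipaddress.ip_network(prefix)
        if ts_address is None:
            self._entries[net] = NO_MAPPING
        else:
            self._entries[net] = ts_address

    def decide(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [n for n in self._entries
                   if n.version == addr.version and addr in n]
        if not matches:
            return ("miss", None)       # handled elsewhere in APT
        best = max(matches, key=lambda n: n.prefixlen)
        target = self._entries[best]
        if target is NO_MAPPING:
            return ("legacy", None)     # forward to a non-TS router
        return ("encapsulate", target)
```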

   In discussing incremental deployment, we must also address the issue
   of how new default mappers will acquire the complete mapping table
   when they are first connected to transit space.  Since our mapping
   service design requires that all ASes re-announce all of their
   mappings at a regular interval, commissioning a new default mapper
   only requires connecting it to the network and waiting for all other
   TS ASes to re-announce their mappings.  Yet, this introduces a
   potential problem -- if there is no upper bound on the regular
   refresh interval, there will be no upper bound on how long a new
   default mapper needs to wait until its mapping table is complete.
   Therefore, there needs to be an upper bound on the refresh interval
   for mappings.  An appropriate value would be once a week.  This would
   mean that a newly deployed default mapper would be able to reach the
   entire transit space one week later (with the exception of any ASes
   that failed to follow protocol).


9.  Future Work

   Ideally, any design paper should include an evaluation section.  In
   the future, we will examine traces of Internet activity to determine
   the characteristics of the tradeoff between TR cache size and default
   mapper workload, the amount of traffic overhead that would be
   incurred by our push-based design, and any other results that the
   community deems useful.

   We are also considering automating user mapping updates.  Under our
   current design, whenever a user needs to update his mapping
   information (he may add, remove, or change providers, or change his
   priority values), the user must contact his providers offline and
   request that they announce the updated mapping information.  It is
   then up to the providers to update the mapping information.  As we
   have seen with DNS updates, human involvement introduces the
   possibility of human error and delay.  We hope to provide UNs with an
   automated way to manage their mapping information.


10.  IANA Considerations

   This memo includes no request to IANA.


11.  Security Considerations

   Security considerations for APT are discussed in Section 7.


12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [efitID]   Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Proposal
              for Scalable Internet Routing and Addressing", Internet
              Draft, http://www.ietf.org/internet-drafts/
              draft-wang-ietf-efit-00.txt, February 2007.

   [EFIT]     Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Scalable
              Routing System Design for Future Internet", SIGCOMM IPv6
              Workshop, August 2007.

   [LISP]     Farinacci, D., Fuller, V., and D. Oran, "Locator/ID
              Separation Protocol (LISP)", Internet Draft,
              http://www.ietf.org/internet-drafts/
              draft-farinacci-lisp-00.txt, 2007.

   [PHAS]     Lad, M., Massey, D., Pei, D., Wu, Y., Zhang, B., and L.
              Zhang, "PHAS: A Prefix Hijack Alert System", USENIX
              Security.

   [AddrAlloc]
              Meng, X., Xu, Z., Zhang, B., Huston, G., Lu, S., and L.
              Zhang, "IPv4 Address Allocation and BGP Routing Table
              Evolution", ACM SIGCOMM Computer Communication Review
              (CCR) special issue on Internet Vital Statistics, Volume
              35, Issue 1, p71-80.

   [RAWS]     Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
              Workshop on Routing and Addressing", Internet Draft,
              http://www.ietf.org/internet-drafts/
              draft-iab-raws-report-02.txt, 2007.

   [CRIO]     Zhang, X., Francis, P., Wang, J., and K. Yoshida, "Scaling
              IP Routing with the Core Router-Integrated Overlay", Proc.
              International Conference on Network Protocols, November
              2005.


Appendix A.  BGP Mapping Announcement Fields

   Address Type - This field specifies the type of user-space addresses
   used for the user-space prefixes in the announcement.  All user-space
   prefixes in a single mapping announcement MUST be of the same address
   type.  Currently, this is expected to be either IPv4 or IPv6, but any
   other address type is possible provided that it is supported by the
   APT nodes in the ASes that wish to use it.  APT nodes MUST ignore
   mapping announcements for address types that they do not understand.

   Total Length - This field specifies the total number of bytes used by
   all mappings in the announcement.  Each mapping announcement can
   contain mappings for multiple prefixes, each with multiple mapping
   entries.

   Each mapping in the announcement is described by the following
   fields:

   User-space Prefix - This is the user prefix for the mapping.

   Public Key - This is a public key that can be used to verify
   signatures, decrypt data, and prevent misconfigurations for the
   corresponding user-space prefix.  See Section 7.1 for more
   information.

   Time To Live (TTL) - This is the amount of time in hours that this
   mapping should persist in default mappers before being considered
   obsolete and erased.  This value MUST be set to at least three times
   the regular refresh interval lest the corresponding user-space prefix
   become unreachable.  The TTL is specified in hours to prevent
   misconfigurations from causing excessive mapping updates.

   TS Address Count - This is the total number of TS addresses that the
   corresponding user-space prefix maps to.

   TS Address Set - This is a set of TS addresses, each with a priority.
   The total number of addresses is specified by the previous field.
   Priorities are arbitrary integers that only have meaning in reference
   to each other.  Addresses with lower priority values are preferred.
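
   The per-mapping fields above can be modeled as in the following
   Python sketch.  It is not a wire format: the class Mapping, its
   attribute names, and the representation of TS addresses as strings
   are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mapping:
    prefix: str                            # User-space Prefix
    public_key: bytes                      # Public Key (opaque here)
    ttl_hours: int                         # Time To Live, in hours
    ts_addresses: List[Tuple[str, int]]    # (TS address, priority)

    @property
    def ts_address_count(self):
        """TS Address Count is derived from the TS Address Set."""
        return len(self.ts_addresses)

    def ttl_ok(self, refresh_interval_hours):
        """TTL MUST be at least three times the refresh interval."""
        return self.ttl_hours >= 3 * refresh_interval_hours

    def preferred(self):
        """Return the TS address with the lowest priority value."""
        return min(self.ts_addresses, key=lambda e: e[1])[0]
```

   With a refresh interval of 48 hours, for instance, a TTL below 144
   hours would violate the requirement stated for the TTL field.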


Appendix B.  ICMP Mapping Message Fields

   User-space Prefix - This prefix is used to match the input address
   for mapping cache lookups.

   TS Address - This is the destination that the user-space prefix maps
   to.

   TTL - This is the time that the entry stays in the cache.  Its value
   is determined by the default mapper.


Appendix C.  ICMP Border Link Failure Fields

   Prefix - This field contains the user-space prefix that cannot be
   reached as a result of the border link failure.

   Signature - This field can optionally contain a signature generated
   using the UN's private key.  It can then be used to verify the
   legitimacy of the message.


Appendix D.  Hidden Backup Mappings

   As mentioned in our mapping section, our design allows users to
   assign backup providers and perform traffic engineering through
   appropriate assignment of their TN priority values.  Of course, this
   method will only prove effective if all transit networks generally
   respect these priority values.  This may not be the case in practice.

   User networks may be negatively affected if priorities are not
   respected.  For example, imagine that a user has a cheap primary
   provider and an expensive backup provider.  If enough transit
   networks ignore the UN's preference and send his traffic through the
   backup provider, the financial impact on the user could be
   significant.  For this reason, users may not want to depend on other
   ASes to respect their priority values.

   In today's Internet, multihomed user networks can use BGP trickery to
   hide their backup providers unless they are needed.  The backup
   provider simply does not announce a route to the UN's prefix unless
   it receives a withdrawal for that prefix from the UN's primary
   provider.  At this point, the backup provider will announce its path
   to the UN's prefix.  Once it receives a new announcement for the
   prefix from the primary provider, the backup provider withdraws its
   path to the UN's prefix, putting it back into hiding.

   In accordance with our "do no harm" design philosophy, we present a
   method for including a hidden backup feature into APT.  Hidden backup
   support introduces new ICMP packets, mapping tables, and state into
   APT.  We leave it as an open question whether this feature should be
   included at all.  If transit networks are willing to respect the
   priority values included with mapping entries, hidden backup support
   (and its complexity) can be omitted entirely from APT.

Appendix D.1.  Hidden Backup Mapping Protocol

   A user would want to activate his hidden backup provider in the same
   three failure situations that require switching to an alternate
   provider (see Section 5.1).  We will explain how to handle
   each of these failure types.

   Situation (1) is detectable by the backup provider via BGP.  When the
   backup provider learns that there are no routes to the UN's primary
   provider, he MUST announce his own backup mapping and begin servicing
   the user network.  If the UN's primary provider later becomes
   reachable, the backup provider MUST re-announce the original mapping.
   The responsibility to re-announce the original mapping lies with the
   backup provider in order to prevent route flapping from causing
   mapping flapping.  The backup provider SHOULD wait until the primary
   provider has been stable for a set period of time before re-
   announcing the original mapping.  Also note that these mapping
   announcements are indistinguishable from those generated by permanent
   mapping changes, leaving default mappers throughout transit space no
   choice but to respect them.
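
   The behavior of the backup provider in situation (1) can be sketched
   as a small state machine in Python.  The class BackupProvider, the
   hold_down_seconds parameter, and the use of plain numbers for time
   are assumptions; this document does not specify the length of the
   stability period.

```python
class BackupProvider:
    """Announces a backup mapping while the primary is unreachable
    and restores the original only after a stable period."""

    def __init__(self, hold_down_seconds):
        self.hold_down = hold_down_seconds  # assumed parameter
        self.backup_active = False
        self._reachable_since = None
        self.announcements = []             # record of what was sent

    def on_primary_unreachable(self, now):
        self._reachable_since = None        # any flap resets the timer
        if not self.backup_active:
            self.backup_active = True
            self.announcements.append("backup")

    def on_primary_reachable(self, now):
        if self._reachable_since is None:
            self._reachable_since = now

    def tick(self, now):
        """Called periodically; re-announces the original mapping
        once the primary has been reachable for the hold-down time."""
        if (self.backup_active
                and self._reachable_since is not None
                and now - self._reachable_since >= self.hold_down):
            self.backup_active = False
            self._reachable_since = None
            self.announcements.append("original")
```

   A route that flaps during the hold-down period restarts the timer, so
   route flapping produces at most one pair of mapping announcements.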

   Situation (2) is detectable by the primary provider via IGP.  When
   the primary provider learns that one of his TRs is down, he MUST
   inform the backup providers for the affected user networks.  This
   could be done via BGP flooding, but it seems excessive to flood the
   entire core with a message that is only relevant to a handful of
   providers.  Instead of flooding, the primary provider needs to inform
   the relevant backup providers directly.  To support this, primary
   providers MUST store a "backup-mapping table" that maps each of their
   customers to their corresponding backup providers.  This table should
   not be very large, since each provider will only store entries for
   his own customers.  Furthermore, customers who do not have a hidden
   backup can be excluded from the backup-mapping table.

   When a TR goes down, one of the provider's default mappers can use
   its reverse-mapping table (see Section 5.1) to determine which user
   prefixes are affected.  It can then use its backup-mapping table to
   determine which backup providers need to be notified.  The rest of
   the communication will be implemented using two new ICMP message
   types, "Primary Provider Failure" and "Primary Provider Recovery".
   Each of these types will require an acknowledgment (ACK) flag to
   ensure delivery.
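
   The two table lookups can be combined as in the following Python
   sketch.  It assumes that the reverse-mapping table maps the TS
   address of a TR to the user prefixes behind it and that the
   backup-mapping table is keyed by user prefix; both representations,
   and the function name backups_to_notify, are hypothetical.

```python
def backups_to_notify(failed_tr, reverse_mapping, backup_mapping):
    """Return {backup provider: [affected prefixes]} for a failed TR.

    reverse_mapping: TR TS address -> list of user prefixes
    backup_mapping:  user prefix   -> list of backup providers
    Prefixes without a hidden backup do not appear in backup_mapping.
    """
    result = {}
    for prefix in reverse_mapping.get(failed_tr, []):
        for provider in backup_mapping.get(prefix, []):
            result.setdefault(provider, []).append(prefix)
    return result
```

   Each key of the result corresponds to one backup provider that needs
   to receive a "Primary Provider Failure" message.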

   Primary providers MUST send an ICMP "Primary Provider Failure"
   message to each of the appropriate backup providers.  These messages
   MUST contain the relevant mapping entry.  Upon receipt of such a
   message, a backup provider MUST respond with an identical packet,
   except that it MUST set the ACK flag.  Then, it MUST announce a
   backup mapping entry.  When the customer's primary provider detects a
   recovery, it MUST send an ICMP "Primary Provider Recovery" message to
   the appropriate backup providers.  The backup providers MUST
   acknowledge the message, and re-announce the original mapping.  As in
   situation (1), re-announcing of the original mapping is left to the
   backup providers to prevent mapping flapping.

   Situation (3) is detectable by the TR whose link to a user has gone
   down.  The TR MUST inform his default mapper of this failure via the
   new ICMP type described in Section 5.1.3.  At this point, the primary
   provider can look up the affected user in his backup-mapping table,
   and proceed as in situation (2).

   The ICMP communication described above is essential to hidden backup
   functionality.  Thus, these messages must be secure and reliable.
   Security can be achieved with public-private key cryptography (see
   Section 7.3).  For reliability, the primary provider MUST
   continue to send "Primary Provider Failure" and "Primary Provider
   Recovery" ICMP packets periodically until it receives an
   acknowledgment from the backup provider.  Backup providers MUST
   always acknowledge these types of ICMP messages, regardless of the
   state of the corresponding mapping.
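
   The exchange described above can be sketched in Python as follows.
   This is an illustration under stated assumptions, not a message
   format: ProviderMessage, BackupMapper, and send_until_acked are
   hypothetical names, the max_attempts bound exists only so that the
   sketch terminates, and the backup side restores the original mapping
   immediately rather than waiting for a stability period.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProviderMessage:
    kind: str          # "failure" or "recovery"
    prefix: str        # affected user-space prefix
    mapping: tuple     # the relevant mapping entry, opaque here
    ack: bool = False


class BackupMapper:
    """Backup provider side: always acknowledges, acts only once."""

    def __init__(self):
        self.active = set()     # prefixes served by the backup mapping
        self.announced = []     # record of mapping announcements

    def handle(self, msg):
        if msg.kind == "failure" and msg.prefix not in self.active:
            self.active.add(msg.prefix)
            self.announced.append(("backup", msg.prefix))
        elif msg.kind == "recovery" and msg.prefix in self.active:
            self.active.discard(msg.prefix)
            self.announced.append(("original", msg.prefix))
        # Acknowledge regardless of the state of the mapping.
        return replace(msg, ack=True)


def send_until_acked(msg, send, max_attempts=5):
    """Primary provider side.  send(msg) returns the reply, or None
    if the packet was lost.  Returns the number of attempts used,
    or None if no acknowledgment arrived within max_attempts."""
    for attempt in range(1, max_attempts + 1):
        reply = send(msg)
        if (reply is not None and reply.ack
                and replace(reply, ack=False) == msg):
            return attempt
    return None
```

   Because the backup provider acknowledges every copy but changes state
   only once, retransmissions caused by a lost acknowledgment do not
   trigger additional mapping announcements.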

   Mapping announcements and ICMP communication will be carried out by
   default mappers unless otherwise specified.  Backup-mapping tables
   are also stored in the default mappers.


Authors' Addresses

   Dan Jen

   Email: jenster@cs.ucla.edu


   Michael Meisel

   Email: meisel@cs.ucla.edu


   Dan Massey

   Email: massey@cs.colostate.edu


   Lan Wang

   Email: lanwang@memphis.edu


   Beichuan Zhang

   Email: bzhang@cs.arizona.edu


   Lixia Zhang

   Email: lixia@cs.ucla.edu


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).

