Internet Research Task Force                                  T. Li, Ed.
Internet-Draft                                    Redback Networks, Inc.
Intended status: Informational                         February 15, 2009
Expires: August 19, 2009


Preliminary Recommendation for a Routing Architecture
draft-irtf-rrg-recommendation-00

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 19, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

It is commonly recognized that the Internet routing and addressing architecture is facing challenges in scalability, multi-homing, and inter-domain traffic engineering. This document is the Routing Research Group's preliminary recommendation for a new routing architecture.

This document is a work in progress.



Table of Contents

1.  Introduction
    1.1.  Structure of This Document
2.  Terminology and Abbreviations
3.  Taxonomies of the Solution Space
    3.1.  A Mechanism Taxonomy
        3.1.1.  Transport
        3.1.2.  Translation
        3.1.3.  Map & Encap
    3.2.  A Functional Taxonomy
        3.2.1.  Separation
        3.2.2.  Elimination
    3.3.  The Herrin Taxonomy
        3.3.1.  Strategy A
            3.3.1.1.  Variants
            3.3.1.2.  Mapping approaches
            3.3.1.3.  Failure handling approaches
            3.3.1.4.  Compatibility approaches
            3.3.1.5.  Core routing methods
            3.3.1.6.  Major criticisms
        3.3.2.  Strategy B
            3.3.2.1.  Locator variants
            3.3.2.2.  Identifier variants
            3.3.2.3.  Major criticisms
        3.3.3.  Strategy C
            3.3.3.1.  Variants
            3.3.3.2.  Major criticisms
        3.3.4.  Strategy D
            3.3.4.1.  Variants
            3.3.4.2.  Major criticisms
        3.3.5.  Strategy E
            3.3.5.1.  Variants
            3.3.5.2.  Major criticisms
        3.3.6.  Strategy F
            3.3.6.1.  Major criticisms
        3.3.7.  Strategy G
            3.3.7.1.  Major criticisms
4.  Recommendations
    4.1.  No manual renumbering of end hosts
    4.2.  Future progress
5.  Acknowledgements
6.  IANA Considerations
7.  Security Considerations
8.  References
    8.1.  Normative References
    8.2.  Informative References
Author's Address





1.  Introduction

It is commonly recognized that the Internet routing and addressing architecture is facing challenges in scalability, multi-homing, and inter-domain traffic engineering. The problem being addressed has been documented in [I‑D.narten‑radir‑problem‑statement] (Narten, T., “On the Scalability of Internet Routing,” February 2010.), and the design goals that we have agreed to can be found in [I‑D.irtf‑rrg‑design‑goals] (Li, T., “Design Goals for Scalable Internet Routing,” July 2007.). This document is the Routing Research Group's (RRG's) preliminary recommendation for a new routing architecture.

This document is a work in progress.




1.1.  Structure of This Document

This document describes many of the different possible approaches that could be taken in a new routing architecture, as well as a summary of the current thinking of the overall group regarding each approach.




2.  Terminology and Abbreviations

This section describes the common terminology used in this document. Particular architectures and discussions frequently define additional terms, qualify these terms or add additional semantics.

FIB
Forwarding Information Base, also known as the forwarding table. Typically, the forwarding table contains the subset of the information in the RIB that is actually needed at forwarding time.
GUID
Globally Unique IDentifier
ISP
Internet Service Provider
identifier
An identifier is the name of an endpoint. It has no topological significance. Identifiers may have other properties, such as the scope of their uniqueness (global or local) and the probability of their uniqueness (absolute or statistical).
locator
A locator is a name that has topological significance.
RIB
Routing Information Base, also known as the routing table.
RIR
Regional Internet Registry
RLOC
A Remote LOCator is a locator with global scope.
SID
Session IDentifier
TE
Traffic Engineering is a technique for controlling the path that traffic takes beyond baseline methods.




3.  Taxonomies of the Solution Space

In trying to understand the entirety of the solution space that we are confronted with, we have made multiple attempts to divide the space into comprehensible sectors. The entire solution space is complex, and it seems difficult to capture all of the pertinent dimensions of the space with only a single perspective. Different taxonomies seem to provide insight during different discussions, and we summarize all of them here to capture all of the useful perspectives.




3.1.  A Mechanism Taxonomy

In this taxonomy, solutions are grouped by the primary mechanisms that they use to achieve their goals.




3.1.1.  Transport

Transport solutions are characterized by modifying only the transport layer to provide locator and identifier independence. For example, if a transport protocol supports connections across multiple addresses as a means of supporting multi-homed hosts, and can seamlessly and transparently shift across these addresses, then it can provide the multi-homing support that is required.

However, in our discussions, it became clear that even with transport level agility, host-level renumbering of sites would still be necessary to support these types of solutions. The consensus of the group is that such site renumbering is a completely unacceptable requirement and as such, these types of solutions are not of interest for further exploration.




3.1.2.  Translation

Translation solutions are characterized by a translation operation from an identifier to a locator and back to an identifier as the packet traverses the network. Translation approaches do not add encapsulations to packets as they traverse the network; they usually translate the fields in place within the packet. Translation solutions can be further categorized as those with separate fields for locators and identifiers and those that continue to use a single address field. They can also be categorized by whether the translation is done in the host or in a middle box.
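
As a rough illustration of the single-address-field case, the following Python sketch rewrites the destination field in place on entry to the core and restores it on exit. The `Packet` type, the `translate` helper, and the two tables are hypothetical names chosen for this example; they are not taken from any proposal discussed by the group.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    dst: str        # single address field, reused for identifier and locator
    payload: bytes


def translate(packet, table):
    """Rewrite the destination field in place; nothing is encapsulated."""
    return replace(packet, dst=table[packet.dst])


# Hypothetical symmetric tables held by the ingress and egress translators.
id_to_loc = {"id-b": "loc-b"}
loc_to_id = {loc: ident for ident, loc in id_to_loc.items()}

original = Packet(dst="id-b", payload=b"hello")
in_core = translate(original, id_to_loc)    # identifier -> locator
delivered = translate(in_core, loc_to_id)   # locator -> identifier
```

Because only an existing field is rewritten, the payload and the packet size are unchanged, which is the main contrast with tunnelling approaches.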




3.1.3.  Map & Encap

Map & Encap solutions are characterized by a lookup operation from the identifier to a locator and then an encapsulation of the packet payload into a tunnel that directs the packet across the topology.
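
The lookup-then-tunnel sequence can be sketched as follows. This is a minimal illustration only: the `Packet` and `Tunneled` types, the dictionary used as the mapping, and the function names are assumptions made for the example, and no particular tunnel header format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # source identifier
    dst: str        # destination identifier
    payload: bytes


@dataclass(frozen=True)
class Tunneled:
    outer_src: str  # locator of the encapsulating router
    outer_dst: str  # locator found by the mapping lookup
    inner: Packet   # original packet, carried unmodified


def encapsulate(packet, mapping, local_locator):
    """Map the destination identifier to a locator, then wrap the packet."""
    return Tunneled(local_locator, mapping[packet.dst], packet)


def decapsulate(tunneled):
    """Strip the outer header at the far end of the tunnel."""
    return tunneled.inner
```

The core only ever inspects `outer_dst`; the inner packet is restored untouched at the tunnel exit.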




3.2.  A Functional Taxonomy

EDITOR'S NOTE: Lixia to propose text here.




3.2.1.  Separation




3.2.2.  Elimination




3.3.  The Herrin Taxonomy

As part of the mailing list discussion, the group constructed a more detailed taxonomy of possible architectures, described as a series of strategies.




3.3.1.  Strategy A

Base local routing on an address, which functions as a GUID, SID component, and local locator, but have each packet flow through an encoder that attaches an RLOC before the packet enters the internetwork core. Routing within the core is based on the RLOC. Only ISPs with significant interconnection have their own RLOCs. Fewer than 10,000 such "core ISPs" exist today, and the number is growing much more slowly than the routing table overall. Once the packet reaches the network identified by the RLOC, local routing by address takes over for final delivery. Distribute RLOCs through the core via a typical distance-vector or link-state routing protocol.




3.3.1.1.  Variants

A1a
Each core ISP has one RLOC. The RLOC's existence and reachability are flooded to the rest of the core.
A1b
Each core ISP has a small number of RLOCs for TE. The RLOCs' existence and reachability are flooded to the rest of the core.
A1c
Each core ISP has an aggregated set of RLOCs which it may hierarchically assign to customers downstream and/or disaggregate for TE. The aggregated RLOC's existence and reachability are flooded to the rest of the core.




3.3.1.2.  Mapping approaches

A2a
Addresses are statically mapped to RLOCs. Map entries are periodically pushed towards a central or distributed registry. The full list is periodically downloaded to the encoders which add RLOCs to the packets.
A2b
Addresses are dynamically mapped to RLOCs. Map entries are pushed towards a central or distributed registry as they change. The registry pushes all incremental changes in near-real time to all encoders which add RLOCs to the packets.
A2c
Addresses are dynamically mapped to RLOCs. Map entries are pushed towards a central or distributed registry as they change. Encoders request and briefly cache individual mappings from the registry as needed.
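
To make the pull model of A2c concrete, here is a small Python sketch of an encoder that requests mappings on demand and caches them briefly. The class name, the dictionary standing in for the registry, and the 30-second default lifetime are all assumptions made for illustration; the taxonomy does not specify a cache lifetime or a registry interface.

```python
import time


class CachingEncoder:
    """Encoder that pulls address-to-RLOC mappings and caches them briefly."""

    def __init__(self, registry, ttl=30.0, clock=time.monotonic):
        self._registry = registry   # stand-in for the central/distributed registry
        self._ttl = ttl             # hypothetical cache lifetime in seconds
        self._clock = clock
        self._cache = {}            # address -> (expiry time, list of RLOCs)
        self.registry_queries = 0

    def rlocs_for(self, address):
        now = self._clock()
        entry = self._cache.get(address)
        if entry is not None and entry[0] > now:
            return entry[1]                     # fresh cache hit
        self.registry_queries += 1              # miss or expired: ask the registry
        rlocs = self._registry[address]
        self._cache[address] = (now + self._ttl, rlocs)
        return rlocs
```

A short lifetime bounds how long an encoder can keep using a stale mapping, at the cost of more registry queries; A2a and A2b trade that query load for pushing the full map, or every change, to every encoder.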




3.3.1.3.  Failure handling approaches

Link failures in the Internet core cause the RLOCs to be rerouted with no change to the address-to-RLOC mapping.

A3a
RLOC encoders detect when particular RLOCs are no longer reachable at all and fall back on secondary RLOCs for a particular address. Encoders rely on active failure messages from some system in the RLOC-specified network to indicate that a host is no longer available via that RLOC, causing them to fall back on secondary RLOCs for that host.
A3b
Link failures which prevent parts of the RLOC's network from reaching a destination host or set of hosts it serves cause an external analysis element to make a dynamic change to the address-RLOC map, depreferencing or removing the affected RLOC. The external analysis element may be under the control of the end-user destination network, the RLOC network or a third party under contract to one of them.
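
The fallback behaviour of A3a can be sketched as a simple selection over an ordered candidate list. The function and parameter names below are hypothetical, and how an encoder learns that an RLOC is dead, or receives a per-host failure message, is left out of the sketch.

```python
def choose_rloc(address, candidates, dead_rlocs, host_failures):
    """Return the most preferred usable RLOC for `address`, or None.

    candidates    -- RLOCs for the address, most preferred first
    dead_rlocs    -- RLOCs the encoder believes are unreachable at all
    host_failures -- (address, rloc) pairs reported by the RLOC's network
    """
    for rloc in candidates:
        if rloc in dead_rlocs:
            continue    # the whole RLOC is unreachable
        if (address, rloc) in host_failures:
            continue    # this host is not available via this RLOC
        return rloc
    return None
```

In A3b the same effect is achieved without encoder logic: the external analysis element edits the map itself, so the failed RLOC never appears in `candidates`.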




3.3.1.4.  Compatibility approaches

A4a
Create a new IP protocol. The new protocol would not be compatible with IPv4 and IPv6.
A4b
Modify the IP protocol. The modified protocol would not be compatible with IPv4 and IPv6 as deployed.
A4c
Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core. Path-MTU issues are handled by setting an Internet-wide maximum packet size enforced by the encoders and assuring that all core links support that size.
A4d
Standard IPv4 and IPv6 packets are tunnelled while they transit the Internet core. Path-MTU issues are handled by returning packets which breach the MTU while in the core to the encoder, which must act as a proxy by returning a sensible packet-too-big message to the originating host.
A4e
The IPv6 address space is partitioned into end-user address space and Internet core address space. The address to RLOC map is symmetric. Part of the IPv6 end-user address is swapped for the RLOC when the packet enters the Internet core and then restored when it leaves the Internet core. Use a different A4 variant for IPv4.
A4f
The IPv6 flow label or some other component(s) of the IPv6 header are used to contain the RLOC. The flow label is set before the packet enters the core. Non-local packets are routed based on the flow label. Use a different A4 variant for IPv4.
A4g
Steal bits from other functions in the IPv4 header (e.g. checksum) to make space for an RLOC. Discard those components and set the RLOC when the packet enters the core. Restore the original bits when the packet leaves the core. Use a different A4 variant for IPv6.
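
The path-MTU arithmetic behind A4c and A4d can be illustrated with a short Python sketch. It assumes, purely for the example, that the tunnel adds a single 20-byte IPv4 outer header; the taxonomy does not fix a tunnel format, and a real encapsulation may add more. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, outer header without options (assumed overhead)


def inner_mtu(core_link_mtu, overhead=IPV4_HEADER):
    """Largest original packet that still fits once the tunnel header is added."""
    return core_link_mtu - overhead


def fits(inner_packet_len, core_link_mtu, overhead=IPV4_HEADER):
    """True if the tunnelled packet can cross the core link unfragmented."""
    return inner_packet_len + overhead <= core_link_mtu
```

Under A4c the encoders would enforce one Internet-wide value of `inner_mtu`. Under A4d the encoder would report `inner_mtu` of the offending core link in the packet-too-big message it returns to the originating host.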




3.3.1.5.  Core routing methods

A5a
Distribute RLOCs through the Internet core via BGP.
A5b
Distribute RLOCs through the Internet core via a new distance-vector protocol.
A5c
Distribute RLOCs through the Internet core via a link-state protocol.




3.3.1.6.  Major criticisms

There don't appear to be any genuinely clean ways of implementing strategy A. Handling path-MTU is usually a problem since the packets in the core differ from what the origin host would recognize. Extra bandwidth is consumed by the ingress tunnel router figuring out whether the egress tunnel router is still available and functioning. Border filtering of source addresses becomes problematic.

Deployment may require heavyweight "for the public good" relays in the non-upgraded part of the Internet to facilitate migration.

During the transition period, it appears difficult to remove legacy prefixes from the global routing table. The best that can be done is to advertise aggregates of legacy prefixes from the relays. This may have an impact on stretch.




3.3.2.  Strategy B

Assign hierarchically aggregatable locators to every host. Assign multiple locators to each host such that in the network topology hosts appear as stubs in multiple locations instead of forming distant connections in the graph. Assign one aggregated set of locators to each core ISP where a core ISP is one which has at least half a dozen major transit or peering links. Flood the aggregated locator's existence and reachability to the rest of the core.

Having reduced the network topology to something relatively close to a hierarchy, perform plain old hierarchical aggregation on the locators. Add and remove locators to each host dynamically during operation as needed to reflect changes in the nearby network hierarchies.

Attach source and destination locators when the packet leaves the host. Route by first source then destination locator: move up the source network hierarchy until you can move laterally toward the destination locator in a permissioned manner.

Identifier to locator maps are pushed from the host towards a distributed registry as they change. Hosts request and temporarily cache individual mappings from the registry as needed.




3.3.2.1.  Locator variants

B1a
A hierarchically aggregated numeric locator is dynamically assigned to each host from each upstream path. Each router receives a less specific prefix from upstream and assigns a more specific prefix downstream. Link state changes in the path to the core are satisfied by renumbering instead of rerouting: the host abandons the locator hierarchically associated with the old path. If a new path is available, the host acquires a locator hierarchically associated with the new path.
B1b
A locator is an administratively-assigned loose source route instead of a single address. The first address in the loose source route is a universally-known waypoint router. The last address is the final destination. Link state changes in the path to the core are satisfied by rerouting in the appropriate routing domain when possible. If rerouting in the affected domain is not possible, the host abandons the impacted locator.
B1c
Semi-hierarchical numeric locators are administratively assigned. Local reconnection during link state changes is accomplished with rerouting instead of renumbering.
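
The hierarchical assignment in B1a, where each router receives a less specific prefix and hands out a more specific one, can be sketched with Python's standard `ipaddress` module. The helper names, the use of IPv6 documentation prefixes, and the /64 host prefix length are assumptions made for the example.

```python
import ipaddress


def delegate(upstream_prefix, child_index, new_prefixlen):
    """Carve the child_index-th more specific prefix out of an upstream prefix."""
    parent = ipaddress.ip_network(upstream_prefix)
    extra_bits = new_prefixlen - parent.prefixlen
    if extra_bits <= 0 or not 0 <= child_index < 2 ** extra_bits:
        raise ValueError("child prefix does not fit inside the upstream prefix")
    offset = child_index << (parent.max_prefixlen - new_prefixlen)
    return ipaddress.ip_network(f"{parent.network_address + offset}/{new_prefixlen}")


def host_locators(upstream_prefixes, host_index, host_prefixlen=64):
    """One locator per currently available upstream path."""
    return [delegate(p, host_index, host_prefixlen) for p in upstream_prefixes]
```

Renumbering instead of rerouting then amounts to recomputing `host_locators` from whichever upstream prefixes are still available: a locator tied to a failed path simply disappears from the result.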




3.3.2.2.  Identifier variants

B2a
Each host has a single numeric identifier to which the locators are attached. This identifier is used by the layer-4/5 and higher protocols to compose the SID.
B2b
Each service provided by a host has a globally unique, hierarchical character-string identifier to which the locators are attached. Clients initiating communication with that service negotiate a numeric SID which is unique only within the scope of that service.




3.3.2.3.  Major criticisms

  1. This strategy is probably not compatible with UDP or TCP though B1a/c could be compatible with IPv6's layer 3. The replacement layer-4/5 protocols should also be coaxable to run on top of IPv4's layer 3 in the not-yet-upgraded part of the network.
  2. How do firewalls work if the locators are constantly in flux in B1a?
  3. How is theft of service avoided in B1b?




3.3.3.  Strategy C

Suppress distant routes by aggregating them into sets expected to be available in a given direction. Because locator reachability info is not flooded, the routing tables each router must deal with are relatively small.




3.3.3.1.  Variants

C1
Aggregate locators based on geography. All nodes within some geographic boundary are assigned the same locator. Routers move packets to any adjacent router deemed to be "closer" to the locator in question.




3.3.3.2.  Major criticisms

No one has been able to construct a proposal under strategy C without introducing constraints that are fundamentally incompatible with the Internet's economic model. For example, geographic aggregation has been shown to have uncorrectable theft-of-service anomalies in networks as small as 8 autonomous systems and two geographic areas.

Fundamentally, geographic aggregation requires that there be a per-region interconnect that functions as the deaggregation point for the region's traffic. Funding such an interconnect and compelling the affected ISPs to participate in the interconnect requires external third party coercive controls.




3.3.4.  Strategy D

Use plain old BGP for the RIB. Algorithmically compress the FIB in each router.




3.3.4.1.  Variants

D1a
Aggregate any adjacent routes that have the same next hop.
D1b
Insert a /0 route into the FIB which goes to the most popular next hop for all the routes in the RIB. Step to the /1 level. For each /1, if most of the routes in the RIB within that /1 go to a different next hop than the longest route above (the /0 route), add that /1 route to the FIB. Step to the /2 level. Repeat until all routes in the RIB go to the correct next hop in the FIB. Unrouted space is treated as "don't care": it will route wherever the algorithm happens to drop it and will rely on the TTL to take packets off the network.
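
Both variants can be sketched in Python with the standard `ipaddress` module. This is an illustrative reading of the descriptions above, not a reference implementation. The function names are hypothetical, the tables are IPv4 only, "most of the routes" is interpreted as a strict count comparison against the inherited next hop, and the D1b sketch assumes the RIB prefixes do not overlap one another.

```python
import ipaddress
from collections import Counter


def merge_siblings(fib):
    """D1a-style: merge the two halves of a prefix when they share a next hop."""
    table = {ipaddress.ip_network(p): nh for p, nh in fib.items()}
    changed = True
    while changed:              # repeat so that merged prefixes can merge again
        changed = False
        for net in sorted(table, key=lambda n: n.prefixlen, reverse=True):
            if net not in table or net.prefixlen == 0:
                continue
            parent = net.supernet()
            left, right = parent.subnets()
            sibling = right if net == left else left
            if sibling in table and table[sibling] == table[net]:
                next_hop = table.pop(net)
                del table[sibling]
                table[parent] = next_hop    # the two halves cover the parent
                changed = True
    return {str(n): nh for n, nh in table.items()}


def compress_top_down(rib, root="0.0.0.0/0"):
    """D1b-style: start at /0 and add more specifics only where they are needed.

    Assumes the RIB prefixes are non-overlapping.
    """
    routes = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    fib = {}

    def visit(prefix, inherited):
        counts = Counter(nh for net, nh in routes.items() if net.subnet_of(prefix))
        if not counts:
            return              # unrouted space: "don't care"
        best, best_count = counts.most_common(1)[0]
        if best != inherited and best_count > counts[inherited]:
            fib[str(prefix)] = best
            inherited = best
        if prefix.prefixlen < prefix.max_prefixlen:
            for half in prefix.subnets():
                visit(half, inherited)

    visit(ipaddress.ip_network(root), None)
    return fib
```

For a RIB of four /24 routes in 10.0.0.0/22 where only 10.0.2.0/24 has a different next hop, `merge_siblings` leaves three entries, while `compress_top_down` leaves two: a /0 default plus the single exception. The difference comes from D1b's willingness to cover unrouted space.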




3.3.4.2.  Major criticisms

  1. The RIB can grow up to an order of magnitude larger than the FIB before it, too, hits the wall. One order of magnitude doesn't gain us multihoming for small office/home office sites.
  2. FIBs towards the edge should aggregate well with this strategy but there's no evidence to support a conclusion that they'd aggregate well deep in the core.




3.3.5.  Strategy E

Make no routing architecture changes. Instead, create a billing system through which the ISPs running core routers are paid by the ISPs announcing prefixes. Let economics suppress growth to a survivable level.




3.3.5.1.  Variants

E1a
Everybody pays the RIRs. The RIRs pay the router operators.
E1b
Private negotiation between parties.
E1c
Assisted private negotiation where router operators can offer standardized contracts to carry prefixes and prefix announcers can accept groups of identical contracts via an automated third-party payment system moving funds between the two easily.




3.3.5.2.  Major criticisms

  1. If it could be done without creating a massive boondoggle, why hasn't it been done already? This has been discussed previously and there are no obvious mechanisms to put such a system in place without having a central authority for the Internet.
  2. This means giving up on a solution that genuinely enables users and accepting one that merely keeps the Internet viable.




3.3.6.  Strategy F

Do nothing. (See [RFC1887] (Rekhter, Y. and T. Li, “An Architecture for IPv6 Unicast Address Allocation,” December 1995.) Section 4.4.1)




3.3.6.1.  Major criticisms

It costs "everybody else" a grand total of at least $6000 per year for each prefix you announce. [BGPCost] (Herrin, W., “What does a BGP Route cost?,” .) When we give away that $6000 of value for free, it inevitably creates a "tragedy of the commons" problem.

Given that the research group is chartered to 'do something', this alternative does not fit within the charter.




3.3.7.  Strategy G

Change the topology so that all hosts attach to only one ISP using IPv6 and the ISP's single set of provider assigned addresses. (Actual result of [RFC1887] (Rekhter, Y. and T. Li, “An Architecture for IPv6 Unicast Address Allocation,” December 1995.) Section 4.4.3)




3.3.7.1.  Major criticisms

This strategy wasn't accepted by the operations community because the IPv6 architecture makes renumbering every bit as hard as in IPv4 and the multihoming described in [RFC1887] (Rekhter, Y. and T. Li, “An Architecture for IPv6 Unicast Address Allocation,” December 1995.) Section 4.4.3 does not appear to actually work.




4.  Recommendations




4.1.  No manual renumbering of end hosts

There is clear consensus in the group that renumbering of sites must not require manual intervention on a per-host basis, because the management cost of doing so does not scale. This effectively eliminates solutions that require hosts to have only a single locator and to renumber on topological changes, as well as solutions that require hosts to maintain multiple locators manually.

This implies that transport solutions (Section 3.1.1) are unacceptable unless coupled with another mechanism that would automate the distribution and management of host renumbering, which appears to be a major undertaking all on its own. Further, variants of Strategy B (Section 3.3.2) that require manual locator assignment are similarly unacceptable, as are solutions that do not significantly change existing host behavior, such as Strategy D (Section 3.3.4), Strategy E (Section 3.3.5), Strategy F (Section 3.3.6), and Strategy G (Section 3.3.7).




4.2.  Future progress

The RRG should continue to prune the solution space presented here, attempting to find the overall maximally acceptable solution within the bounds and constraints that have been presented. Whenever possible the research group will continue to discuss architectural concepts and make architectural recommendations rather than becoming embroiled in detailed engineering implementation discussions.

The RRG should present a final recommendation by March, 2010.




5.  Acknowledgements

This document represents a small portion of the overall work product of the Routing Research Group, who have developed all of these architectural approaches and many specific proposals within this solution space.

In particular, Bill Herrin has been instrumental in constructing his taxonomy (Section 3.3), with the input of the entire community. This has been pivotal in helping to focus the discussions of the group. We would also like to thank Joel Halpern for his insights and comments.




6.  IANA Considerations

This memo includes no requests to IANA.




7.  Security Considerations

All solutions are required to provide security that is at least as strong as the existing Internet routing and addressing architecture.




8.  References




8.1.  Normative References

[I-D.irtf-rrg-design-goals] Li, T., “Design Goals for Scalable Internet Routing,” draft-irtf-rrg-design-goals-01 (work in progress), July 2007.
[I-D.narten-radir-problem-statement] Narten, T., “On the Scalability of Internet Routing,” draft-narten-radir-problem-statement-05 (work in progress), February 2010.
[RFC1887] Rekhter, Y. and T. Li, “An Architecture for IPv6 Unicast Address Allocation,” RFC 1887, December 1995.



8.2.  Informative References

[BGPCost] Herrin, W., “What does a BGP Route cost?”



Author's Address

  Tony Li (editor)
  Redback Networks, Inc.
  300 Holger Way
  San Jose, CA 95134
  USA
Phone:  +1 408 750 5160
Email:  tony.li@tony.li