Network Working Group                                          A. Doria
Internet-Draft                                                      LTU
Expires: April 27, 2006                                       E. Davies
                                                             Consultant
                                                       October 24, 2005

Analysis of IDR requirements and History
draft-irtf-routing-history-02.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 27, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document analyses the current state of IDR routing with respect to RFC1126 and other IDR requirements and design efforts. It is the companion document to "Requirements for Inter-Domain Routing" [I-D.irtf-routing-reqs], which is a discussion of requirements for the future routing architecture and future routing protocols.

Doria & Davies            Expires April 27, 2006                [Page 1]
Internet-Draft                 IDR History                  October 2005

Table of Contents

1. Provenance of this Document
2. Introduction
   2.1. Background
3. Historical Perspective
   3.1. The Legacy of RFC1126
      3.1.1. "General Requirements"
      3.1.2. "Functional Requirements"
      3.1.3. "Non-Goals"
   3.2. ISO OSI IDRP, BGP and the Development of Policy Routing
   3.3. Nimrod Requirements
   3.4. PNNI
   3.5. Recent Research Work
      3.5.1. Developments in Internet Connectivity
      3.5.2. Defending the End To End Principle
4. Existing problems of BGP and the current EGP/IGP Architecture
   4.1. BGP and Auto-aggregation
   4.2. Convergence and Recovery Issues
   4.3. Non-locality of Effects of Instability and Misconfiguration
   4.4. Multihoming Issues
   4.5. AS-number exhaustion
   4.6. Partitioned AS's
   4.7. Load Sharing
   4.8. Hold down issues
   4.9. Interaction between Inter domain routing and intra domain routing
   4.10. Policy Issues
   4.11. Security Issues
   4.12. Support of MPLS and VPNS
   4.13. IPv4 / IPv6 Ships in the Night
   4.14. Existing Tools to Support Effective Deployment of Inter-Domain Routing
      4.14.1. Routing Policy Specification Language RPSL (RFC 2622, 2650) and RIPE NCC Database (RIPE 157)
5. Security Considerations
6. Acknowledgments
7. References
Authors' Addresses
Intellectual Property and Copyright Statements

1. Provenance of this Document

In 2001, the IRTF Routing Research Group (IRTF RRG) chairs, Abha Ahuja and Sean Doran, decided to establish a sub-group to look at requirements for inter-domain routing (IDR). A group of well-known routing experts was assembled to develop requirements for a new routing architecture. Their mandate was to approach the problem starting from a blank sheet. This group was free to take any approach, including a revolutionary approach, in developing requirements for solving the problems they saw in inter-domain routing.

Simultaneously, an independent effort was started in Sweden with a similar goal. A team, calling itself Babylon, representing vendors, service providers, and academia, assembled to understand the history of inter-domain routing, to research the problems seen by the service providers, and to develop a proposal of requirements for a follow-on to the current routing architecture. This group took current routing architecture and practice as its starting point; in other words, it limited itself to developing an evolutionary strategy. The Babylon group was later folded into the IRTF RRG as Sub-Group B.

This document, which was a part of Sub-group B's output, provides a snapshot of the current state of Inter-Domain Routing (IDR) at the time of original writing (2001) with some minor updates to take into account developments since that date, bringing it up to date in 2005. The development of the new requirements set is then motivated by an analysis of the problems that IDR has been encountering in the recent past.
This document is intended as a counterpart to the Routing Requirements document, which presents the requirements for future domain routing systems as captured separately by the IRTF RRG Sub-groups A and B [I-D.irtf-routing-reqs].

2. Introduction

It is generally accepted that there are major shortcomings in the inter-domain routing of the Internet today and that these may result in severe routing problems within an unspecified period of time. Remedying these shortcomings will require extensive research to tie down the exact failure modes that lead to these shortcomings and identify the best techniques to remedy the situation. Changes in the nature and quality of the services that users want from the Internet are difficult to provide within the current framework, as they impose requirements never foreseen by the original architects of the Internet routing system.

The kind of radical changes that have to be accommodated are epitomized by the advent of IPv6 and the application of IP mechanisms to private commercial networks that offer specific service guarantees beyond the best-effort services of the public Internet. Major changes to the inter-domain routing system are inevitable to provide an efficient underpinning for the radically changed and increasingly commercially-based networks that rely on the IP protocol suite.

Current practice stresses the need to separate the concerns of the control plane in a router and the forwarding plane. This document will follow this practice, but we still use the term 'routing' as a global portmanteau to cover all aspects of the system.

This document provides a historical perspective on the current state of domain routing in Section 3 by revisiting the previous IETF requirements document intended to steer the development of a future routing system.
These requirements, which led to the design of the Border Gateway Protocol (BGP) in 1989, are contained in RFC1126 - "Goals and Functional Requirements for Inter-Autonomous System Routing" [RFC1126].

Section 3 also looks at some other work on requirements for domain routing that was carried out before and after RFC1126 was published. This work fleshes out the historical perspective and provides some additional insights into alternative approaches which may be instructive when building a new set of requirements.

The motivation for change and the inspiration for some of the requirements for new routing architectures derive from the problems attributable to the current domain routing system that are being experienced in the Internet today. These will be discussed in Section 4.

2.1. Background

Today's Internet uses an addressing and routing structure that has developed in an ad hoc, more or less upwards-compatible fashion. It has progressed from handling a non-commercial Internet with a single administrative domain to a solution that is just about controlling today's multi-domain, federated Internet, carrying traffic between the networks of commercial, governmental and not-for-profit participants. As well as directing traffic to its intended end-point, inter-domain routing mechanisms are expected to implement a host of domain specific routing policies for competing, communicating domains. The result is not ideal, particularly as regards inter-domain routing mechanisms, but it does a pretty fair job at its primary goal of providing any-to-any connectivity to many millions of computers.
Based on a large body of anecdotal evidence, but also on a growing body of experimental evidence [Labovitz02] and analytic work on the stability of BGP under certain policy specifications [Griffin99], the main Internet inter-domain routing protocol, BGP version 4 (BGP-4), appears to have a number of problems that need to be resolved. Additionally, the hierarchical nature of the inter-domain routing problem appears to be changing as the connectivity between domains becomes increasingly meshed [RFC3221], which alters some of the scaling and structuring assumptions on which BGP-4 is built. Patches and fix-ups may relieve some of these problems but others may require a new architecture and new protocols.

3. Historical Perspective

3.1. The Legacy of RFC1126

RFC1126 [RFC1126] outlined a set of requirements that were to guide the development of BGP. While the network is demonstrably different from what it was in 1989, both as to structure and size, many of the same requirements remain. As a first step in setting requirements for the future, we need to understand the requirements that were originally set for the current protocols. And in charting a future architecture we must first be sure to do no harm. This means a future domain routing system has to support, as its base requirement, the level of function that is available today.

The following sections each relate to a requirement, or non-requirement, listed in RFC1126. In fact the section names are direct quotes from the document. The discussion of these requirements covers the following areas:

Explanation: Optional interpretation for today's audience of the original intent of the requirement.

Relevance: Is the requirement of RFC1126 still relevant, and to what degree? Should it be understood differently in today's environment?

Current practice: How well is the requirement met by current protocols and practice?

3.1.1. "General Requirements"

3.1.1.1. "Route to Destination"

Timely routing to all reachable destinations, including multihoming and multicast.

Relevance: Valid, but requirements for multihoming need further discussion and elucidation. The requirement should include multiple source multicast routing.

Current practice: Multihoming is not efficient and the proposed inter-domain multicast protocol BGMP [RFC3913] is an add-on to BGP following many of the same strategies but not integrated into the BGP framework.

3.1.1.2. "Routing is Assured"

This requires that, after attempting to use a service, a user be notified within a reasonable time of the inability to provide it.

Relevance: Valid

Current practice: There are ICMP messages for this, but in many cases they are not used, either because of fears about creating message storms or uncertainty about whether the end system can do anything useful with the resulting information. IPv6 implementations may be able to make better use of the information as they may have alternative addresses that might provide

3.1.1.3. "Large System"

The architecture was designed to accommodate the growth of the Internet.

Relevance: Valid. Properties of Internet topology might be an issue for future scalability (topology varies from very sparse to quite dense at present). Instead of setting growth in a time-scale, indefinite growth should be accommodated. On the other hand, such growth has to be accommodated without making the protocols too expensive - trade-offs may be necessary.

Current practice: Scalability of the current protocols will not be sufficient under the current rate of growth. There are problems with BGP convergence for large dense topologies, problems with routing information propagation between routers in transit domains, limited support for hierarchy, etc.

3.1.1.4. "Autonomous Operation"

This requirement encapsulates the need for administrative domains ("Autonomous Systems" - AS) to be able to operate autonomously as regards setting routing policy.

Relevance: Valid. There may need to be additional requirements for adjusting policy decisions to the global functionality and for avoiding contradictory policies. This would decrease the possibility of unstable routing behavior. There is a need for handling various degrees of trust in autonomous operations, ranging from no trust (e.g., between separate ISPs) to very high trust where the domains have a common goal of optimizing their mutual policies. Policies for intra domain operations should in some cases be revealed, using suitable abstractions.

Current practice: Policy management is in the control of network managers, as required, but there is little support for handling policies at an abstract level for a domain. Cooperating administrative entities decide about the extent of cooperation independently. Lack of coordination combined with global range of effects results in occasional melt-down of Internet routing.

3.1.1.5. "Distributed System"

The routing environment is a distributed system. The distributed routing environment supports redundancy and diversity of nodes and links. Both data and operations are distributed.

Relevance: Valid. RFC1126 is very clear that we should not be using centralized solutions, but maybe we need a discussion on trade-offs between common knowledge and distribution (i.e., to allow for uniform policy routing, e.g., GSM systems are in a sense centralized, but with hierarchies).

Current practice: Routing is very distributed, but lacks the ability to consider optimization over several hops or domains.

3.1.1.6. "Provide A Credible Environment"

Routing mechanism information must be integral and secure (credible data, reliable operation).
Security from unwanted modification and influence is required.

Relevance: Valid.

Current practice: BGP provides a mechanism for authentication and security. There are however security problems with current practice. The Routing Protocol Security Requirements (rpsec) working group has been struggling to agree on a set of requirements for BGP security since early 2002.

3.1.1.7. "Be A Managed Entity"

Requires that a manager be given enough information on the state of the network to make informed decisions.

Relevance: The requirement is reasonable, but we might need to be more specific on what information should be available, e.g. to prevent routing oscillations.

Current practice: All policies are determined locally, where they may appear reasonable, but there is limited global coordination through the routing policy databases operated by the Internet registries (ARIN, RIPE, APNIC etc). Operators are not required to register their policies; even when policies are registered, it is difficult to check that the actual policies in use match the declared policies, and therefore a manager cannot guarantee to make a globally consistent decision.

3.1.1.8. "Minimize Required Resources"

Relevance: Valid, however, the paragraph states that assumptions on significant upgrades shouldn't be made. Although this is reasonable, a new architecture should perhaps be prepared to use upgrades when they occur.

Current practice: Most bandwidth is consumed by the exchange of the Network Layer Reachability Information (NLRI). Usage of processing cycles ("Central Processor Usage" - CPU) depends on the stability of the Internet. Both phenomena have a local nature, so there are no scaling problems with bandwidth and CPU usage. Instability of routing increases the consumption of resources in any case. The number of networks in the Internet dominates memory requirements - this is a scaling problem.
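
As a rough illustration of why memory, rather than bandwidth or CPU, is singled out above as the scaling problem, the following sketch estimates routing table memory as the product of the number of networks, the number of peers from which paths are retained, and a per-path storage cost. The function name, the per-path cost of 200 bytes and the peer counts are illustrative assumptions; they are not taken from RFC1126 or from any BGP implementation. The figure of 100K networks is the target that RFC1126 set.

```python
def rib_memory_bytes(networks: int, peers: int, bytes_per_path: int = 200) -> int:
    """Estimate table memory, assuming one stored path per network per peer.

    bytes_per_path is a hypothetical figure chosen only for illustration.
    """
    if networks < 0 or peers < 0 or bytes_per_path < 0:
        raise ValueError("all inputs must be non-negative")
    return networks * peers * bytes_per_path


if __name__ == "__main__":
    # 100K networks is the RFC1126 target; the peer counts are assumed.
    for peers in (1, 4, 16):
        mib = rib_memory_bytes(100_000, peers) / 2**20
        print(f"{peers:2d} peers: {mib:8.1f} MiB")
```

Under these assumptions the estimate grows linearly with the number of networks, so doubling the number of networks doubles the memory needed whatever constants are chosen.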
3.1.2. "Functional Requirements"

3.1.2.1. "Route Synthesis Requirements"

3.1.2.1.1. "Route around failures dynamically"

Relevance: Valid. Should perhaps be stronger. Only providing a best-effort attempt may not be enough if real-time services are to be provided for. Detection may need to be faster than 100ms to avoid being noticed by end-users.

Current practice: Latency of fail-over is too high; sometimes minutes or longer.

3.1.2.1.2. "Provide loop free paths"

Relevance: Valid. Loops should occur only with negligible probability and duration.

Current practice: Both link-state intra domain routing and BGP inter-domain routing (if correctly configured) are forwarding-loop free after having converged. However, convergence time for BGP can be very long and poorly designed routing policies may result in a number of BGP speakers engaging in a cyclic pattern of advertisements and withdrawals which never converges to a stable result [I-D.mcpherson-bgp-route-oscillation]. Perhaps this is one context in which the need for global convergence needs to be reviewed.

3.1.2.1.3. "Know when a path or destination is unavailable"

Relevance: Valid to some extent, but there is a trade-off between aggregation and immediate knowledge of reachability. It requires that routing tables contain enough information to determine that the destination is unknown or a path cannot be constructed to reach it.

Current practice: Knowledge about lost reachability propagates slowly through the networks due to slow convergence for route withdrawals.

3.1.2.1.4. "Provide paths sensitive to administrative policies"

Relevance: Valid. Policy control of routing is of increasing importance as the Internet has turned into a business.

Current practice: Supported to some extent. Policies can only be applied locally in an AS and not globally. At least there is a very small probability of affecting policies in other AS's.
Furthermore, only static policies are supported; between static policies and policies dependent upon volatile events of great celerity there should exist events that routing should be aware of. Lastly, there is no support for policies other than route properties (such as AS-origin, AS-path, destination prefix, MED-values etc).

3.1.2.1.5. "Provide paths sensitive to user policies"

Relevance: Valid to some extent, as they may conflict with the policies of the network administrator. It is likely that this requirement will be met by means of different bit transport services offered by an operator, but at the cost of adequate provisioning, authentication and policing when utilizing the service.

Current practice: Not supported in normal routing. Can be accomplished to some extent with loose source routing, resulting in inefficient forwarding in the routers. The various attempts to introduce Quality of Service (QoS, e.g., Integrated Services and Differentiated Services (DiffServ)) can also be seen as means to support this requirement but they have met with limited success in terms of providing alternate routes as opposed to providing improved service on the standard route.

3.1.2.1.6. "Provide paths which characterize user quality-of-service requirements"

Relevance: Valid to some extent, as they may conflict with the policies of the operator. It is likely that this requirement will be met by means of different bit transport services offered by an operator, but at the cost of adequate provisioning, authentication and policing when utilizing the service. It has become clear that offering to provide a particular QoS to any arbitrary destination from a particular source is generally impossible: QoS, except in very 'soft' forms such as overall long term average packet delay, is generally associated with connection oriented routing.
Current practice: Creating routes with specified QoS is not generally possible at present.

3.1.2.1.7. "Provide autonomy between inter- and intra-autonomous system route synthesis"

Relevance: Inter- and intra-domain routing should stay independent, but one should notice that this to some extent contradicts the previous three requirements. There is a trade-off between abstraction and optimality.

Current practice: Inter-domain routing is performed independently of intra-domain routing. Intra-domain routing is, especially in transit domains, closely interrelated with inter-domain routing.

3.1.2.2. "Forwarding Requirements"

3.1.2.2.1. "Decouple inter- and intra-autonomous system forwarding decisions"

Relevance: Valid.

Current practice: As explained in Section 3.1.2.1.7, intra-domain forwarding in transit domains is codependent with inter-domain forwarding decisions.

3.1.2.2.2. "Do not forward datagrams deemed administratively inappropriate"

Relevance: Valid, and increasingly important in the context of enforcing policies correctly expressed through routing advertisements but flouted by rogue peers which send traffic for which a route has not been advertised. On the other hand, packets that have been misrouted due to transient routing problems perhaps should be forwarded to reach the destination, although along an unexpected path.

Current practice: At stub domains there is packet filtering, e.g., to catch source address spoofing on outgoing traffic or to filter out unwanted incoming traffic. Filtering can in particular reject traffic (such as unauthorized transit traffic) that has been sent to a domain even when it has not advertised a route for such traffic on a given interface. The growing class of 'middle boxes' (midboxes, e.g., Network Address Translators - NATs) is quite likely to apply administrative rules that will prevent forwarding of packets.
Note that security policies may deliberately hide administrative denials. In the backbone, intentional packet dropping based on policies is not common.

3.1.2.2.3. "Do not forward datagrams to failed resources"

Relevance: Unclear, although it is clearly desirable to minimise waste of forwarding resources by discarding datagrams which cannot be delivered at the earliest opportunity. There is a trade-off between scalability and keeping track of unreachable resources. Equipment closest to a failed node has the highest motivation to keep track of failures so that waste can be minimised.

Current practice: Routing protocols use both internal adjacency management sub-protocols (e.g. Hello protocols) and information from equipment and lower layer link watchdogs to keep track of failures in routers and connecting links. Failures will eventually result in the routing protocol reconfiguring the routing to avoid (if possible) a failed resource, but this is generally very slow (30s or more). In the meantime datagrams may well be forwarded to failed resources. In general terms, end hosts and some non-router midboxes do not participate in these notifications and failures of such boxes will not affect the routing system.

3.1.2.2.4. "Forward datagram according to its characteristics"

Relevance: Valid. This is necessary in enabling differentiation in the network, based on QoS, precedence, policy or security.

Current practice: Ingress and egress filtering can be done based on policy. Some networks discriminate on the basis of requested QoS.

3.1.2.3. "Information Requirements"

3.1.2.3.1. "Provide a distributed and descriptive information base"

Relevance: Valid, however hierarchical information bases might provide more possibilities.

Current practice: The information base is distributed, but it is unclear whether it supports all necessary routing functionality.

3.1.2.3.2. "Determine resource availability"

Relevance: Valid. It should be possible for resource availability and levels of resource availability to be determined. This avoids having to discover unavailability through failure. Resource location and discovery is arguably a separate concern that could be addressed outside the core routing requirements.

Current practice: Resource availability is predominantly handled outside of the routing system.

3.1.2.3.3. "Restrain transmission utilization"

Relevance: Valid. However certain requirements in the control plane, such as fast detection of faults, may be worth consumption of more resources. Similarly, simplicity of implementation may make it cheaper to 'back haul' traffic to central locations to minimise the cost of routing if bandwidth is cheaper than processing.

Current practice: BGP messages probably do not ordinarily consume excessive resources, but might during erroneous conditions. In the data plane, the near universal adoption of shortest path protocols could be considered to result in minimization of transmission utilization.

3.1.2.3.4. "Allow limited information exchange"

Relevance: Valid. But perhaps routing could be improved if certain information could be available either globally or at least for a wider defined locality.

Current practice: Policies are used to determine which reachability information is exported.

3.1.2.4. "Environmental Requirements"

3.1.2.4.1. "Support a packet-switching environment"

Relevance: Valid, but routing system should, perhaps, not be limited to this exclusively.

Current practice: Supported.

3.1.2.4.2. "Accommodate a connection-less oriented user transport service"

Relevance: Valid, but routing system should, perhaps, not be limited to this exclusively.

Current practice: Accommodated.

3.1.2.4.3. "Accommodate 10K autonomous systems and 100K networks"

Relevance: No longer valid. Needs to be increased potentially indefinitely. It is extremely difficult to foresee the future size expansion of the Internet so that the utopian solution would be to achieve an Internet whose architecture is scale invariant. Regrettably, this may not be achievable without introducing undesirable complexity and a suitable trade off between complexity and scalability is likely to be necessary.

Current practice: Supported but perhaps reaching its limit. Since the original version of this document was written in 2001, the number of ASs advertised has grown from around 8000 to 20000, and almost 35000 AS numbers have been allocated by the regional registries (see http://www.potaroo.net/ispcol/2005-08/as.html). If this growth continues the original 16 bit AS space in BGP-4 will be exhausted in less than 5 years. Planning for an extended AS space is now an urgent requirement.

3.1.2.4.4. "Allow for arbitrary interconnection of autonomous systems"

Relevance: Valid. However perhaps not all interconnections should be accessible globally.

Current practice: BGP-4 allows for arbitrary interconnections.

3.1.2.5. "General Objectives"

3.1.2.5.1. "Provide routing services in a timely manner"

Relevance: Valid, as stated before. The more complex a service is the longer it should be allowed to take, but the implementation of services requiring (say) NP-complete calculation should be avoided.

Current practice: More or less, with the exception of convergence and fault robustness.

3.1.2.5.2. "Minimize constraints on systems with limited resources"

Relevance: Valid

Current practice: Systems with limited resources are typically stub domains that advertise very little information.

3.1.2.5.3. "Minimize impact of dissimilarities between autonomous systems"

Relevance: Important.
This requirement is critical to a future architecture. In a domain routing environment where the internal properties of domains may differ radically, it will be important to be sure that these dissimilarities are minimized at the borders.

Current practice: For the most part this capability is not really required in today's networks since the intra-domain attributes are broadly similar across domains.

3.1.2.5.4. "Accommodate the addressing schemes and protocol mechanisms of the autonomous systems"

Relevance: Important, probably more so than when RFC1126 was originally developed because of the potential deployment of IPv6, wider usage of MPLS and the increasing usage of VPNs.

Current practice: Only one global addressing scheme is supported in most autonomous systems but the availability of IPv6 services is steadily increasing. Some global backbones support IPv6 routing and forwarding.

3.1.2.5.5. "Must be implementable by network vendors"

Relevance: Valid, but note that what can be implemented today is different from what was possible when RFC1126 was written: a future domain routing architecture should not be unreasonably constrained by past limitations.

Current practice: BGP was implemented.

3.1.3. "Non-Goals"

RFC1126 also included a section discussing non-goals. To what extent are these still non-goals? Does the fact that they were non-goals adversely affect today's IDR system?

3.1.3.1. "Ubiquity"

In a sense this 'non-goal' has effectively been achieved by the Internet and IP protocols. This requirement reflects a different worldview where there was serious competition for network protocols, which is really no longer the case. Ubiquitous deployment of inter-domain routing in particular has been achieved and must not be undone by any proposed future domain routing architecture.
On the other hand:

o  ubiquitous connectivity cannot be reached in a policy sensitive environment and should not be an aim,

o  it must not be required that the same routing mechanisms are used throughout provided that they can interoperate appropriately,

o  the information needed to control routing in a part of the network should not necessarily be ubiquitously available and it must be possible for an operator to hide commercially sensitive information that is not needed outside a domain.

Relevance: De facto essential for a future domain routing architecture, but what is required is ubiquity of the routing system rather than ubiquity of connectivity.

Current practice: De facto ubiquity achieved.

3.1.3.2. "Congestion control"

Relevance: It is not clear if this non-goal was to be applied to routing or forwarding. It is definitely a non-goal to adapt the choice of route when there is transient congestion. However, to add support for congestion avoidance (e.g., ECN and ICMP messages) in the forwarding process would be a useful addition. There is also extensive work going on in traffic engineering which should result in congestion avoidance through routing as well as in forwarding.

Current practice: Some ICMP messages (e.g., source quench) exist to deal with congestion control but these are not generally used as they either make the problem worse or there is no mechanism to reflect the message into the application which is providing the source.

3.1.3.3. "Load splitting"

Relevance: This should neither be a non-goal, nor an explicit goal. It might be desirable in some cases and should be considered as an optional architectural feature.

Current practice: Can be implemented by exporting different prefixes on different links, but this requires manual configuration and does not consider actual load.

3.1.3.4. "Maximizing the utilization of resources"

Relevance: Valid.
Cost-efficiency should be the aim; maximizing resource utilization does not always lead to the greatest cost-efficiency.

Current practice: Not currently part of the system, though often a 'hacked in' feature done with manual configuration.

3.1.3.5. "Schedule to deadline service"

This non-goal was put in place to ensure that the IDR did not have to meet real time deadline goals such as might apply to CBR services in ATM.

Relevance: The hard form of deadline services is still a non-goal for the future domain routing architecture but overall delay bounds are much more of the essence than was the case when RFC1126 was written.

Current practice: Service providers are now offering overall probabilistic delay bounds on traffic contracts. To implement these contracts there is a requirement for a rather looser form of delay sensitive routing.

3.1.3.6. "Non-interference policies of resource utilization"

The requirement in RFC1126 is somewhat opaque, but appears to imply that what we would today call QoS routing is a non-goal and that routing would not seek to control the elastic characteristics of Internet traffic whereby a TCP connection can seek to utilize all the spare bandwidth on a route, possibly to the detriment of other connections sharing the route or crossing it.

Relevance: Open Issue. It is not clear whether dynamic QoS routing can or should be implemented. Such a system would seek to control the admission and routing of traffic depending on current or recent resource utilization. This would be particularly problematic where traffic crosses an ownership boundary because of the need for potentially commercially sensitive information to be made available outside the ownership boundary.

Current practice: Routing does not consider dynamic resource availability. Forwarding can support service differentiation.

3.2.
ISO OSI IDRP, BGP and the Development of Policy Routing

During the decade before the widespread success of the World Wide Web, ISO was developing the communications architecture and protocol suite Open Systems Interconnection (OSI). For a considerable part of this time OSI was seen as a possible competitor for and even a replacement for the IP suite as the basis for the Internet. The technical developments of the two protocols were quite heavily interrelated with each providing ideas and even components that were adapted into the other suite.

During the early stages of the development of OSI, the IP suite was still mainly in use on the ARPANET and the relatively small scale first phase NSFNET. This was effectively a single administrative domain with a simple tree structured network in a three level hierarchy connected to a single logical exchange point (the NSFnet backbone). In the second half of the 1980s the NSFNET was starting on the growth and transformation that would lead to today's Internet. It was becoming clear that the backbone routing protocol, the Exterior Gateway Protocol (EGP) [RFC0904], was not going to cope even with the limited expansion being planned. EGP is an "all informed" protocol which needed to know the identities of all gateways and this was no longer reasonable. The first version of the Border Gateway Protocol (BGP-1) [RFC1105] was developed as a replacement, but was specifically designed to work on a tree structured network (links are classified as upwards, downwards or sideways).

Meanwhile the OSI architects, led by Lyman Chapin, were developing a much more general architecture for large scale networks. They had recognized that no one node, especially an end-system (host), could or should attempt to remember routes from "here" to "anywhere" - this sounds obvious today but was not so obvious 20 years ago.
They were also considering hierarchical networks with independently administered domains - a model already well entrenched in the public switched telephone network. This led to a vision of a network with multiple independent administrative domains with an arbitrary interconnection graph and a hierarchy of routing functionality. This architecture was fairly well established by 1987 [Tsuchiya87]. The architecture initially envisaged a three level routing functionality hierarchy in which each layer had significantly different characteristics:

1. _End-system to Intermediate system routing_ (host to router), in which the principal functions are discovery and redirection.

2. _Intra-domain intermediate system to intermediate system routing_ (router to router), in which "best" routes between end-systems in a single administrative domain are computed and used. A single algorithm and routing protocol would be used throughout any one domain.

3. _Inter-domain intermediate-system to intermediate system routing_ (router to router), in which routes between routing domains within administrative domains are computed (routing is considered separately between administrative domains and routing domains).

Level 3 of this hierarchy was still somewhat fuzzy. Tsuchiya says:

   The last two components, Inter-Domain and Inter-Administration routing, are less clear-cut. It is not obvious what should be standardized with respect to these two components of routing. For example, for Inter-Domain routing, what can be expected from the Domains? By asking Domains to provide some kind of external behavior, we limit their autonomy. If we expect nothing of their external behavior, then routing functionality will be minimal. Across administrations, it is not known how much trust there will be. In fact, the definition of trust itself can only be determined by the two or more administrations involved.
   Fundamentally, the problem with Inter-Domain and Inter-Administration routing is that autonomy and mistrust are both antithetical to routing. Accomplishing either will involve a number of tradeoffs which will require more knowledge about the environments within which they will operate.

Further refinement of the model occurred over the next couple of years and a more fully formed view is given by Huitema and Dabbous in 1989 in "Routeing protocols development in the OSI architecture" [Huitema90]. By this stage work on the original IS-IS link state protocol, originated by the Digital Equipment Corporation (DEC), was fairly advanced and was close to becoming a Draft International Standard. IS-IS is of course a major component of intra-domain routing today and inspired the development of the OSPF family. However, Huitema and Dabbous were not able to give any indication of protocol work for Level 3. There are hints of possible use of centralized route servers.

In the meantime, the NSFnet consortium and the IETF had been struggling with the rapid growth of the NSFnet. It had been clear since fairly early on that EGP was not suitable for handling the expanding network and the race was on to find a replacement. EGP is 'not a routing algorithm' - it provides information about the reachability of 'Autonomous Systems' via a common core network. There had been some intent to include a metric in EGP to facilitate routing decisions, but no agreement could be reached on how to define the metric. The lack of trust was seen as one of the main reasons that EGP could not establish a globally acceptable routing metric: again this seems to be a clearly futile aim from this distance in time! Consequently EGP became effectively a rudimentary path-vector protocol which linked gateways with Autonomous Systems.
It was totally reliant on the tree structured network to avoid routing loops and the all informed nature of EGP meant that update packets became very large. BGP version 1 [RFC1105] was the first real path-vector routing protocol (standardized in 1989) which was intended to relieve some of the scaling problems but it still assumed a tree structured network. Routes were described as paths along a 'vector' of ASs without any associated cost metric.

Also the NSFnet was a government funded research and education network. Commercial companies which were partners in some of the projects were using the NSFnet for their research activities but it was becoming clear that these companies also needed networks for commercial traffic. NSFnet had put in place "acceptable use" policies which were intended to limit the use of the network. However there was little or no technology to support the legal framework.

Practical experience, IETF IAB discussion and the OSI theoretical work were by now coming to the same conclusions:

o Networks were going to be composed out of multiple administrative domains (the federated network),

o The connections between these domains would be an arbitrary graph and certainly not a tree,

o The administrative domains would wish to establish distinctive, independent routing policies through the graph of Autonomous Systems, and

o Administrative Domains would have a degree of distrust of each other which would mean that policies would remain opaque.

These views were reflected by Susan Hares' (Merit) contribution to the Internet Architecture (INARC) workshop in 1989, summarized in the report of the workshop [INARC89]:

   The rich interconnectivity within the Internet causes routing problems today. However, the presenter believes the problem is not the high degree of interconnection, but the routing protocols and models upon which these protocols are based.
   Rich interconnectivity can provide redundancy which can help packets moving even through periods of outages. Our model of interdomain routing needs to change. The model of autonomous confederations and autonomous systems [RFC0975] no longer fits the reality of many regional networks. The ISO models of administrative domain and routing domains better fit the current internet's routing structure.

   With the first NSFNET backbone, NSF assumed that the Internet would be used as a production network for research traffic. We cannot stop these networks for a month and install all new routing protocols. The Internet will need to evolve its changes to networking protocols while still continuing to serve its users. This reality colors how plans are made to change routing protocols.

It is also interesting to note that the difficulties of organising a transition were recognized at this stage.

Policies would primarily be interested in controlling which traffic should be allowed to transit a domain (to satisfy commercial constraints or acceptable use policies) thereby controlling which traffic uses the resources of the domain. The solution adopted by both the IETF and OSI was a form of distance vector hop-by-hop routing with explicit policy terms. The reasoning for this choice can be found in Breslau and Estrin's 1990 paper [Breslau90] (implicitly - because some other alternatives are given such as a link state with policy suggestion which, with hindsight, would have even greater problems than BGP on a global scale network).

Traditional distance vector protocols exchanged routing information in the form of a destination and a metric. The new protocols explicitly associated policy expressions with the route by including either a list of the source ASs that are permitted to use the route described in the routing update, and/or a list of all ASs traversed along the advertised route.
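The difference from a destination-plus-metric update can be illustrated with a short Python sketch. It is only an illustration of the idea described above, not the BGP or IDRP message format: the class RouteUpdate, its field names, and the helper functions are hypothetical, and the AS numbers and prefix are documentation values chosen for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class RouteUpdate:
    """Hypothetical update: a destination plus policy-bearing lists, no metric."""
    destination: str
    as_path: Tuple[int, ...]                            # all ASs traversed, nearest first
    permitted_sources: Optional[FrozenSet[int]] = None  # None means "any source"


def has_loop(update: RouteUpdate, local_as: int) -> bool:
    # A receiver that finds its own AS in the path knows the route loops.
    return local_as in update.as_path


def usable_by(update: RouteUpdate, source_as: int) -> bool:
    # Source-specific policy: only listed source ASs may use the route.
    return update.permitted_sources is None or source_as in update.permitted_sources


def readvertise(update: RouteUpdate, local_as: int) -> Optional[RouteUpdate]:
    # Reject looping routes; otherwise prepend the local AS before passing it on.
    if has_loop(update, local_as):
        return None
    return RouteUpdate(update.destination,
                       (local_as,) + update.as_path,
                       update.permitted_sources)


origin = RouteUpdate("192.0.2.0/24", (64500,), frozenset({64501, 64502}))
via_64501 = readvertise(origin, 64501)
```

In this sketch a route that returns to AS 64500 is discarded because 64500 already appears in the path, which is how a path list removes the dependence on a tree structured topology for loop avoidance.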
Parallel protocol developments were already in progress by the time this paper was published: BGP version 2 [RFC1163] in the IETF and the Inter-Domain Routing Protocol (IDRP) [ISO10747] which would be the Level 3 routing protocol for the OSI architecture. IDRP was developed under the aegis of the ANSI XS3.3 working group led by Lyman Chapin and Charles Kunzinger. The two protocols were very similar in basic design but IDRP has some extra features, some of which have been incorporated into later versions of BGP; others may yet be so and still others may be seen to be inappropriate. Breslau and Estrin summarize the design of IDRP as follows:

   IDRP attempts to solve the looping and convergence problems inherent in distance vector routing by including full AD [Administrative Domain - essentially the equivalent of what are now called ASs] path information in routing updates. Each routing update includes the set of ADs that must be traversed in order to reach the specified destination. In this way, routes that contain AD loops can be avoided. IDRP updates also contain additional information relevant to policy constraints. For instance, these updates can specify what other ADs are allowed to receive the information described in the update. In this way, IDRP is able to express source specific policies. The IDRP protocol also provides the structure for the addition of other types of policy related information in routing updates. For example, User Class Identifiers (UCI) could also be included as policy attributes in routing updates.

   Using the policy route attributes IDRP provides the framework for expressing more fine grained policy in routing decisions. However, because it uses hop-by-hop distance vector routing, it only allows a single route to each destination per-QOS to be advertised.
   As the policy attributes associated with routes become more fine grained, advertised routes will be applicable to fewer sources. This implies a need for multiple routes to be advertised for each destination in order to increase the probability that sources have acceptable routes available to them. This effectively replicates the routing table per forwarding entity for each QOS, UCI, source combination that might appear in a packet. Consequently, we claim that this approach does not scale well as policies become more fine grained, i.e., source or UCI specific policies.

Over the next three or four years successive versions of BGP (BGP-2 [RFC1163], BGP-3 [RFC1267] and BGP-4 [RFC1771], [I-D.ietf-idr-bgp4]) were deployed to cope with the growing and by now commercialized Internet. BGP version 4 was developed to handle the change from classful to classless addressing. For most of this time IDRP was being developed in parallel, and both protocols were implemented in the Merit gatedaemon routing protocol suite. During this time there was a movement within the IETF which saw BGP as a stopgap measure to be used until the more sophisticated IDRP could be adapted to run over IP instead of the OSI connectionless protocol CLNP. However, unlike its IGP counterpart IS-IS, which has stood the test of time and indeed proved to be more flexible than OSPF, IDRP was ultimately not adopted by the market. By the time the NSFnet backbone was decommissioned in 1995, BGP-4 was the inter-domain routing protocol of choice and OSI's star was already beginning to wane. IDRP is now little remembered.

A more complete account of the capabilities of IDRP can be found in chapter 14 of David Piscitello and Lyman Chapin's book 'Open Systems Networking: TCP/IP and OSI' which is now readable on the Internet [Chapin94].
IDRP also contained quite extensive means for securing routing exchanges, much of it based on X.509 certificates for each router and public/private key encryption of routing updates.

Some of the capabilities of IDRP which might yet appear in a future version of BGP include the ability to manage routes with explicit QoS classes, and the concept of domain confederations (somewhat different from the confederation mechanism in today's BGP) as an extra level in the hierarchy of routing.

3.3. Nimrod Requirements

Nimrod, as expressed by Noel Chiappa in his early document "A New IP Routing and Addressing Architecture" [Chiappa91] and later in the NIMROD Working Group documents [RFC1753] and [RFC1992], established a number of requirements that need to be considered by any new routing architecture. The Nimrod requirements took RFC1126 as a starting point and went further.

The goals of Nimrod, quoted from [RFC1992], were as follows:

1. To support a dynamic internetwork of _arbitrary size_ (our emphasis) by providing mechanisms to control the amount of routing information that must be known throughout an internetwork.

2. To provide service-specific routing in the presence of multiple constraints imposed by service providers and users.

3. To admit incremental deployment throughout an internetwork.

It is certain that these goals should be considered requirements for any new domain routing architecture.

o As discussed in other sections of this document the amount of information needed to maintain the routing system is growing at a rate that does not scale. And yet, as the services and constraints upon those services grow there is a need for more information to be maintained by the routing system. One of the key terms in the first requirements is 'control'.
While increasing amounts of information need to be known and maintained in the Internet, the amounts and kinds of information that are distributed can be controlled. This goal should be reflected in the requirements for the future domain architecture.

o If anything, the demand for specific services in the Internet has grown since 1996 when the Nimrod architecture was published. Additionally the kinds of constraints that service providers need to impose upon their networks and that services need to impose upon the routing have also increased. Any changes made to the network in the last half-decade have not significantly improved this situation.

o The ability to incrementally deploy any new routing architecture within the Internet is still an absolute necessity. It is impossible to imagine that a new routing architecture could supplant the current architecture on a flag day.

At one point in time Nimrod, with its addressing and routing architectures, was seen as a candidate for IPng. History shows that it was not accepted as the IPng, having been ruled out of the selection process by the IESG in 1994 on the grounds that it was 'too much of a research effort' [RFC1752], although input for the requirements of IPng was explicitly solicited from Chiappa [RFC1753]. Instead IPv6 has been put forth as the IPng. Without entering a discussion of the relative merits of IPv6 versus Nimrod, it is apparent that IPv6, while it may solve many problems, does not solve the critical routing problems in the Internet today. In fact in some sense it exacerbates them by adding a requirement for support of two internet protocols and their respective addressing methods. In many ways the addition of IPv6 to the mix of methods in today's Internet only points to the fact that the goals, as set forth by the Nimrod team, remain as necessary goals.

There is another sense in which study of Nimrod and its architecture may be important to deriving a future domain routing architecture.
Nimrod can be said to have two derivatives:

o MPLS in that it took the notion of forwarding along well known paths

o PNNI in that it took the notion of abstracting topological information and using that information to create connections for traffic.

It is important to note that whilst MPLS and PNNI borrowed ideas from Nimrod, neither of them can be said to be an implementation of this architecture.

3.4. PNNI

The Private Network-Node Interface (PNNI) was developed under the ATM Forum's auspices as a hierarchical route determination protocol for ATM, a connection oriented architecture. It is reputed to have developed several of its methods from a study of the Nimrod architecture. What can be gained from an analysis of what did and did not succeed in PNNI?

The PNNI protocol includes the assumption that all peer groups are willing to cooperate, and that the entire network is under the same top administration. Are there limitations that stem from this 'world node' presupposition? As discussed in [RFC3221], the Internet is no longer a clean hierarchy and there is a lot of resistance to having any sort of 'ultimate authority' controlling or even brokering communication.

PNNI is the first deployed example of a routing protocol that uses abstract map exchange (as opposed to distance vector or link state mechanisms) for inter-domain routing information exchange. One consequence of this is that domains need not all use the same mechanism for map creation. What were the results of this abstraction and source based route calculation mechanism?

Since the authors of this document do not have experience running a PNNI network, the comments above are from a theoretical perspective. Information on these issues, and any other relevant issues, is solicited from those who do have such operational experience. Further research is required.

3.5. Recent Research Work

3.5.1.
Developments in Internet Connectivity

The work commissioned from Geoff Huston by the Internet Architecture Board [RFC3221] draws a number of conclusions from analysis of BGP routing tables and routing registry databases:

o The connectivity between provider ASs is becoming more like a dense mesh than the tree structure that was commonly assumed a couple of years ago. This has been driven by the increasing amounts charged for peering and transit traffic by global service providers. Local direct peering and internet exchanges are becoming steadily more common as the cost of local fibre connections drops.

o End user sites are increasingly resorting to multi-homing onto two or more service providers as a way of improving resiliency. This has a knock-on effect of spectacularly fast depletion of the available pool of AS numbers, as end user sites require public AS numbers to become multi-homed, and a corresponding increase in the number of prefixes advertised in BGP.

o Multi-homed sites are using advertisement of longer prefixes in BGP as a means of traffic engineering to load spread across their multiple external connections with further impact on the size of the BGP tables.

o Operational practices are not uniform, and in some cases lack of knowledge or training is leading to instability and/or excessive advertisement of routes by incorrectly configured BGP speakers.

o All these factors are quickly negating the advantages in limiting the expansion of BGP routing tables that were gained by the introduction of CIDR and consequent prefix aggregation in BGP. It is also now impossible for IPv6 to realize the world view in which the default free zone would be limited to perhaps 10,000 prefixes.

o The typical 'width' of the Internet in AS hops is now around five, and much less in many cases.
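The effect of longer-prefix advertisement on table size can be illustrated with a small Python sketch. It rests on a deliberately simplified assumption that is not taken from [RFC3221]: prefixes are merged only when they carry an identical path label, which stands in for whatever attributes a real BGP speaker would compare. The function name table_entries, the prefixes, and the path labels are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def table_entries(routes):
    """Return the prefixes left after merging routes that share a path label.

    routes: iterable of (prefix, path_label) pairs. Merging only within a
    label is an illustrative simplification, not BGP's aggregation rules.
    """
    by_label = defaultdict(list)
    for prefix, label in routes:
        by_label[label].append(ipaddress.ip_network(prefix))
    entries = []
    for label, networks in by_label.items():
        # collapse_addresses merges adjacent or overlapping networks.
        for network in ipaddress.collapse_addresses(networks):
            entries.append((str(network), label))
    return entries


# Four customer /24s all reached through one provider: one aggregate suffices.
single_homed = [(f"10.0.{i}.0/24", "provider-A") for i in range(4)]

# The last customer multi-homes and is advertised with a different path.
multi_homed = single_homed[:3] + [("10.0.3.0/24", "provider-B")]

aggregated = table_entries(single_homed)
deaggregated = table_entries(multi_homed)
```

Under these assumptions one multi-homed customer turns a single /22 entry into three entries (a /23 and a /24 on the first path plus the /24 on the second), which is the kind of growth described in the list above.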
These conclusions have a considerable impact on the requirements for the future domain routing architecture:

o Topological hierarchy (e.g., mandating a tree structured connectivity) cannot be relied upon to deliver scalability of a large Internet routing system

o Aggregation cannot be relied upon to constrain the size of routing tables for an all-informed routing system

3.5.2. Defending the End To End Principle

DARPA funded a project to think about a new architecture for future generation Internet, called NewArch (http://www.isi.edu/newarch/). Work started in the first half of 2000 and the main project finished in 2003.

Editor's Note: The next version of this document will contain additional material on the results of this project and its impact on the requirements for inter-domain routing.

The main development so far is to conclude that as the Internet becomes mainstream infrastructure, fewer and fewer of the requirements are truly global but may apply with different force or not at all in certain parts of the network. This (it is claimed) makes the compilation of a single, ordered list of requirements deeply problematic. Instead we may have to produce multiple requirement sets with support for differing requirement importance at different times and in different places. This 'meta-requirement' significantly impacts architectural design.
Potential new technical requirements identified so far include:

o Commercial environment concerns such as richer inter-provider policy controls and support for a variety of payment models

o Trustworthiness

o Ubiquitous mobility

o Policy driven self-organisation ('deep auto configuration')

o Extreme short-time-scale resource variability

o Capacity allocation mechanisms

o Speed, propagation delay and Delay/Bandwidth Product issues

Non-technical or political 'requirements' include:

o Legal and Policy drivers such as

   * Privacy and free/anonymous speech

   * Intellectual property concerns

   * Encryption export controls

   * Law enforcement surveillance regulations

   * Charging and taxation issues

o Reconciling national variations and consistent operation in a world wide infrastructure

One of the participants in this work (Dave Clark) with one of his associates has also published a very interesting paper analyzing the impact of some of these new requirements on the end-to-end principle that has guided the development of the Internet to date [Blumenthal01]. Their primary conclusion is that the loss of trust between the users at the ends of end to end has the most fundamental effect on the Internet. This is clear in the context of the routing system, where operators are unwilling to reveal the inner workings of their networks for commercial reasons. Similarly, trusted third parties and their avatars (mainly mid-boxes of one sort or another) have a major impact on the end-to-end principles and the routing mechanisms that went with them. Overall, the end to end principles should be defended so far as is possible - some changes are already too deeply embedded to make it possible to go back to full trust and openness - at least partly as a means of staving off the day when the network will ossify into an unchangeable form and function (much as the telephone network has done).
The hope is that by that time a new Internet will appear to offer a context for unfettered innovation.

4. Existing problems of BGP and the current EGP/IGP Architecture

Although most of the people who have to work with BGP today believe it to be a useful, working protocol, discussions have brought to light a number of areas where BGP or the relationship between BGP and the IGPs in use today could be improved. This section is, to a large extent, a wish list for the future domain routing architecture based on those areas where BGP is seen to be lacking, rather than simply a list of problems with BGP. The shortcomings of today's inter-domain routing system have also been extensively surveyed in 'Architectural Requirements for Inter-Domain Routing in the Internet' [RFC3221], particularly with respect to its stability and the problems produced by explosions in the size of the Internet.

4.1. BGP and Auto-aggregation

The stability and later linear growth rates of the number of routing objects (prefixes) that were achieved by the introduction of CIDR around 1994 have once again been replaced by near-exponential growth in the number of routing objects. The granularity of many of the objects advertised in the default free zone is very small (prefix length of 22 or longer). This granularity appears to be a by-product of attempts to perform precision traffic engineering related to increasing levels of multi-homing. At present there is no mechanism in BGP that would allow an AS to aggregate such prefixes without advance knowledge of their existence, even if it was possible to deduce automatically that they could be aggregated. Achieving satisfactory auto-aggregation would also significantly reduce the non-locality problems associated with instability in peripheral ASs.
On the other hand, it may be that alterations to the connectivity of the net as described in [RFC3221] and Section 3.5.1 may limit the usefulness of auto-aggregation.

4.2. Convergence and Recovery Issues

BGP today is a stable protocol under most circumstances but this has been achieved at the expense of making the convergence time of the inter-domain routing system very slow under some conditions. This has a detrimental effect on the recovery of the network from failures.

The timers that control the behavior of BGP are typically set to values in the region of several tens of seconds to a few minutes, which constrains the responsiveness of BGP to failure conditions.

In the early days of deployment of BGP, poor network stability and router software problems led to storms of withdrawals closely followed by re-advertisements of many prefixes. To control the load on routing software imposed by these 'route flaps', route flap damping was introduced into BGP. Most operators have now implemented a degree of route flap damping in their deployments of BGP. This restricts the number of times that the routing tables will be rebuilt even if a route is going up and down very frequently. Unfortunately, route flap damping is exponential in its behavior, which can result in some parts of the Internet being inaccessible for hours at a time.

There is evidence ([RFC3221] and our own measurements [Jiang02]) that in today's network route flap is disproportionately associated with the fine grain prefixes (length 22 or longer) associated with traffic engineering at the periphery of the network. Auto-aggregation as previously discussed would tend to mask such instability and prevent it being propagated across the whole network.

Another question that needs to be studied is the continuing need for an architecture that requires global convergence.
Some of our studies (yet to be published) show that, in some localities at least, the network never actually reaches stability; i.e., never really globally converges. Can a global, and beyond, network be designed with the requirement of global convergence?

4.3. Non-locality of Effects of Instability and Misconfiguration

There have been a number of instances, some of which are well documented, of a mistake in BGP configuration in a single peripheral AS propagating across the whole Internet and resulting in misrouting of most of the traffic in the Internet.

Similarly, route flap in a single peripheral AS can require route table recalculation across the entire Internet.

This non-locality of effects is highly undesirable, and it would be a considerable improvement if such effects were naturally limited to a small area of the network around the problem. This is another argument for an architecture that does not require global convergence.

4.4. Multihoming Issues

As discussed previously, the increasing use of multi-homing as a robustness technique by peripheral ASs requires that multiple routes have to be advertised for such domains. These routes must not be aggregated close in to the multi-homed domain as this would defeat the traffic engineering implied by multi-homing and currently cannot be aggregated further away from the multi-homed domain due to the lack of auto-aggregation capabilities. Consequently the default free zone routing table is growing exponentially, as it was before CIDR.

The longest prefix match routing technique introduced by CIDR, and implemented in BGP-4, when combined with provider address allocation is an obstacle to effective multi-homing if load sharing across the multiple links is required. If an AS has been allocated its addresses from an upstream provider, the upstream provider can aggregate those addresses with those of other customers and need only advertise a single prefix for a range of customers.
But, if the customer AS is also connected to another provider, the second provider is not able to aggregate the customer addresses because they are not taken from its allocation, and will therefore have to announce a more specific route to the customer AS. The longest match rule will then direct all traffic through the second provider, which is not as required.

Example:

      \       /
      AS1   AS2
        \   /
         AS3

AS3 has received its addresses from AS1, which means AS1 can aggregate. But if AS3 wants its traffic to be seen equally both ways, AS1 is forced to announce both the aggregate and the more specific route to AS3.

Figure 1: Address Aggregation

This problem has induced many ASs to apply for their own address allocation even though they could have been allocated addresses from an upstream provider, further exacerbating the default free zone route table size explosion. This problem also interferes with the desire of many providers in the default free zone to route only prefixes that are equal to or shorter than 20 or 19 bits.

Note that some problems which are referred to as multihoming issues are not, and should not be, solvable through the routing system (e.g. where a TCP load distributor is needed), and multihoming is not a panacea for the general problem of robustness in a routing system [I-D.berkowitz-multirqmt].

4.5. AS-number exhaustion

The domain identifier or AS-number is a 16-bit number. When this paper was originally written in 2001, allocation of AS-numbers was increasing by 51% a year [RFC3221] and exhaustion by 2005 was predicted. According to more recent work, again by Huston [Huston05], the rate of increase dropped off after the business downturn, but as of July 2005 well over half the available AS numbers (39000 out of 64510) have been allocated and around 20000 are visible in the global BGP routing tables. The rate of allocation is currently about 3500 per year.
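As a rough cross-check of the figures above, the following sketch applies a simple constant-rate (linear) model to the July 2005 numbers quoted in the text. The function name and the assumption that the allocation rate stays fixed at 3500 per year are illustrative only; they are not taken from [Huston05].

```python
def years_until_exhaustion(pool_size: int, allocated: int, rate_per_year: float) -> float:
    """Years until the pool is empty, assuming a constant allocation rate."""
    if rate_per_year <= 0:
        raise ValueError("rate_per_year must be positive")
    remaining = pool_size - allocated
    return max(remaining, 0) / rate_per_year


# Figures quoted in the text for July 2005.
POOL_SIZE = 64510   # available 16-bit AS numbers
ALLOCATED = 39000   # AS numbers already allocated
RATE = 3500         # allocations per year (assumed constant in this sketch)
START = 2005.5      # July 2005 expressed as a fractional year

years_left = years_until_exhaustion(POOL_SIZE, ALLOCATED, RATE)
exhaustion_year = START + years_left

print(f"remaining AS numbers: {POOL_SIZE - ALLOCATED}")      # 25510
print(f"years left at constant rate: {years_left:.1f}")      # 7.3
print(f"estimated exhaustion: {exhaustion_year:.1f}")        # 2012.8
```

Under this linear assumption the 16-bit pool would last a little over seven more years. A rising allocation rate would bring that date forward, and a falling one would push it back.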
Depending on the curve fitting model used to predict when exhaustion will occur, the pool will run out somewhere between 2010 and 2013. There appear to be other factors at work in this rate of increase beyond an increase in the number of ISPs in business, although there is a fair degree of correlation between these numbers. AS numbers are now used for a number of purposes beyond that of identifying large routing domains: multihomed sites acquire an AS number in order to express routing preferences to their various providers, and AS numbers are used as part of the addressing mechanism for MPLS/BGP-based virtual private networks (VPNs) [RFC2547].

The IETF has had a proposal under development for over four years to increase the available range of AS-numbers to 32 bits [I-D.ietf-idr-as4bytes]. Much of the slowness in development is due to the deployment challenge during transition. Because of the difficulties of transition, deployment needs to start well in advance of actual exhaustion so that the network as a whole is ready for the new capability when it is needed. This implies that standardisation needs to be complete and implementations available at least one equipment replacement cycle in advance of expected exhaustion, so that deployment should be starting around 2008.

4.6. Partitioned AS's

Tricks with discontinuous ASs are used by operators, for example, to implement anycast. Discontinuous ASs may also come into being by chance if a multi-homed domain becomes partitioned as a result of a fault and part of the domain can access the Internet through each connection. It may be desirable to make support for this kind of situation more transparent than it is at present.

4.7. Load Sharing

Load splitting or sharing was not a goal of the original designers of BGP and it is now a problem for today's network designers and managers.
Trying to fool BGP into load sharing between several links is a constantly recurring exercise for most operators today. Traffic engineering extensions to the future domain routing architecture that will facilitate load sharing should be considered.

4.8. Hold down issues

As with the interval between 'hello' messages in OSPF, the typical size and defined granularity (seconds to tens of seconds) of the 'keep-alive' time negotiated at start-up for each BGP connection constrains the responsiveness of BGP to link failures.

The recommended values and the available lower limit for this timer were set to limit the overhead caused by keep-alive messages when link bandwidths were typically much lower than today. Analysis and experiment ([I-D.alaettinoglu-isis-convergence], [I-D.sandiick-flip] and [I-D.lang-mpls-lmp]) indicate that faster links could sustain a much higher rate of keep-alive messages without significantly impacting normal data traffic. This would improve responsiveness to link and node failures, but with a corresponding increase in the risk of instability if the error characteristics of the link are not taken properly into account when setting the keep-alive interval.

An additional problem with the hold-down mechanism in BGP is the amount of information that has to be exchanged to re-establish the database of route advertisements on each side of the link when it is re-established after a failure. Currently any failure, however brief, forces a full exchange, which could perhaps be constrained by retaining some state across limited time failures and using revision control, transaction and replication techniques to re-synchronise the databases. Various techniques have been implemented to try to reduce this problem but they have not yet been standardised.

4.9. Interaction between Inter domain routing and intra domain routing

Today, many operators' backbone routers run both I-BGP and an IGP to maintain the routes that reach between the borders of the domain. Exporting routes from BGP into IGP and bringing them back up to BGP is not recommended [RFC2791], but it is still necessary for all backbone routers to run both protocols. BGP is used to find the egress point and IGP to find the path (next hop router) to the egress point across the domain. This is not only a management problem but may also create other problems:

o  BGP is a distance vector protocol, as compared with most IGPs, which are link state protocols, and as such it is not optimised for convergence speed, although distance vector protocols generally require less processing power. Incidentally, more efficient distance vector algorithms are available such as [Xu97].

o  The metrics used in BGP and the IGP are rarely comparable or combinable. Whilst there are arguments that the optimizations inside a domain may be different from those for end-to-end paths, there are occasions, such as calculating the 'topologically nearest' server, when comparable or combinable metrics would be of assistance.

o  The policies that can be implemented using BGP are designed for control of traffic exchange between operators, not for controlling paths within a domain. Policies for BGP are most conveniently expressed in RPSL and this could be extended if thought desirable to include additional policy information.

o  If the NEXT HOP destination for a set of BGP routes becomes inaccessible because of IGP problems, the routes using the vanished next hop have to be invalidated at the next available UPDATE. Subsequently, if the next hop route reappears, this would normally lead to the BGP speaker requesting a full table from its neighbour(s). Current implementations may attempt to circumvent
the effects of IGP route flap by caching the invalid routes for a period in case the next hop is restored.

o  Synchronization between IGP and EGP is a problem as long as we use different protocols for IGP and EGP, which will most probably be the case even in the future because of the differing requirements in the two situations. Some sort of synchronization between those two protocols would be useful. In the draft 'OSPF Transient Blackhole Avoidance' [I-D.mcpherson-ospf-transient], the IGP side of the story is covered.

o  Synchronizing in BGP means waiting for the IGP to know about the same networks as the EGP, which can take a significant period of time and slows down the convergence of BGP by adding the IGP convergence time into each cycle.

4.10. Policy Issues

There are several classes of issue with current BGP policy:

o  Policy is installed in an ad-hoc manner in each autonomous system. There isn't a method for ensuring that the policy installed in one router is coherent with policies installed in other routers.

o  As described in Griffin [Griffin99] and in McPherson [I-D.mcpherson-bgp-route-oscillation], it is possible to create policies for ASs, and instantiate them in routers, that will cause BGP to fail to converge in certain types of topology.

o  There is no available network model for describing policy in a coherent manner.

Policy management is extremely complex and mostly done without the aid of any automated procedures. The extreme complexity means that a highly qualified specialist is required for policy management of border routers. The training of these specialists is quite lengthy and needs to involve long periods of hands-on experience. There is, therefore, a shortage of qualified staff for installing and maintaining the routing policies.
Because of the overall complexity of BGP, policy management tends to be only a relatively small topic within a complete BGP training course and specialised policy management training courses are not generally available.

4.11. Security Issues

While many of the issues with BGP security have been traced either to implementation issues or to operational issues, BGP is vulnerable to DDoS attacks. Additionally, routers can be used as unwitting forwarders in DDoS attacks on other systems.

Though DDoS attacks can be fought in a variety of ways, mostly by filtering, this takes constant vigilance. There is nothing in the current architecture or in the protocols that serves to protect the forwarders from these attacks.

4.12. Support of MPLS and VPNS

Recently BGP has been modified to function as a signaling protocol for MPLS and for VPNs [RFC2547]. Some people see this over-loading of the BGP protocol as a boon whilst others see it as a problem. While it was certainly convenient as a vehicle for vendors to deliver extra functionality to their products, it has exacerbated some of the performance and complexity issues of BGP. Two important problems are the additional state that must be retained and refreshed to support VPN tunnels, and the fact that BGP does not provide end-to-end notification, making it difficult to confirm that all necessary state has been installed or updated.

In creating the future domain routing architecture, serious consideration must be given to the argument that VPN signaling protocols should remain separate from the route determination protocols.

4.13. IPv4 / IPv6 Ships in the Night

The fact that service providers need to maintain two completely separate networks, one for IPv4 and one for IPv6, has been a real hindrance to the introduction of IPv6. When IPv6 does get widely deployed it will do so without causing the disappearance of IPv4.
This means that unless something is done, service providers would need to maintain the two networks in relative perpetuity.

It is possible to use a single set of BGP speakers with multiprotocol extensions [RFC2858] to exchange information about both IPv4 and IPv6 routes between domains, but the use of TCP as the transport protocol for the information exchange results in an asymmetry when choosing to use one of TCP over IPv4 or TCP over IPv6. Successful information exchange confirms one of IPv4 or IPv6 reachability between the speakers but not the other, making it possible that reachability is being advertised for a protocol for which it is not present.

Also, current implementations do not allow a route to be advertised for both IPv4 and IPv6 in the same UPDATE message, because it is not possible to explicitly link the reachability information for an address family to the corresponding next hop information. This could be improved, but currently results in independent UPDATEs being exchanged for each address family.

4.14. Existing Tools to Support Effective Deployment of Inter-Domain Routing

The tools available to network operators to assist in configuring and maintaining effective inter-domain routing in line with their defined policies are limited, and almost entirely passive.

o  There are no tools to facilitate the planning of the routing of a domain (either intra- or inter-domain); there are a limited number of display tools that will visualize the routing once it has been configured.

o  There are no tools to assist in converting business policy specifications into the RPSL language; there are limited tools to convert the RPSL into BGP commands and to check, post-facto, that the proposed policies are consistent with the policies in adjacent domains (always provided that these have been revealed and accurately documented).
o  There are no tools to monitor BGP route changes in real time and warn the operator about policy inconsistencies and/or instabilities.

The following section summarises the tools that are available to assist with the use of RPSL. Note they are all batch mode tools used off-line from a real network. These tools will provide checks for skilled inter-domain routing configurers but limited assistance for the novice.

4.14.1. Routing Policy Specification Language RPSL (RFC 2622, 2650) and RIPE NCC Database (RIPE 157)

Routing Policy Specification Language (RPSL) enables a network operator to describe routes, routers and autonomous systems (ASs) that are connected to the local AS.

Using the RPSL language, a distributed database is created to describe routing policies in the Internet as described by each AS independently. The database can be used to check the consistency of routing policies stored in the database.

Tools exist (RIPE 81, 181, 103) that can be applied on the database to answer requests such as:

o  Flag when two neighboring network operators specify conflicting or inconsistent routing information exchanges with each other and also detect global inconsistencies where possible;

o  Extract all AS-paths between two networks that are allowed by routing policy from the routing policy database; display the connectivity a given network has according to current policies.

The database queries enable a partial static solution to the convergence problem. They analyze routing policies of a very limited part of the Internet and verify that they do not contain conflicts that could lead to protocol divergence. The static analysis of convergence of the entire system has exponential time complexity, so approximation algorithms would have to be used.

5. Security Considerations

As this is an informational draft on the history of requirements in IDR and on the problems facing the current Internet IDR architecture, it does not as such create any security problems. On the other hand, some of the problems with today's Internet routing architecture do create security problems and these have been discussed in the text above.

6. Acknowledgments

The draft is derived from work originally produced by Babylon. Babylon was a loose association of individuals from academia, service providers and vendors whose goal was to discuss issues in Internet routing with the intention of finding solutions for those problems.

The individual members who contributed materially to this draft are: Anders Bergsten, Howard Berkowitz, Malin Carlzon, Lenka Carr Motyckova, Elwyn Davies, Avri Doria, Pierre Fransson, Yong Jiang, Dmitri Krioukov, Tove Madsen, Olle Pers, and Olov Schelen.

Thanks also go to the members of Babylon and others who did substantial reviews of this material. Specifically we would like to acknowledge the helpful comments and suggestions of the following individuals: Loa Andersson, Tomas Ahlstrom, Erik Aman, Thomas Eriksson, Niklas Borg, Nigel Bragg, Thomas Chmara, Krister Edlund, Owe Grafford, Torbjorn Lundberg, Jasminko Mulahusic, Florian-Daniel Otel, Bernhard Stockman, Tom Worster, Roberto Zamparo.

In addition, the authors are indebted to the folks who wrote all the references we have consulted in putting this paper together. This includes not only the references explicitly listed below, but also those who contributed to the mailing lists we have been participating in for years.

Finally, it is the editors who are responsible for any lack of clarity, any errors, glaring omissions or misunderstandings.

7. References

[Blumenthal01] Blumenthal, M. and D.
Clark, "Rethinking the design of the Internet: The end to end arguments vs. the brave new world", May 2001.

[Breslau90] Breslau, L. and D. Estrin, "An Architecture for Network-Layer Routing in OSI", Proceedings of the ACM symposium on Communications architectures & protocols, 1990.

[Chapin94] Piscitello, D. and A. Chapin, "Open Systems Networking: TCP/IP & OSI", Addison-Wesley Copyright assigned to authors, 1994.

[Chiappa91] Chiappa, N., "A New IP Routing and Addressing Architecture", 1991.

[Griffin99] Griffin, T. and G. Wilfong, "An Analysis of BGP Convergence Properties", 1999.

[Huitema90] Huitema, C. and W. Dabbous, "Routeing protocols development in the OSI architecture", Proceedings of ISCIS V Turkey, 1990.

[Huston05] Huston, G., "Exploring Autonomous System Numbers", The ISP Column, August 2005.

[I-D.alaettinoglu-isis-convergence] Alaettinoglu, C., Jacobson, V., and H. Yu, "", draft-alaettinoglu-isis-convergence-00 (work in progress), Nov 2000.

[I-D.berkowitz-multirqmt] Berkowitz, H. and D. Krioukov, "To Be Multihomed: Requirements and Definitions", draft-berkowitz-multirqmt-02 (work in progress), 2002.

[I-D.ietf-idr-as4bytes] Vohra, Q. and E. Chen, "BGP Support for Four-octet AS Number Space", draft-ietf-idr-as4bytes-11 (work in progress), September 2005.

[I-D.ietf-idr-bgp4] Rekhter, Y., "A Border Gateway Protocol 4 (BGP-4)", draft-ietf-idr-bgp4-26 (work in progress), October 2004.

[I-D.irtf-routing-reqs] Doria, A., "Requirements for Inter-Domain Routing", draft-irtf-routing-reqs-03 (work in progress), July 2004.

[I-D.lang-mpls-lmp] Lang, "Link Management Protocol", draft-lang-mpls-lmp-02 (work in progress), 2002.

[I-D.mcpherson-bgp-route-oscillation] McPherson, D., Gill, V., Walton, D., and A. Retana, "BGP Persistent Route Oscillation Condition", draft-mcpherson-bgp-route-oscillation-01 (work in progress), Jan 2001.

[I-D.mcpherson-ospf-transient] McPherson, D. and T. Przygienda, "OSPF Transient Blackhole Avoidance", draft-mcpherson-ospf-transient-00 (work in progress), Jul 2000.

[I-D.sandiick-flip] Sandick, H., Squire, M., Cain, B., Duncan, I., and B. Haberman, "Fast LIveness Protocol (FLIP)", draft-sandiick-flip-00 (work in progress), Feb 2000.

[INARC89] Mills, D., Ed. and M. Davis, Ed., "Internet Architecture Workshop: Future of the Internet System Architecture and TCP/IP Protocols - Report", Internet Architecture Task Force INARC, 1990.

[ISO10747] ISO/IEC, "Protocol for Exchange of Inter-Domain Routeing Information among Intermediate Systems to support Forwarding of ISO 8473 PDUs", International Standard 10747, 1993.

[Jiang02] Jiang, Y., Doria, A., Olsson, D., and F. Pettersson, "Inter-domain Routing Stability Measurement", 2002.

[Labovitz02] Labovitz, C., Ahuja, A., Farnam, J., and A. Bose, "Experimental Measurement of Delayed Convergence", NANOG, 2002.

[RFC0904] Mills, D., "Exterior Gateway Protocol formal specification", RFC 904, April 1984.

[RFC0975] Mills, D., "Autonomous confederations", RFC 975, February 1986.

[RFC1105] Lougheed, K. and J. Rekhter, "Border Gateway Protocol (BGP)", RFC 1105, June 1989.

[RFC1126] Little, M., "Goals and functional requirements for inter-autonomous system routing", RFC 1126, October 1989.

[RFC1163] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol (BGP)", RFC 1163, June 1990.

[RFC1267] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol 3 (BGP-3)", RFC 1267, October 1991.

[RFC1752] Bradner, S. and A. Mankin, "The Recommendation for the IP Next Generation Protocol", RFC 1752, January 1995.
[RFC1753] Chiappa, J., "IPng Technical Requirements Of the Nimrod Routing and Addressing Architecture", RFC 1753, December 1994.

[RFC1771] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

[RFC1992] Castineyra, I., Chiappa, N., and M. Steenstrup, "The Nimrod Routing Architecture", RFC 1992, August 1996.

[RFC2547] Rosen, E. and Y. Rekhter, "BGP/MPLS VPNs", RFC 2547, March 1999.

[RFC2791] Yu, J., "Scalable Routing Design Principles", RFC 2791, July 2000.

[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

[RFC3221] Huston, G., "Commentary on Inter-Domain Routing in the Internet", RFC 3221, December 2001.

[RFC3913] Thaler, D., "Border Gateway Multicast Protocol (BGMP): Protocol Specification", RFC 3913, September 2004.

[Tsuchiya87] Tsuchiya, P., "An Architecture for Network-Layer Routing in OSI", Proceedings of the ACM workshop on Frontiers in computer communications technology, 1987.

[Xu97] Xu, Z., Dai, S., and J. Garcia-Luna-Aceves, "A More Efficient Distance Vector Routing Algorithm", Proc IEEE MILCOM 97, Monterey, California, Nov 1997.

Authors' Addresses

Avri Doria
LTU
Lulea, 971 87
Sweden

Phone: +1 401 663 5024
Email: avri@acm.org

Elwyn B. Davies
Consultant
Soham, Cambs
UK

Phone: +44 7889 488 335
Email: elwynd@dial.pipex.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.