Internet Engineering Task Force Olivier Bonaventure INTERNET DRAFT Steve Uhlig Bruno Quoitin UCL July, 2004 The case for more versatile BGP Route Reflectors Status of this Memo By submitting this Internet-Draft, we certify that any applicable patent or other IPR claims of which we are aware of have been disclosed, and any of which we become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on December 31, 2004. Copyright Notice Copyright (C) The Internet Society (2004). All Rights Reserved. Abstract The Border Gateway Protocol (BGP) is the standard interdomain routing protocol in the Internet. Inside an Autonomous System (AS), the interdomain routes are often distributed by using BGP Route Reflectors (RR). Today, most RR are simple BGP routers. We show that by adding intelligence to the RR, it is possible to improve both the Bonaventure/Uhlig/Quoitin [Page 1] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 routing and the packet forwarding in ASes. We show how a versatile RR can help an AS to engineer the flow of its incoming or outgoing interdomain traffic. We also discuss how a versatile RR could help to reduce the BGP convergence time or reduce the size of the routing tables when providing BGP/MPLS VPN services. 1 Introduction The Border Gateway Protocol (BGP) [1] is used today by more than 16.000 Autonomous Systems (AS) to exchange their interdomain routes. The stability and performance of BGP are key factors for the stability and performance of the global Internet. Although BGP suffers from a low convergence in case of failure and some BGP routers tend to transmit too many routing messages, recent studies have shown that BGP routing is stable [2, 3], at least when considering the routes towards destinations receiving lots of packets. BGP is also used inside many ISPs to distribute several other types of information such as BGP/MPLS VPN routes [4] or flow specifications [5]. When used for interdomain routing, BGP relies on two types of sessions that are established over TCP connections. Two BGP routers from different domains connected with a physical link will use an eBGP session to exchange their interdomain routes. The interdomain routes received by the border routers of an AS need to be propagated through the AS. This is usually done by relying on iBGP sessions. The initial BGP specification assumed that a full-mesh of iBGP sessions would be established inside each AS to distribute the interdomain routes. A consequence of this full-mesh of iBGP sessions is that a BGP router will not distribute over an iBGP session a route received over another iBGP session. However, this full-mesh quickly appeared unscalable since an AS with N routers needs to support N Ú (N-1)/2 iBGP sessions. Two solutions have been proposed to solve this scaling problem. With the confederation approach, each AS is divided into smaller sub- ASes containing each about a few tens of routers. Inside each sub-AS, a full mesh of iBGP sessions are established between the routers of the sub-AS and special eBGP sessions are used between routers of different sub-ASes. A second approach, which, based on discussions with ISP operators, appears to be more often used by large ASes, is to rely on BGP Route Reflectors (RR) [6]. A RR is a special BGP router which is allowed to redistribute over iBGP sessions routes that it has received over some iBGP sessions. A RR has two types of iBGP peers : its client-peers and its non-client peers. The non- client peers are usually other RR. A RR will receive routes from all its iBGP peers and will use its BGP decision process and its IGP table to determine the best routes to reach each destination. If the Bonaventure/Uhlig/Quoitin [Page 2] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 best route was received on an iBGP session with a client peer, it will be advertised to all the iBGP peers. On the other hand, if the route was received from a non-client peer it will only be advertised to client-peers. The number of RRs in an IP network is much smaller than the number of routers [7]. A network with several tens of routers would typically have one (or two for redundancy reasons) RR. Larger networks with several hundred of routers in various countries may use up to a few tens of RRs connected in a full mesh or with a RR hierarchy. Discussions with ISPs indicate that there are three ways to deploy RR. The first solution is to place the RR function on existing backbone routers. In this case, the router needs to have enough CPU and memory capabilities to support the RR function while handling its normal load. Another approach is to use a dedicated router that does not forward IP packets but is equipped with a good CPU and large memory. Finally, smaller ASes sometimes rely on PCs or workstations running open-source RRs. In most deployments of RRs today, the goal is often to minimise the CPU load on the RR. RRs are often only considered as a way to solve the iBGP distribution problem. In this paper, we assume that the RR service is provided by a carrier-class workstation or a cluster of workstations where the CPU and the memory are not as limited as on current routers. We show in this article that by correctly exploiting the knowledge of the RR, it is possible to provide new services both inside and between ASes. We discuss several examples that could each lead to entire papers on the topic. We show in section 2 that a more intelligent RR could avoid the forwarding loops that may occur with a badly placed current RR. Then, in section 3 we show how a versatile RR could allow a transit AS to efficiently engineer its interdomain traffic. Finally, in section 4, we discuss the role that versatile RRs could play in ASes using MPLS to support VPN services or interdomain LSPs. 2 Limitations of current RR The currently deployed RRs advertise their own best route to each of their client peers. This allows the RR to compute a single best route, but this creates several problems. The first problem is that routing and even forwarding loops can occur when RR are used. Several of those problems have been described in the literature [8] and reported in real networks [9]. As an example, consider the topology shown in Figure 1 based on [8]. The arrows show the BGP sessions. The IGP weight of each physical link is also shown. In this network, RX and RY advertise the prefix P. The two RRs prefer the route learned via eBGP and Bonaventure/Uhlig/Quoitin [Page 3] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 advertise it to their client. If R1 receives a packet destined to P, its BGP table forces it to send it to RR1. However, the IGP topology will cause the packet to be sent to R2. R2's BGP table forces it to send the packet to RR2, but to reach this nexthop, R2 will send the packet via R1 ... Figure 1: Simple network topology with a forwarding loop Extensions to BGP [10] have been proposed to solve this problem, but they are not implemented and deployed. The current solution is to apply guidelines when designing iBGP topologies [11, 12, 7]. Those guidelines impose restrictions on the graph of the iBGP sessions. Those restrictions depend on the IGP topology and the location of the RR. In practice, the IGP topology changes frequently as links or routers fail or are added to the network or when traffic engineering tools are used to engineer the intradomain traffic by setting the IGP weights [13]. Ensuring that the guidelines are preserved after each IGP change is not an easy task. If the CPU of the RR is not a severe bottleneck, a solution to avoid the routing and forwarding loops induced by RRs would be to change the behaviour of the RR. Instead of computing its own best route which is then distributed to all its clients, a RR could compute the best route that would be computed by each client if it had the same BGP table as the RR. Since one step of the BGP decision process uses the IGP distance between the router and the nexthop contained in each BGP route, the RR would need to know the IGP distance between each of its clients and each BGP nexthop. This information can be obtained by computing the IGP table of each client or by defining a new protocol to allow a client to report this information to its RR [14]. If the RR recomputes the IGP tables of its clients, they need to be updated after each IGP change. Several algorithms [15] have been proposed to incrementally update the routing table of a router after a topology change. There are also incremental versions of the all-pairs shortest paths algorithms [16]. Based on those algorithms, it should be possible to build incremental algorithms to determine the BGP updates to be sent to the clients of a RR after a BGP or an IGP change. Another issue with RR is the convergence time in case of failure. Consider again the network topology shown in Figure 1. Assume that the bottom AS is a provider advertising prefix P at both RX and RY. Assume that the forwarding loop problem mentioned above has been solved by forcing each RR to compute the best route for each client. In this case, R1 sends its packet to P via RR2. If the link RR2-RY fails, RR2 would withdraw its route to P on its iBGP session with RR1. RR1 would then send a new route to R1. If instead of one prefix Bonaventure/Uhlig/Quoitin [Page 4] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 we consider that 100,000 routes used by R1 pass via RR2. Then, when the RR2-RY eBGP session fails, RR2 needs to withdraw 100,000 routes on the RR2-RR1 iBGP session. RR1 would then need to update the 100,000 routes on the RR1-R1 iBGP session. This could take several seconds or more depending on the performance of the RR. If instead a full-mesh of iBGP sessions was used in this network, R1 would have received all the eBGP routes learned by RR2 and RR1. When the failure of link RR2-RY is reported by the IGP, R1 could consider all the routes via RR2 as unreachable and could switch to the routes learned from RR1. In large ISPs with a hierarchy of RRs, the impact of the RRs on the BGP convergence time may be even larger. With today's stringent SLAs, there is a clear need to reduce the convergence time in case of failures. A versatile RR could help to reduce it by advertising several routes to its clients. Knowing the IGP table of each of its clients, the RR can easily determine the best BGP route, but also the second route that it would select if the first become unreachable. By using the BGP extensions proposed in [10], the RR could advertise the best and the second route to each client. This would ensure that the client can quickly switch to a new route when the primary one becomes unreachable. This solution could probably be even more useful in networks providing RFC2547 BGP/MPLS VPN services given their tight SLA constraints. 3 RR-assisted traffic engineering Another important problem in the global Internet is the need to perform traffic engineering. Several solutions to engineer the flow of the IP packets in the network are used. Some tune the intradomain traffic by setting of the IGP weights [13] or establishing MPLS LSPs [17]. Another problem is to engineer the flow of the inter-domain traffic. As mentioned in [17], ``inter-domain Internet traffic engineering is crucial to the performance enhancement of the global Internet infrastructure.'' However, inter-domain traffic engineering today often relies on tweaking the configurations of the routers [18, 19] and is often more an art than science. 3.1 Reference environment To perform traffic engineering, a RR needs two types of information. Traffic statistics constitute the first type of information. For intradomain traffic engineering, those statistics are collected as POP-POP or router-router traffic matrices. For interdomain traffic engineering purposes, more precise statistics are required at the granularity of the BGP routes. However, in practice, accurate statistics for each route are not required [20]. Studies of the traffic characteristics in different networks [18, 19] have shown that a small number of prefixes are responsible for most of the Bonaventure/Uhlig/Quoitin [Page 5] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 traffic. We assume in this paper that per-BGP route volume statistics are maintained by the border routers and sent to the RR for those heavy prefixes. The second type of information required to engineer the flow of the interdomain traffic are the routing tables of the border routers. A RR is ideally placed to obtain this information since it already collects the BGP routes and participates in the IGP. Figure 2 provides the typical network environment for RR-based traffic engineering. At regular time intervals or when some routing change occur, the RR computes the best route that each ingress BGP router should use to reach a particular destination prefix outside the AS. Figure 2: Typical configuration of traffic engineering RR. When considering interdomain traffic engineering, we need to distinguish between the control of the outbound traffic and the control of the inbound traffic. 3.2 Outbound interdomain traffic engineering Let us first consider the case of a stub AS that needs to engineer its outgoing traffic to a few transit providers. In principle, this engineering is simple since the network operator can define filters on all its border routers to prefer some upstream provider for some prefixes. However, the size of the BGP routing tables (more than 140.000 routes today) make the search of the ideal configuration difficult [21]. Furthermore, the traffic pattern changes regularly [3] and thus a perfect configuration at time t may become inconvenient at time t+1. In [22, 23], we have shown that by using intelligent route reflectors, it is possible to engineer the flow of the outbound interdomain traffic even when the traffic patterns changes with time. The principles of the solution described in [22, 23] can be summarised as follows. First, the RR collects traffic statistics regularly as explained above. Second, the RR receives all the routes from the stub's providers. This can be obtained by establishing multi-hop eBGP sessions between the RR and the border router of each provider. Another solution is the BGP extension proposed in [10] to force the stub's border routers to advertise all their routes and not only their best routes. To control the flow of the outgoing traffic, the RR simply has to control the iBGP advertisements that it sends to the stub's border routers. Based on this routing and traffic information, the RR regularly runs an evolutionary algorithm. This algorithm can be configured Bonaventure/Uhlig/Quoitin [Page 6] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 with different objective functions such as balancing the traffic among providers, reducing the total cost based on the billing used, .... To fulfil the objective function, the evolutionary algorithm will select from time to time, a few prefixes to be moved from one provider to another. We have shown in [23] that load balancing was possible with only a few iBGP messages per minute, an iBGP load much lower than the normal load of BGP messages in the global Internet. The RR-based traffic engineering method described above is also applicable for transit ASes. A more detailed description of this approach may be found in [24, 25]. In a stub AS, changing iBGP advertisements is possible since the impact of those advertisements is limited to the stub AS. In a transit AS such as GEANT, an iBGP change can lead sometimes to eBGP changes that could force peers to change their best BGP route. To prevent the BGP route changes to generate instabilities in the rest of the Internet, aggregation could be used by the local AS so that changes in the egress point within the AS do not impact customer ASes. Figure 3 illustrates the use of aggregation to prevent frequent BGP route changes that do not impact the actual path followed by IP packets for the ASes upstream from the flow of the traffic. Suppose that the route reflector RR decides to change the egress point to reach the external prefix A.B.C.D/Y, from egress E1 to egress E2 for ingress point I1. Under normal conditions, whenever ingress I1 changes its best BGP route to reach prefix A.B.C.D/Y, it requires that a new BGP route be advertised by I1 to the external BGP peers. To prevent I1 to have to advertise a new BGP route every time the best egress point to be used by I1 changes, I1 could advertise to its upstream customers an aggregated AS path. This AS path would contain the set of ASes present in the two BGP routes that could be used by I1 to reach the destination prefix A.B.C.D/Y, as illustrated by Figure 3. In practice, the RR does not need to aggregate the AS-Paths of all the possible routes to a destination, only the routes that it could select with its modified decision process. Often, the best routes for a given destination will be learned from the same peer over different peering sessions. In this case, the aggregation is trivial since all the routes have the same AS-Path. Figure 3: AS path aggregation by RR. The solution described above could be extended to larger transit ASes that contain more than one (pair of) RR. This would require the definition of protocols that allow RR to exchange routing information and traffic statistics and coordination mechanism between the RRs. For instance, one could choose that each RR is responsible for the Bonaventure/Uhlig/Quoitin [Page 7] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 ingress routers it has an iBGP session with. Each RR would then compute the best route for its ingress routers towards each destination prefix and send them to these ingress routers. For this solution to be scalable in terms of the BGP advertisements, each RR would advertise to all other RR's of the domain aggregated AS paths. 3.3 Inbound Interdomain Traffic Engineering Engineering a domain's incoming traffic with BGP is a difficult task [19, 26, 18]. Indeed engineering the incoming traffic of one domain requires the ability to influence how distant domains will select the route that they use to send packets towards the domain. Different techniques exist: announcing more specific prefixes, making selective announcements, prepending the AS-Path and using redistribution communities [19] [27]. However, these methods suffer from several drawbacks. The first two methods increase the size of the BGP routing tables of all routers. AS-Path prepending, while being a widely used method, is known to be coarse and unpredictable. Finally, the redistribution communities are difficult to setup due to the combinatorial explosion of possibilities and the inaccurate view of the topology and policies one has from a single domain's point of view [27]. In this section, we show that a more deterministic approach to engineering the flow of the incoming traffic is possible. Our method relies on a cooperation between the source and destination domains and results in the establishment of interdomain tunnels. A destination domain willing to control how it is reached by a source domain requests the source to establish a tunnel to one of its border router. The tunnel is then used by the source to forward the packets destined to the destination domain. In this way, the packets sent by the given source enter the destination's network through the desired access link. To explain our approach, let us consider the example topology shown in Figure 4. AS1 is a stub that wants to control how it is reached by source AS2. On the figure, we can see that there exists multiple interdomain paths between AS2 and AS1. With the normal BGP, the packets from AS2 reach AS1 via router RD1. Figure 4: Inbound TE using tunnels. Assume that to reduce the delay or balance its incoming traffic, AS1 wishes to receive the packets sent by AS2 via ProviderB and thus router RD2. For this, AS1 will request AS2 to establish a tunnel with destination RD2 to reach all its prefixes. For this, we propose that a route-reflector RD1 inside AS1 establishes an eBGP session Bonaventure/Uhlig/Quoitin [Page 8] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 with a route-reflector, RS3, in the source domain AS2. This multihop eBGP session could be established manually as in the case of peering links or more dynamically. To allow a dynamic establishment of those sessions, AS2 must advertise the address of its route-reflector that needs to be contacted. This address can be encoded as an extended community value attached to the route(s) advertised by AS2. To avoid security issues, the multi-hop eBGP session should be established over an IPSec tunnel that provides authentication, data integrity and anti-replay. Moreover, BGP extensions such as S-BGP [28] or soBGP [29] should be used to check the validity of the prefixes advertised by RD1. The destination domain will typically advertise its own prefixes over the multi-hop eBGP session and the source domain will not advertise any prefix. Each BGP advertisement will also contain a flexible community value [30] indicating the tunnel endpoint in the destination domain (RD2 in our example), the type of tunnel to be used (L2TP, GRE, ...) and possible tunnel parameters such as cookies or identifiers. Instead of using flexible communities, another possibility would be to use MP_BGP and to carry tunnel related information in the MP_REACH_NLRI and a tunnel-SAFI as proposed in [31]. By using as tunnel end-point the IP address of RD2 on the link with ProviderB in figure 4, the destination domain can control the ingress link over which the packets will arrive provided that this address is advertised by ProviderB. When RS3 has received the route towards the network of AS1 over the multi-hop eBGP session, it will select which router(s) will establish the requested tunnel(s) towards the tunnel end-point. It must also update the routes that it distributes with iBGP inside AS2 to ensure that the packets towards AS1 will be forwarded to the tunnel head- end in AS2. Prior to establishing one tunnel towards AS1, RS3 needs to check that the tunnel end-point is reachable by verifying that it has received at least one BGP route to reach it. Depending on the connectivity of AS2, RS3 may choose to establish one or several tunnels to reach the endpoint. Since RS3 is a route-reflector, it has the most complete knowledge of the available routes towards the tunnel endpoint. RS3 will typically select AS2's best egress router to reach the endpoint as the head-end of the tunnel. Note that the selection may depend on other criteria such as the availability of special hardware to perform the required encapsulation on the routers. In order to ask a client to establish a tunnel towards RD2, RS3 sends to this client an iBGP update containing the tunnel attributes. Upon reception of this update, the client establishes the tunnel. Once the tunnel is up and running, it updates its routing table and sends iBGP advertisements to announce the new route in AS2. In the case of a stub source domain, the above procedure will only cause iBGP changes. On the contrary, if the source domain is a transit AS, the new routes using the tunnel could be advertised Bonaventure/Uhlig/Quoitin [Page 9] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 outside the domain. In this case, the BGP updates that are advertised outside the source domain AS2 should have an AS-Path that is composed first of AS2 itself, followed by the AS-Path of the route followed by the tunnel and finally, the destination domain AS1. Using such a path is necessary to allow BGP to continue to detect loops by using the AS-Path attribute. Indeed, without this AS-Path, a transit domain ASX could select AS2 has its next-hop to reach AS1 while the tunnel used by AS2 passes through ASX. The traffic would then pass twice through the same domain which would be a waste of ressources. IP tunnels such as GRE or L2TP have been often been criticised because of the cost of encapsulating/decapsulating packets and the risk of fragmentation. The first problem is not anymore an issue since several vendors offer interfaces supporting encapsulation/decapsulation at line rate. With Packet over SONET/SDH links, the MTU is less a problem given the available frame size. Furthermore PathMTU discovery is used by almost all endsystems today/widely deployed and used. Compared to other proposals such as [26], the solution described above can be used without deploying new protocols in the transit domains. For example, universities or research networks could use it to control high-bandwidth flows. 4 Route Reflectors and MPLS Many large ISPs are currently using MPLS to provide BGP/MPLS VPN services to their corporate customers [4]. Today, those services are often provided within a single AS. Three types of routers are usually distinguished in a network providing BGP/MPLS VPN services. A CE router is a router owned and maintained by a customer. A PE router is a router maintained by the network provider and directly attached to a CE router. A PE router will usually learn the routes reachable via each of its attached CE routers through a special IGP or BGP session [4]. To isolate all the different VPNs, a PE router will maintain one VPN Routing and Forwarding table (VRF) for each supported VPN. BGP is used by the PE routers to distribute the content of their VRF to other PE that are attached to the same VPN customers. The forwarding of VPN packets from one PE to another relies on the utilisation of MPLS, GRE or IPSec tunnels. Thanks to the utilisation of those tunnels, the core routers, also called P routers, do not need to maintain per-VPN VRFs. Since BGP is used to distribute the VPN routes inside the network, RR are often used to scale the iBGP full-mesh between the PE routers. Thanks to the routeviews and RIPE RIS projects, the behaviour of BGP in the global Internet has received a lot of attention and BGP is better known than a few years ago. Despite of that, few studies have analysed BGP/MPLS VPNs. A recent study [32] revealed that the behaviour of BGP is very different when considering VPN services than Bonaventure/Uhlig/Quoitin [Page 10] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 when considering the global Internet. A first difference is the size of the routing tables. In the global Internet, few routes are more specific than /24 and the BGP routes are very stable. In the BGP/MPLS network analysed in [32], the situation is completely different. First, the BGP/MPLS routing table is already larger than the Internet BGP routing table and is growing quickly. Second, the BGP/MPLS VPN routing table contains much more specific prefixes than the BGP Internet routing table. Figure 5, based on [32] compares the percentage of routes for the most common prefix lengths. This figure shows that 55% of the Internet routes are for /24 prefixes and other common prefix sizes are /16 to /23. In the BGP/MPLS VPN routing tables, 38% of the routes have a /32 IPv4 prefix as destination and 9% correspond to a /30 prefix. A first consequence of those specific routes is that in the network studied in [32], the BGP/MPLS VPN routing table in the RR is already larger than the BGP Internet routing table and the BGP/MPLS routing table. Another consequence is that the BGP/MPLS VPN routes are less stable and the BGP messages are much more frequent in the BGP/MPLS network [32]. Figure 5: Prefix distribution in the BGP Internet and BGP/MPLS VPN routing tables The size of the BGP/MPLS routing tables will force operators to utilise route aggregation mechanisms for the BGP/MPLS VPNs. The default BGP aggregation [1] is able to aggregate routes for contiguous prefixes coming from different ASes in a single advertisement. This technique could be applied by the customers on the CE routers. However, a CE router could only aggregate its local routes. A versatile RR, receiving VPN routes from several PE routers could perform a better aggregation by considering all the routes inside each VPN. Given the volatility of some BGP/MPLS routes, the RR would need to be able to change an aggregate dynamically after an event in a customer network. The next step for the BGP/MPLS VPNs is to provide those services across different ASes. Several solutions are proposed in [4]. One of the possible solutions is to directly interconnect the RR of different ASes with a multi-hop eBGP session to distribute the inter-provider VPN routes. In this case, the RR should clearly aggregate the VPN routes that it sends over the multi-hop eBGP session. Another problem with BGP/MPLS VPNs is that important VPN sites are often attached to two different PE routers. This dual attachment is often required for redundancy, but once the two links are established, customers often require to be able to use them for both inbound and outbound traffic. For the packets sent by the CE router Bonaventure/Uhlig/Quoitin [Page 11] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 to the network provider, this depends only on the customer network. For the packets sent by the VPN provider towards the CE router, the ability to load-balance the traffic between the two PE routers depends on the configuration of PE routers of the VPN provider. A possible solution is to use per-site route distinguishers [4] to ensure that each PE receives all the advertisements from all the PE routers attached to the same VPN. However, this increases the size of the BGP/MPLS routing tables. A versatile route reflector could be configured to advertise a single route when scalability is important and several routes, for example by using the BGP extensions proposed in [10], for the VPNs sites where load-balancing must be achieved. Another situation where RR could play a role in MPLS networks is when interdomain LSPs [33] need to be established with RSVP-TE. To establish a LSP with RSVP-TE, the head-end Label Switching Router (LSR) computes an explicit route. In a single IGP area, this computation relies on the topology distributed by the IGP. Across interdomain boundaries, this computation becomes more difficult since BGP distributes reachability and not topological information. For a primary LSP, the head-end LSR could use the route distributed by BGP. For a disjoint secondary LSP, this becomes more difficult as the head-end usually only receive the best BGP route to each destination. A RR that collects all the candidate routes learned via BGP could select among those routes to find a disjoint route for the secondary LSP. 5 Conclusion BGP Route Reflectors were designed to solve the scaling problem of the iBGP full-mesh. For this, the RR collects the best routes from all its clients. Instead of only serving as a distributor of iBGP advertisements, we have shown that by exploiting the routing knowledge of the RR it is possible to improve the routing in ASes. We have then shown several situations where versatile RR could be used to support very useful services in Autonomous Systems. One of those situations is the need to engineer the flow of the outgoing interdomain traffic of a stub or transit AS. Another situation occurs when an AS wishes to control the flow of its incoming traffic. Besides those traffic engineering usages, versatile RR could also be used to reduce the convergence time in case of failure or the size of the BGP/MPLS routing tables. Acknowledgements This work was supported by the DGTRE in the framework of the TOTEM project ( http://totem.info.ucl.ac.be). We would like to thank Nicolas Dubois for the data used in figure 5. Bonaventure/Uhlig/Quoitin [Page 12] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 References [1] Y. Rekhter, T. Li, and S. Hares, ``A Border Gateway Protocol 4 (BGP-4),'' April 2003, internet draft, draft-ietf-idr- bgp4-20.txt, work in progress. [2] J. Rexford, J. Wang, Z. Xiao, and Y. Zhang, ``Bgp routing stability of popular destinations,'' in Proc. Internet Measurement Workshop, November 2002. [3] S. Uhlig, V. Magnin, O. Bonaventure, C. Rapier, and L. Deri, ``Implications of the topological properties of internet traffic on traffic engineering,'' in ACM Symposium on Applied Computing, March 2004. [4] E. Rosen and Y. Rekhter, ``BGP/MPLS IP VPNs,'' September 2003, internet draft, draft-ietf-l3vpn-rfc2547bis-01.txt, work in progress. [5] P. Marques, N. Sheth, R. Raszuk, J. Mauch, and D. McPherson, ``Dissemination of flow specification rules,'' June 2003, internet draft, draft-marques-idr-flow-spec-00.txt, work in progress. [6] T. Bates, R. Chandra, and E. Chen, ``BGP route reflection - an alternative to full mesh iBGP,'' April 2000, internet RFC 2796. [7] B. Halabi, Internet Routing Architectures.1em plus 0.5em minus 0.4emCisco Press, 1997. [8] T. Griffin and G. Wilfong, ``Analysis of the MED oscillation problem in BGP,'' in ICNP2002, 2002. [9] D. McPherson, V. Gill, D. Walton, and A. Retana, ``BGP persistent route oscillation condition,'' 2002, internet draft, draft-ietf-idr-route-oscillation-01.txt, work in progress. [10] D. Walton, D. Cook, A. Retana, and J. Scudder, ``Advertisement of Multiple Paths in BGP,'' November 2002, internet draft, draft-walton-bgp-add-paths-01.txt, work in progress. Bonaventure/Uhlig/Quoitin [Page 13] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 [11] T. Griffin and G. Wilfong, ``On the correctnes of iBGP configuration,'' in SIGCOMM'02, Pittsburgh, PA, USA, August 2002, pp. 17--29. [12] L. Xiao, J. Wang, and K. Nahrstedt, ``Reliability-aware IBGP Route Reflection Topology Design,'' in 11th IEEE International Conference on Network Protocols (ICNP 2003), Atlanta, Georgia, USA, November 2003. [13] B. Fortz, J. Rexford, and M. Thorup, ``Traffic engineering with traditional IP routing protocols,'' IEEE Communications Magazine, October 2002. [14] R. Musunuri and J. Cobb, ``A complete solution to stable iBGP,'' in IEEE International Conference on Communications (ICC), 2004. [15] C. Alaettinoglu, V. Jacobson, and H. Yu, ``Towards millisecond IGP congergence,'' November 2000, internet draft, draft-alaettinoglu-ISIS-convergence-00.ps, wor k in progress. [16] C. Demetrescu and G. F. Italiano, ``A New Approach to Dynamic All Pairs Shortest Paths,'' in Proceedings of the 35th ACM symposium on Theory of computing (STOC'03), June 2003, pp. 159--166. [17] D. Awduche, A. Chiu, A. Elwalid, I. Widjaja, and X. Xiao, ``Overview and principles of internet traffic engineering,'' May 2002, rFC 3272. [18] N. Feamster, J. Borkenhagen, and J. Rexford, ``Guidelines for interdomain traffic engineering,'' SIGCOMM Comput. Commun. Rev., vol. 33, no. 5, pp. 19--30, 2003. [19] B. Quoitin, S. Uhlig, C. Pelsser, L. Swinnen, and O. Bonaventure, ``Interdomain traffic engineering with BGP,'' IEEE Communications Magazine, May 2003. [20] S. Leinen, ``Evaluation of candidate protocols for IP flow information ex port (IPFIX),'' January 2004, internet draft, draft-leinen-ipfix-eval-contrib-02, work in pr ogress. [21] T. Ye and S. Kalyanaraman, ``A recursive random search algorithm Bonaventure/Uhlig/Quoitin [Page 14] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 for large-scale network parameter configuration,'' in Proc. of ACM SIGMETRICS, 2003. [22] S. Uhlig, O. Bonaventure, and B. Quoitin, ``Interdomain Traffic Engineering with minimal BGP Configurations,'' in Proc. of the 18^th International Teletraffic Congress, Berlin, September 2003. [23] S. Uhlig, ``Implications of the traffic characteristics on interdomain tra ffic engineering,'' Ph.D. dissertation, Computer Science and Engineering Department, Unive rsit‰ catholique de Louvain, March 2004. [24] ------, ``A multiple-objectives evolutionary perspective to interdomain traffic engineering in the internet,'' in Workshop on Nature Inspired Approaches to Networks and Telecommunications (NIANT) in PPSN04, Birmingham, UK, September 2004. [25] S. Uhlig and B. Quoitin, ``BGP-based interdomain traffic engineering for transit ASes.'' [26] S. Agarwal, C. Chuah, and R. Katz, ``OPCA: Robust Interdomain Policy Rrouting and Traffic Control,'' in Proceedings of the 6th International Conference on Open Architecture and Network Programming, IEEE OpenArch, April 2003. [27] B. Quoitin, S. Tandel, S. Uhlig, and O. Bonaventure, ``Interdomain Traffic Engineering with Redistribution Communities,'' Computer Communications, vol. 27, no. 4, pp. 355--363, March 2004. [28] S. Kent, C. Lynn, and K. Seo, ``Secure Border Gateway Protocol (S-BGP),'' IEEE Journal on Selected Areas in Communications, vol. 18, no. 4, pp. 582--592, April 2000. [29] R. White, ``Securing BGP Through Secure Origin BGP,'' The Internet Protocol Journal, vol. 6, pp. 15--22, June 2003. [30] A. Lange, ``Flexible BGP Communities,'' March 2004, internet draft, draft-lange-flexible-communities-02, work in progress. [31] G. Nalawade, R. Kapoor, and D. Tappan, ``Tunnel SAFI,'' October 2003, internet Draft, draft-nalawade-kapoor-tunnel-safi-01, work in Bonaventure/Uhlig/Quoitin [Page 15] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 progress. [32] M. Nicolas, ``BGP/MPLS VPN monitoring for troubleshooting, scalability verification and network migration safety,'' February 2004, presentation at MPLS2004 , Paris (France). [33] R. Zhang and J. Vasseur, ``MPLS Inter-AS traffic engineering requirements,'' November 2003, internet draft, draft-ietf-tewg-interas-mpls-te-req-02.txt, work in progress. Authors' addresses Olivier Bonaventure, Steve Uhlig, Bruno Quoitin Dept. Computing Science and Engineering Universite catholique de Louvain (UCL) Place Sainte-Barbe 2 B-1348 Louvain-la-Neuve Belgium http://www.info.ucl.ac.be/people/OBO Bonaventure/Uhlig/Quoitin [Page 16] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 Intellectual Property Statement The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Disclaimer of Validity This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Copyright Statement Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This Internet-Draft will expire on December 31, 2004. Acknowledgment Bonaventure/Uhlig/Quoitin [Page 17] draft-bonaventure-bgp-route-reflectors-00.txt July 2004 Funding for the RFC Editor function is currently provided by the Internet Society. Bonaventure/Uhlig/Quoitin [Page 18]