Internet Engineering Task Force                              Rohit Dube
Internet Draft                           Bell Labs, Lucent Technologies
Expiration Date: May 1999                               John G. Scudder
                                        Internet Engineering Group, LLC

                    Route Reflection Considered Harmful

                draft-dube-route-reflection-harmful-00.txt


1. Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).


2. Abstract

   Route reflection as defined by [2] is a popular way of reducing the
   full-mesh IBGP peering required by routers running the Border Gateway
   Protocol [1]. There are cases, however, in which a topology built
   using route reflectors produces persistent forwarding loops, or
   produces routing decisions different from those one would expect
   with a full IBGP mesh. This document describes these problems.
  


3. Introduction

   Route reflectors are, by design, selective about which routes they
   forward (i.e., reflect) to their peers. Specifically, if several
   routes to the same NLRI are available, a route reflector reflects
   only the route it has selected for its own use. This typically
   reduces both the number of routes each router in the AS must store
   in its RIB and the volume of BGP update traffic. By the same token,
   however, not every router in the network has a full view of all the
   routes to a prefix to choose from. Combined with the specifics of the
   BGP decision process, this causes the problems described below.
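
   As an illustration of this selectivity, the following Python sketch
   encodes the advertisement rules for a reflector as we understand them
   from [2]: a route learned from a non-client IBGP peer is reflected to
   clients only, while a route learned from a client or from an EBGP
   peer is sent to all clients and non-clients. The function name
   reflect_to and the string peer names are hypothetical, and the
   sketch assumes the reflector has already run its decision process
   and advertises only the single route it selected.

```python
from typing import List

# Sketch of the reflection rules of [2] (RFC 1966). The reflector sends
# only its single best route for a prefix; this decides which IBGP peers
# receive it, given where that route was learned: "EBGP", a client name,
# or a non-client IBGP peer name.
def reflect_to(learned_from: str, clients: List[str],
               non_clients: List[str]) -> List[str]:
    if learned_from in non_clients:
        # From a non-client IBGP peer: reflect to clients only.
        return list(clients)
    # From a client or an EBGP peer: send to all clients and
    # non-clients, but never back to the peer it came from.
    return [p for p in clients + non_clients if p != learned_from]


# RR1 of Section 4: one client (R4) and one non-client peer (RR2).
print(reflect_to("EBGP", ["R4"], ["RR2"]))  # ['R4', 'RR2']
print(reflect_to("RR2", ["R4"], ["RR2"]))   # ['R4']
```

   Whichever way RR1's best route was learned, its client R4 hears
   about that one route only, never about the alternatives RR1 rejected.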




Dube, Scudder                                                 [Page 1]

Internet Draft                                             November 1998


4. Persistent Loops

   Consider the topology in Figure 1.

                      +----------------------+
                      | +------------+       |
                      | |            |       |
               E1=====RR1=====R3=====R4=====RR2=====E2
                 <--->         |             | <--->
                               +-------------+
                          
                            Figure 1
                            --------

   RR1, RR2, R3 and R4 are BGP routers in the same AS. E1 and E2 are BGP
   routers in some other AS, peering via EBGP with RR1 and RR2
   respectively. RR1 is configured as a route reflector with R4 as a
   client, and RR2 is configured as a route reflector in a different
   cluster with R3 as a client; RR1 and RR2 also peer with each other.
   In the diagram above, IBGP sessions are denoted by +---+ and EBGP
   sessions by <--->. For simplicity, assume that all the physical
   links (denoted by ===) have the same IGP cost.

   Now suppose that E1 and E2 advertise the same prefix to RR1 and RR2,
   respectively. All other things being equal, RR1 picks the route
   through E1 for this prefix on account of its lower IGP cost, and
   reflects this route to its client R4, which therefore forwards
   traffic for the prefix through R3 and RR1. Similarly, RR2 picks the
   route through E2 and reflects it to its client R3, which therefore
   forwards traffic for the prefix through R4 and RR2. A data packet for
   this prefix that reaches R3 is thus sent to R4, which sends it back
   to R3: the packet loops between R3 and R4.

   Note that the problem would disappear if the topology were reverted
   to a full IBGP mesh: R3 would pick the route through RR1 and R4 would
   pick the route through RR2, both on account of lower IGP cost.
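
   The loop can be reproduced with a small simulation. The Python sketch
   below is illustrative only. It assumes that only IGP cost matters
   (all other attributes being equal, as stated above), that each router
   forwards hop by hop along the IGP shortest path toward the border
   router (RR1 or RR2) whose route it selected, and that the routes each
   router knows are filled in by hand from the reflection setup of
   Figure 1. The names toward, forward, REFLECTION and FULL_MESH are
   hypothetical.

```python
from collections import deque

# Internal part of Figure 1: RR1 - R3 - R4 - RR2, every link IGP cost 1.
LINKS = [("RR1", "R3"), ("R3", "R4"), ("R4", "RR2")]
# Border routers and the external peer each one hands traffic to.
EXITS = {"RR1": "E1", "RR2": "E2"}

adj = {}
for a, b in LINKS:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def toward(dest):
    """BFS from dest: IGP distance to dest and next hop toward dest."""
    dist, nxt = {dest: 0}, {dest: dest}
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], nxt[v] = dist[u] + 1, u
                queue.append(v)
    return dist, nxt

DIST, NEXT = {}, {}
for e in EXITS:
    DIST[e], NEXT[e] = toward(e)

# Border routers whose route each router has learned (filled in by hand).
REFLECTION = {"RR1": ["RR1", "RR2"], "RR2": ["RR1", "RR2"],
              "R4": ["RR1"],   # RR1 reflects only its own best route
              "R3": ["RR2"]}   # RR2 reflects only its own best route
FULL_MESH = {r: list(EXITS) for r in adj}

def forward(start, view):
    """Trace a packet hop by hop; return (path, looped)."""
    # Each router picks the known border router with the lowest IGP cost.
    choice = {r: min(view[r], key=lambda e: DIST[e][r]) for r in view}
    path, node = [start], start
    while choice[node] != node:
        node = NEXT[choice[node]][node]
        if node in path:
            return path + [node], True
        path.append(node)
    return path + [EXITS[node]], False

print(forward("R3", REFLECTION))  # (['R3', 'R4', 'R3'], True)
print(forward("R3", FULL_MESH))   # (['R3', 'RR1', 'E1'], False)
```

   Under reflection, the trace from R3 comes back to R3 after one hop
   through R4; with a full mesh, R3 and R4 each leave the AS through the
   nearer border router.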


5. Incorrect Routing Decision

   Consider the topology in Figure 2.

                    [RR1]------------------[RR2]
                     /\                      |
                    /  \                     |
                   /    \                    |
                 [R1]   [R2]                [R3]
                  |      |                   |
                  |      |                   |
                  |      |                   |
                 [E1]   [F1]                [E2]

                            Figure 2
                            --------

   RR1, RR2, R1, R2 and R3 are BGP routers in the same AS R. RR1 is a
   route reflector with clients R1 and R2, and RR2 is a route reflector
   in a different cluster with client R3. E1 and E2 are BGP routers in
   AS E that EBGP peer with R1 and R3 respectively. F1 is a BGP router
   in AS F that EBGP peers with R2. Assume that E1, F1 and E2 advertise
   the same prefix to R1, R2 and R3 respectively, with the attributes
   shown in the following table ("-" means that no MED is attached):

   Router    AS    Router-id    MED
   --------------------------------
   E1        E     3.3.3.3      50
   F1        F     2.2.2.2      -
   E2        E     1.1.1.1      100

   All other attributes of the prefix in question are the same.

   Further assume that RR1's IGP cost to R1 (and E1) is the same as its
   IGP cost to R2 (and F1), and that RR2's IGP cost to R3 (and E2) is
   the same as its IGP cost to R1 (and E1) and to R2 (and F1). The lines
   in Figure 2 denote both physical and BGP connectivity.

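   Now, RR1 chooses the route through F1 on account of its lower
   router-id compared to the route through E1 (which in turn wins over
   the route through E2 on account of its lower MED). RR2, on the other
   hand, chooses the route through E2 on account of its lower router-id
   compared to the route through F1. This happens because RR1 reflects
   only its chosen route, the one through F1, to RR2, and not the route
   through E1; without the route through E1, nothing eliminates the
   route through E2 on MEDs. RR1 also learns the route through E2 from
   RR2, but this does not change its choice, since that route loses to
   the route through E1 on MEDs.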

   If instead we had a full mesh, RR2 would see all three routes and
   pick the one through F1: the route through E1 wins over the route
   through E2 on MEDs, and the route through F1 wins over the route
   through E1 on account of its lower router-id.

   A network operator moving from a full-mesh topology to the reflector
   topology above would therefore see routing change: packets that RR2
   forwards toward the prefix in question would leave the AS through E2
   instead of through F1 as before.
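
   The following Python sketch reproduces these choices with a
   deliberately simplified decision process: routes that lose on MED to
   a route from the same neighboring AS are discarded first, and the
   lowest router-id then breaks the tie, since the IGP costs are equal.
   All earlier decision steps (local preference, AS path length and so
   on) are assumed equal, treating a missing MED as 0 is an assumption,
   and the Route type and the best function are hypothetical.

```python
from ipaddress import IPv4Address
from typing import List, NamedTuple, Optional

class Route(NamedTuple):
    via: str             # external router the route came from
    neighbor_as: str     # AS of that router
    router_id: str       # final tie-breaker (lower wins)
    med: Optional[int]   # None: no MED attached

E1 = Route("E1", "E", "3.3.3.3", 50)
F1 = Route("F1", "F", "2.2.2.2", None)
E2 = Route("E2", "E", "1.1.1.1", 100)

def med(r: Route) -> int:
    # Assumption: a missing MED counts as 0. It does not matter here,
    # because F1 is the only route from AS F.
    return 0 if r.med is None else r.med

def best(routes: List[Route]) -> Route:
    # MEDs are compared only between routes from the same neighbor AS:
    # drop every route that loses on MED to a route from its own AS.
    survivors = [r for r in routes
                 if all(med(r) <= med(o) for o in routes
                        if o.neighbor_as == r.neighbor_as)]
    # IGP costs are equal in Figure 2, so the lowest router-id decides.
    return min(survivors, key=lambda r: IPv4Address(r.router_id))

print(best([E1, F1]).via)      # RR1, from its clients: F1
print(best([F1, E2]).via)      # RR2, seeing only F1 and E2: E2
print(best([E1, F1, E2]).via)  # full mesh, all three routes: F1
```

   Since RR1 also learns the route through E2 from RR2, the last call
   is what RR1 computes in steady state as well; only RR2, which never
   sees the route through E1, differs from the full-mesh result.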


6. Characterization
 
   Problem 1 (Section 4) has two ingredients: a) the selective nature
   of route reflectors, which prevents some routes from reaching some
   clients, and b) the fact that some steps of the BGP decision process,
   specifically the "prefer lowest IGP cost" rule, depend on the
   router's location in the network. Thus a route reflector's decision
   cannot be relied upon to mirror the decision its client would have
   made. Note that b) does not make every divergence between the
   reflector topology and the physical topology harmful: problems arise
   only when the two diverge so far that clients, if reflection were
   replaced by a full mesh, would make decisions (in this case based on
   IGP cost) different from those of their reflectors.

   Problem 2 (Section 5) also has two ingredients: a) the selective
   nature of route reflectors, as above, and b) the partial order that
   MEDs impose upon competing routes, since MEDs can be compared only
   between routes from the same AS. If all decision criteria used by BGP
   imposed a total order on the routes (i.e., if all BGP routes for a
   prefix could be arranged in a strict order of precedence), then b)
   would not be an issue and, in spite of a), this problem would not
   happen.
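
   One way to see why b) matters is to compare the three routes of
   Section 5 two at a time, using the same simplified rules as there
   (MED only between routes from the same AS, then lowest router-id,
   with a missing MED assumed to count as 0). In the illustrative
   Python sketch below, whose prefers function is hypothetical, the
   pairwise preferences form a cycle, so no strict order of precedence
   exists and the winner depends on which subset of the routes a router
   happens to see.

```python
from ipaddress import IPv4Address
from itertools import permutations

# (neighbor AS, router-id, MED) for the three routes of Section 5.
ROUTES = {"E1": ("E", "3.3.3.3", 50),
          "F1": ("F", "2.2.2.2", None),
          "E2": ("E", "1.1.1.1", 100)}

def prefers(a, b):
    """True if route a beats route b when only these two are compared."""
    as_a, id_a, med_a = ROUTES[a]
    as_b, id_b, med_b = ROUTES[b]
    if as_a == as_b:
        # MEDs are comparable only within one AS (missing MED taken as 0).
        med_a, med_b = med_a or 0, med_b or 0
        if med_a != med_b:
            return med_a < med_b
    # Otherwise (IGP costs being equal) the lower router-id wins.
    return IPv4Address(id_a) < IPv4Address(id_b)

for a, b in permutations(ROUTES, 2):
    if prefers(a, b):
        print(f"{a} beats {b}")
# E1 beats E2
# F1 beats E1
# E2 beats F1
```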

   For both examples discussed, it is possible to construct several
   other topologies that suffer from the same problems.


7. Avoidance Guidelines
   
   Since there are no protocol mechanisms currently available to detect 
   the problems mentioned above, we provide guidelines to avoid 
   situations where these problems could surface.
 
   As noted in Section 6, Problem 1 happens because the IBGP reflector
   topology does not follow the physical topology. A simple way of
   avoiding this problem is to ensure that reflector clusters are
   constrained to follow the physical connectivity between the routers.
   It is always safe (at least with respect to this problem) to deploy 
   route reflection such that no IBGP session between a pair of route 
   reflectors will ever physically transit a reflector client. One 
   common mode of deployment is to fully mesh all the routers in a 
   "backbone" region, and to do route reflection to/from/between the 
   routers in a POP, using one or more of the backbone routers as the 
   reflector(s).
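
   The guideline above can be checked mechanically. The Python sketch
   below reports every pair of reflectors whose IGP path crosses a
   reflector client. It assumes equal link costs, a connected topology,
   a single shortest path between each pair of reflectors, and an IBGP
   session between every pair of reflectors; unsafe_sessions and
   shortest_path are hypothetical helper names. With equal-cost
   multipath, the other shortest paths would need checking as well.

```python
from collections import deque
from itertools import combinations

def shortest_path(adj, src, dst):
    """BFS shortest path (equal link costs, dst assumed reachable)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def unsafe_sessions(links, reflectors, clients):
    """Reflector-to-reflector sessions whose IGP path transits a client."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    bad = []
    for r1, r2 in combinations(reflectors, 2):
        transit = shortest_path(adj, r1, r2)[1:-1]  # intermediate routers
        crossed = [n for n in transit if n in clients]
        if crossed:
            bad.append((r1, r2, crossed))
    return bad

# Figure 1: the RR1-RR2 session runs over R3 and R4, both clients.
print(unsafe_sessions([("RR1", "R3"), ("R3", "R4"), ("R4", "RR2")],
                      ["RR1", "RR2"], {"R3", "R4"}))
# [('RR1', 'RR2', ['R3', 'R4'])]
```

   By contrast, a layout in which RR1 and RR2 are directly connected
   and each client hangs off its own reflector yields an empty list.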
 
   Problem 2 can be avoided by making sure that reflectors are never
   forced to choose the best BGP route based on MEDs. This can be
   achieved either by setting the local preference of a route at the
   border router to reflect its MED value, or by configuring
   community-based policies that the reflector can use to decide on the
   best route.
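
   The first option can be sketched in Python as follows. The mapping
   local_pref = BASE_PREF - MED, the value of BASE_PREF, and treating a
   missing MED as 0 are all hypothetical; any mapping that gives lower
   MEDs a higher local preference would serve. Applied to the routes of
   Section 5, RR1, RR2 and a full mesh all select the route through F1.
   Note that this particular mapping also lets MEDs from different ASes
   compete against each other, which is a policy decision in its own
   right.

```python
from ipaddress import IPv4Address
from typing import List, NamedTuple, Optional

class Route(NamedTuple):
    via: str
    router_id: str
    med: Optional[int]
    local_pref: int = 100

BASE_PREF = 1000  # hypothetical: assumed larger than any MED accepted

def import_policy(r: Route) -> Route:
    """Hypothetical border-router policy: fold the MED into local pref."""
    med = 0 if r.med is None else r.med  # assumption: missing MED is 0
    return r._replace(local_pref=BASE_PREF - med)

def best(routes: List[Route]) -> Route:
    # Higher local preference wins and is compared before MED, so the
    # reflector never falls back on MEDs; router-id breaks exact ties.
    return min(routes, key=lambda r: (-r.local_pref,
                                      IPv4Address(r.router_id)))

routes = [import_policy(r) for r in (Route("E1", "3.3.3.3", 50),
                                     Route("F1", "2.2.2.2", None),
                                     Route("E2", "1.1.1.1", 100))]
E1, F1, E2 = routes
print(best([E1, F1]).via, best([F1, E2]).via, best(routes).via)  # F1 F1 F1
```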


8. Acknowledgments

   The first author would like to thank Harry Mantakos, James Da
   Silva and Arvind Srivaths (all at Torrent Networking Technologies 
   Corp.), Rob Coltun (Fore Systems) and Tony Przygienda (Bell Labs,
   Lucent Technologies) for discussions on this topic. The second 
   author would like to thank Ravi Chandra and Tony Bates (both at 
   Cisco Systems) for similar discussions.


9. References

   [1] Rekhter, Y., and Li, T., "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

   [2] Bates, T., and Chandra, R., "BGP Route Reflection: An
       Alternative to Full Mesh IBGP", RFC 1966, June 1996.


10. Author Information

   Rohit Dube
   Bell Labs, Lucent Technologies Inc.
   4C-508, 101 Crawfords Corner Road
   Holmdel, NJ 07724
   e-mail: rohitd@dnrc.bell-labs.com

   John G. Scudder
   Internet Engineering Group, LLC
   122 S. Main, Suite 280
   Ann Arbor, MI 48104
   e-mail: jgs@ieng.com
