Internet draft                                               T. Hardie
Expires: April, 2002				              R. White
			        			November, 2001   
Category: Work-in-progress                             
                                                  
draft-hardie-bounded-longest-match-02.txt


                        Bounding Longest Match Considered

          
        
Status of this memo

  This document is an Internet-Draft and is in full conformance with
  all provisions of Section 10 of RFC 2026.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six
  months and may be updated, replaced, or obsoleted by other
  documents at any time.  It is inappropriate to use Internet-Drafts
  as reference material or to cite them other than as "work in
  progress."

  The list of current Internet-Drafts can be accessed at
  http://www.ietf.org/ietf/1id-abstracts.txt

  To view the list Internet-Draft Shadow Directories, see
  http://www.ietf.org/shadow.html.

Copyright

  Copyright (C) The Internet Society 2001.  All Rights Reserved.

Abstract

  Some ASes currently use length-based filters to manage the size of
  the routing table they use and propagate.  This draft explores an
  alternative to length-based filters which allows for more automatic
  configuration and which provides for better redundancy.

  Rather than use a filter, this draft proposes a method of modifying
  the BGP longest match algorithm by setting a bound on the prefix
  lengths eligible for preference.  A bound would operate on long
  prefixes when covering route announcements are available; in certain
  circumstances it would cause a router to prefer an aggregate over a
  more specific route announcement.


1. What problem would modifying longest match address?

   Modifying longest match would limit the rate of growth in the
   routing table seen by many BGP speakers.  The current rate of
   growth and the time to convergence represent threats to the
   stability to the Internet.  In the short term, the IETF is
   considering efforts to curb these threats while new routing
   paradigms that attack the fundamental limitations of path vector
   protocols are developed and deployed.
     
   A number of the practical efforts to limit the rate of growth of
   the routing table have focused on filter policies, arguing that
   aggressive filtering will return the Internet to a state in which
   provider aggregates are a majority of the routes in the routing
   table[3].  This draft proposes an approach along those same lines,
   but using a bound on the longest match algorithm rather than a
   filter policy.  The authors believe that this approach can produce
   a similar (though not identical) effect while retaining full
   reachability and allowing multi-homing non-transit networks to
   achieve the main goals which have motivated their becoming
   independent ASes.
   

2. How might this work?


   As each prefix is received by a BGP speaker from an external
   peer, it would be evaluated in the light of other prefies already
   received. If two prefixes overlap in space (such as 192.168.0.0/16
   and 192.168.1.0/24), the longer prefix would be marked (for example
   with a new, as yet unspecified PATH_TYPE) in order to indicate it
   should not be used for forwarding traffic, and propogated to its 
   internal peers.

   Any peer receiving a prefix with the "bounding" marker would
   then compare this path to other paths towards the same prefix and
   prefix length. 

   If the router is not advertising a bounded path for this prefix
   as well, it will simply filter (or block) advertising of this
   bounded prefix from its Local-RIB to other BGP speakers within
   the autonomous system (internal speakers).

   If the router is advertising a bounded path for this prefix as
   well, it will compare the router ID of the advertising router to
   its router ID; if the loacl router has the lower router ID, it
   will continue to advertise the prefix marked as bounded.
   Otherwise, it will stop advertising this prefix altogether.

   Rather than using the standard BGP prefix comparison (decision
   process), the path originated by the BGP speaker with the lowest
   router ID will be preferred, so only one BGP speaker will
   advertise a bounded path into an autonomous system.

   Since the bounded path is not to be inserted into the Local-RIB,
   it will not be propogated to BGP speakers outside the autonoumous
   system.

2.1 Examples of Bounding the Longer Prefix

2.1.1  Assume the following configuration of autonomous systems:

                    (   )
          /-------( AS2 )--------\
   (   ) /         (   )          \ (   )       (   )
  ( AS1 )                          ( AS4 )-----( AS5 )
   (   ) \         (   )          / (   )       (   )
          \-------( AS3 )--------/
                 (   )

   AS1 is advertising 192.168.1.0/24 to both AS2 and AS3.
   AS2 is advertising both 192.168.1.0/24 and 192.168.0.0/16 into
   AS4
   AS3 is advertising 192.168.1.0/24 into AS3
   Each connection (session) is handled by a seperate router
   within each AS (AS4 peers with AS2 on a seperate router than AS3
   does)

   When the peering router in AS4 between AS4 and AS2 receives both
   the 192.168.1.0/24 and the 192.168.0.0/16 prefixes, it will mark
   the 192.168.1.0/24 as bounded and will then propogate this 
   through AS4. When the router in AS4 between AS4 and AS3
   receives this bounded advertisement, it will stop advertising
   192.168.1.0/24 into AS4, leaving only the shorter prefix
   (the aggregate) for forwarding purposes within the AS.


   If the link between AS1 and AS2 fails, the longer length prefix
   will be withdrawn from AS2, and thus the peering point between
   AS2 and AS4 will no longer have an overlapping set of
   prefixes. Thus, within AS4, the border router which peers with
   AS2 will cease advertising the 192.168.1.0/24 prefix which is
   marked as bounded. When the border router peering with AS3
   receives this withdraw, it will note it no longer has any
   restriction on advertising this longer prefix, and will begin
   propagating it.

2.1.2  Assume this configuration of autonomous systems:

                   (   )
          /-------( AS2 )--------\
   (   ) /         (   )          \ (   )       (   )
  ( AS1 )                          ( AS4 )-----( AS5 )
   (   ) \   (   ) /--( AS6 )-----/ (   )       (   )
          \-( AS3 )              /
             (   ) \--( AS7 )---/

   And we have the following assumptions:

   AS1 is advertising 192.168.1.0/24 to both AS2 and AS3.
   AS2 is advertising both 192.168.1.0/24 and 192.168.0.0/16 into
   AS4
   AS3 is advertising 192.168.1.0/24 into AS6 and AS7
   AS6 and AS7 are advertising 192.168.1.0/24 into AS4
   Each connection (session) is handled by a seperate router
   within each AS (AS4 peers with AS2 on a seperate router than AS3
   does)

   The operation is much the same as 2.1.1, except that the border
   routers within AS4 peering with AS6 and AS7 now both block the
   longer prefix from being advertised into AS4. AS5 only receives
   the single aggregate originally advertised by AS2.

2.2  Advantages to the Service Provider

   AS4, in each of the situations, reduces the number of prefixes
   carred through the autonomous system by the number of longer
   prefixes that overlap with aggregates of those prefixes. While
   one copy of the prefix continues to be carried through the
   autonomous system, this entry is not placed in the forwarding
   table, nor is it propogated outside the autonomous system.

   AS5 receives one prefix instead of two (or possibly more).

2.3 Advantages to the Customer

   In this case, the customer is respresented as AS1. The customer
   will continue to receive some amount of traffic over both peering
   sessions, and dual homing through two Service Providers is still
   effective. If the customer's primary link fails, the alternate
   link through AS3 will take over receving all inbound traffic
   automatically. With most other schemes presented to this point,
   the customer loses all impact of dual-homing into the Internet,
   unless both connections are through one Service Provider.

2.4 Advantages to the Internet

   Beyond the second AS hop, aggregation is preserved in all
   cases. While this would not reduce the backbone routing table by
   the dramatic amounts that other methods might, the advantages to
   the community are great, and at greatly reduced risk to
   customers.

    
2.5  What are the costs of using bounds of this type?


2.5.1 Router processing

  This proposal clearly adds to the work which needs to be done during
  overall BGP processing.  Because a check needs to be done for both
  covered and covering routes, some part of this work is required for
  routes of lengths on either side of the bound.  Should this become
  common, however, the rate of growth in the number of routes should be
  smaller and a balance should be struck between the extra processing
  per route and the number of routes.


2.5.2 Traffic engineering

   The implementation of a bound risks magnifying or removing the
   effect of certain widely deployed traffic engineering methods.  If,
   for example, an AS chose to prepend its own route to an
   announcement in order to alter the preference for that route, a BGP
   neighbor using a bounded longest match might now see that route as
   eligible for discard in favor of an aggregate.  While it is fairly
   easy to code around that particular problem, to avoid this class of
   problems it might be preferable to allow this to apply to specific
   AS Sets as well as to all BGP neighbors. 


2.5.3 Propagation delay and increased convergence time.

   If the route to the AS providing the route to the aggregate should
   be lost, the more-specific must propagate into the ASes which had
   formerly heard only the aggregate.  This increases convergence time
   and may create situations in which reachability is temporarily
   compromised.  Unlike the filter case, however, normal BGP behavior
   should restore reachability without changes to the router
   configuration.  There is a also a risk that during a pathological
   event the increased processing required by this change will degrade
   propagation times during those events.  This depends on both the
   speed of specific implementations and the character of the
   topology.
   
2.5.4 Routing Loops

   In cases where neighbor ASes do not apply the same bounds,
   short-lived routing loops could occur.  As an example, imagine an
   AS which has applies a bound at /23 and has two routers, rtrA and
   rtrB, each with an eBGP peer in a different AS.  Both routers learn
   a /24.  rtrA's is the best path; and rtrB propagates that path to
   its eBGP peer.  If rtrB learns a covering /16 from its eBGP peer
   the bound will cause that /16 to be the best path, so it will
   point traffic to the /24 to its peer.  If that peer is not using
   a similar bound, however, it will point traffic to the /24 
   through rtrB to the route learned at rtrA.  This is transient,
   but it does cause a loop.


2.6 What are the advantages of using bounds of this type?

   As noted above, using a bound on the longest match algorithm limits
   the number of routes used and propagated by an AS without risking a
   loss in reachability.  Many of the same effects can be achieved
   with active management of length-based filters, but this method
   should allow a single knob to achieve the same thing as many lines
   of router configuration.  It also avoids the risk of long lived
   black holes inherent in the application of length-based filters
   where no covering aggregate is announced or when that aggregate is
   withdrawn by a loss in reachability.


2.7 Who gets the benefits?

   The AS which ignores the route longer than the bound gets the
   benefit for all routers past the router which undertakes the
   processing to apply the bound.  The ASes which are BGP neighbors of
   that AS derive the same benefit, however, without incurring the
   extra processing.


2.8 What does all this mean for multi-homing?

   Multi-homed non-transit ASes may maintain multiple BGP
   relationships to ensure redundancy in the presence of provider
   network or business failure, to manage performance to key traffic
   exchange partners, and to reduce expenditures on transit through
   peering.

   The deployment of bounded longest match as described here should
   not affect redundancy.  In the case of failure of the AS from whom
   the address space is initially derived, the propagation delay of
   the longer match past the next-hop BGP neighbor may be subject to
   the problems discussed in 2.5.3.
 
   Performance to key traffic exchange partners who are direct BGP
   neighbors is not affected at all by the deployment of the described
   algorithm for bounded longest match.  For those who are not direct
   BGP neighbors, the ability to manage the outbound path is not
   changed, but the return path may be changed.  This is, in any case,
   notoriously difficult to control.

   The deployment of bounded longest match should not affect the
   traffic exchange between peers, though the ability of such a system
   to recognize AS Sets may be required where peers are defined by
   an AS Sets rather than an single AS.


3. Security Considerations

   This document presumes that the implementation of bounded longest
   match is a knob inside a router config.  Since the use of the knob
   affects route announcements not originating within the router's AS
   or its direct neighbors, the new behavior may result in surprises
   to the announcing AS.  It is possible that this behavior might be
   considered a denial of service or mistaken for a denial of service
   by systems designed to detect black-holing on behalf of the origin
   AS.
   

4. Full copyright statement

  Copyright (C) The Internet Society 1999.  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain
  it or assist in its implementation may be prepared, copied,
  published and distributed, in whole or in part, without restriction
  of any kind, provided that the above copyright notice and this
  paragraph are included on all such copies and derivative works.
  However, this document itself may not be modified in any way, such
  as by removing the copyright notice or references to the Internet
  Society or other Internet organizations, except as needed for the
  purpose of developing Internet standards in which case the
  procedures for copyrights defined in the Internet Standards process
  must be followed, or as required to translate it into languages
  other than English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on
  an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
  ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

5. Acknowledgements

   Cengiz Alaentinoglu, Alvaro Retana, Russ White, and Barry Greene
   gave valuable comments on the -00.txt version of this draft. A
   number of colleagues also gave the author valuable comments on the
   white board markings that gave rise to this paper; among them are
   Lane Patterson, Ian Cooper, Gerd Besch, Bill Norton, Diarmuid
   Flynn, and Sean Donelan.
   
6. References

[1] Huston, Geoff.  http://www.telstra.net/ops/bgp/index.html

[2] Ahuja, Abha.
    http://www.merit.edu/~ahuja/ptomaine-bof/ahuja-ietf-ptomaine/index.htm

[3] Bush, Randy.  Plenary, IETF 51.  Eventually at:
    http://www.ietf.org/proceedings/01aug/



7. Authors' addresses

   Ted Hardie                                      
   Ted.Hardie@nominum.com


   Russ White
   ruwhite@cisco.com