Internet DRAFT - draft-bashandy-idr-bgp-repair-label

draft-bashandy-idr-bgp-repair-label



Network Working Group                                       A. Bashandy
Internet Draft                                             B. Pithawala
Intended status: Standards Track                          Cisco Systems
Expires: October 2012                                       Jakob Heitz
                                                               Ericsson
                                                            May 1, 2012

              Scalable, Loop-Free BGP FRR using Repair Label
                draft-bashandy-idr-bgp-repair-label-04.txt


Abstract

Consider a BGP free core scenario. Suppose the provider edge BGP
speakers PE1, PE2,..., PEn know about a prefix P/m via the external
routers CE1, CE2,..., CEm.  If the PE router PEi loses connectivity to
the primary path, it is desirable to immediately restore traffic by
rerouting packets arriving from the core to PEi and destined to the
prefix P/m to one of the other PE routers that advertised P/m, say PEj,
until BGP re-converges. However if the loss of connectivity of PEi to
the primary path also resulted in the loss of connectivity between PEj
and CEj, rerouting a packet before the control plane converges may
result in a loop. In this document, we propose using a repair label for
traffic restoration while avoiding loops. We propose advertising the
"repair" label through BGP.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.




Bashandy                Expires April 30, 2012                 [Page 1]

Internet-Draft        BGP FRR using Repair Label           October 2011

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 1, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.



Table of Contents

   1. Introduction...................................................3
      1.1. Conventions used in this document.........................4
      1.2. Terminology...............................................4
   2. Protocol Operation.............................................5
      2.1. Control plane Operation...................................5
         2.1.1. Additional Rules for allocating and advertising a Repair
         label.......................................................6
      2.2. Forwarding Plane Operation................................6
      2.3. Example...................................................7
   3. How to Disseminate Repair Label Information....................9
         3.1.1. Structure of the Repair Label Path Attribute........10
         3.1.2. Semantics of the Repair Label Attribute.............10
         3.1.3. Additional Rule when Forwarding Advertisements
         Containing the Repair Path Attribute.......................11
   4. Security Considerations.......................................12
   5. IANA Considerations...........................................12
   6. Conclusions...................................................12

Bashandy                Expires April 1, 2012                  [Page 2]

Internet-Draft        BGP FRR using Repair Label           October 2011

   7. References....................................................12
      7.1. Normative References.....................................12
      7.2. Informative References...................................13
   8. Acknowledgments...............................................13

1. Introduction

   In a BGP free core, where traffic is tunneled between edge routers
   and edge routers assign labels to prefixes, BGP speakers advertise
   reachability information about prefixes and associate a local label
   with each prefix such as L3VPN [9], 6PE [10], and Softwire [8].
   Suppose that a given edge router is chosen as the best next-hop for
   a prefix P/m. An ingress router that receives a packet from an
   external router and destined for the prefix P/m pushes the label
   advertised by the egress edge router and then "tunnels" the packet
   across the core to that egress router. Upon receiving the labeled
   packet from the core, the egress router uses the label on the packet
   to take the appropriate forwarding decision.

   In modern networks, it is not uncommon to have a prefix reachable
   via multiple edge routers. One example is the best external path
   [7]. Another more common and widely deployed scenario is L3VPN [9]
   with multi-homed VPN sites. As an example, consider the L3VPN
   topology depicted in Figure 1.

         +--------------------------+
         |                          |
         |      BGP free Core       |
         |                          |
         |      +------------------PE1----+
         |     /                    |      \
         |    /                     |       \
         |   /                      |        \
         |  /                       |         \
         | /                        |          *
        PE3                         |          CE....... VPN prefix
         | \                        |          *          (P/m)
         |  \                       |         /
         |   \                      |        /
         |    \                     |       /
         |     \                    |      /
         |      +------------------PE2----+
         |                          |
         |                          |
         +--------------------------+

              Figure 1 VPN prefix reachable via multiple PEs



Bashandy                Expires April 1, 2012                  [Page 3]

Internet-Draft        BGP FRR using Repair Label           October 2011

   PE3 is the ingress PE. PE1 and PE2 are both egress PEs connected to
   CE. CE advertises one or more VPN prefixes, denoted by P/m. PE1 and
   PE2 advertise P/m as VPNv4 or VPNv6 routes to all ingress PEs,
   including PE3, and associates a label with each route.

   Suppose that the ingress PE, PE3, chooses PE1 as the next-hop for
   the prefix P/m. In order to minimize traffic loss, it is highly
   desirable for PE1 to reroute all traffic destined to P/m to PE2 as
   soon as the connectivity to CE is lost without waiting for the
   control plane (whether it is IGP or BGP) to re-converge and compute
   the new best path. In doing so, PE1 pushes the label advertised by
   PE2 for the prefix P/m, and then "tunnels" the packet to PE2.
   However if the loss of PE1-CE connectivity was due to CE crash, then
   PE2 will also reroute the traffic back to PE1, resulting in a loop.
   Due to ultra scalability requirements, where there is a need to
   support thousands of peers and hundreds of thousands of prefixes,
   there is a need to support quick traffic restoration without waiting
   for the control plane to converge and without risking loops.

   1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [1].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

   1.2. Terminology

   This section outlines the terms used in this document. For ease of
   use, we will use terms similar to those used by L3VPN [9]

  o  Protected prefix: a prefix P/m (of any AFI) that a BGP speaker
     has an external path to. The BGP speaker may learn about the
     prefix from an external peer through BGP, some other protocol, or
     manual configuration. The protected prefix is advertised to some
     or all the internal peers.

  o  Primary egress PE: an IBGP peer that can reach the protected
     prefix P/m through an external path and advertised the prefix to
     the other IBGP peers. The primary egress PE was chosen as the
     best path by one or more internal peers. In other words, the
     primary egress PE is an egress PE that will normally be used when
     there is no failure. Referring to Figure 1, PE1 is a primary
     egress PE.



Bashandy                Expires April 1, 2012                  [Page 4]

Internet-Draft        BGP FRR using Repair Label           October 2011

  o  CE: an external router through which an egress PE can reach a
     prefix P/m. The router "CE" in Figure 1 is an example of such a
     CE

  o  Ingress PE: a BGP speaker that learns about a prefix through
     another IBGP peer and chooses that IBGP peer as the next-hop for
     the prefix. PE3 in Figure 1 is an example of an ingress PE

  o  Repairing PE: the egress PE that attempts to restore traffic when
     the primary path is no longer reachable "without" waiting for BGP
     to re-converge. The repairing PE restores the traffic by
     rerouting the traffic (through a tunnel) towards the pre-
     calculated repair PE when it detects that the primary path is no
     longer reachable. Referring to Figure 1, if PE3 chooses PE1 as
     the primary egress PE and PE1 decides to reroute traffic to PE2
     on losing reachability with CE, then PE1 is a repairing PE.

  o  Primary label: the label advertised by the primary egress PE to
     be used for normal traffic forwarding.

  o  Repair egress PE: an egress PE other than the primary egress PE
     that can reach the protected prefix P/m through an external
     neighbor. The repair PE is pre-calculated via other repairing PEs
     prior to any failure

  o  Repair label: the label that will be pushed on the packet when
     the repairing PE reroutes the traffic (through a tunnel) towards
     the repair egress PE. Section 2.  discusses how the repair label
     is used. Section 3.  discusses semantics of and the method for
     disseminating repair label information.

  o  Repair path: the repair egress PE and the repair label.

  o  internal and external: internal or external to the core.

2. Protocol Operation

   This section explains the operation of the control and forwarding
   planes of routers participating in BGP-free core traffic
   restoration.

   2.1. Control plane Operation

   1. As usual, each PE allocates a local label for each prefix it can
      reach through an external neighbor CE. This is the primary label
      used for normal traffic forwarding.




Bashandy                Expires April 1, 2012                  [Page 5]

Internet-Draft        BGP FRR using Repair Label           October 2011

   2. To provide repair path information to all PEs, the PE also
      allocates a repair label to the prefix if it can reach that
      prefix via an external neighbor. Different repair label
      allocation schemes are proposed in Section 3. .

   3. The PE advertises both the primary and repair labels to all IBGP
      peers.

   4. When a PE receives the label advertisement from egress PEs, it
      calculates a primary egress PE and a repair egress PE based on
      its internal path selection criteria. Note that the method of
      choosing the repair path is beyond the scope of this document.

   5. In the end, for some of the prefixes advertised by more than one
      PE, a PE will have

       o a primary path

       o a repair path consisting of a repair PE and a repair label
          advertised by the chosen repair PE.

   6. A PE "never" protects a repair label. Hence on any PE, a repair
      label only has paths towards the CE. However a primary label may
      have a repair path towards a chosen repair PE

2.1.1. Additional Rules for allocating and advertising a Repair label

  o  A repair PE MUST NOT advertise a repair label for a prefix if it
     does NOT have an external path to the prefix

  o  The forwarding entry for the repair label on the repair PE MUST
     NOT point to an internal path

  o  Repair labels SHOULD be advertised with labeled address families
     only. That is AFI/SAFI 1/4, 2/4, 1/128, and 2/128.



   2.2. Forwarding Plane Operation

   This section specifies the forwarding plane operation when a PE
   receives a packet and any of the following two conditions are true:

  o  The PE lost the primary path and has not yet calculated another
     primary path and programmed it in the forwarding plane.





Bashandy                Expires April 1, 2012                  [Page 6]

Internet-Draft        BGP FRR using Repair Label           October 2011

  o  The arriving packet arrived from the core and the PE does not
     have an external path. It is noteworthy to mention that this
     condition should be a temporary condition until all ingress PEs
     converge and stop sending traffic to that PE.



   The forwarding plane processes arriving traffic as follows:

   1. If the repairing PE is an egress PE, the packet arrives at the
      repairing PE with the primary label at the top because the packet
      is "tunneled" from the ingress PE(s). In that case, the repairing
      PE swaps the incoming label stack with the "repair label stack"
      advertised by the repair egress PE. Section 3.1.2.  specifies all
      the details

   2. The repairing PE tunnels the packet to the repair PE

   3. At the repair PE, the packet arrives with the repair label at the
      top. The repair PE uses the incoming label stack to take
      forwarding decisions

   4. If the repair egress PE can reach the CE, the repair PE forwards
      the packet towards the CE.

   5. If the repair PE cannot reach the CE, the traffic will be dropped
      because a PE never protects a repair label

   2.3. Example

   Consider the L3VPN [9] topology depicted in Figure 2 where two PEs
   are connected to the same PE. Assume that the core is LDP. We will
   be using an advertised repair label.

                                  PE1
                                     \
                                      \
                                       \
                                        \
               LDP core                 CE....... VPN prefix
                                        /        (10.0.0.0/8)
                                       /
                                      /
                                     /
                                   PE2

                         Figure 2 : L3VPN Example



Bashandy                Expires April 1, 2012                  [Page 7]

Internet-Draft        BGP FRR using Repair Label           October 2011

   PE1: Repairing egress PE
   PE2: repair PE
   Primary VPN label advertised by PE1 to all PEs: 4000
   Repair VPN label advertised by PE1 to all PEs: 5000
   Primary VPN label advertised by PE2 to all PEs: 2000
   Repair VPN label advertised by PE2 all PEs: 3000

   LDP label for PE2 on PE1 is 1234
   LDP label for PE1 on PE2 is 4567

   Before failure
   '''''''''''''''
   PE1 has the following FIB entries

   4000 -----> CE (unlabeled)
        -----> PE2, swap 4000 with 3000 and then push 1234
   5000 -----> CE (unlabeled)

   PE2 has the following
   2000 -----> CE (unlabeled)
        -----> PE1, swap 2000 with 5000 and then push 4567
   3000 ------> CE (unlabeled)

   After the CE crashes
   ''''''''''''''''''''
   PE1 has the following entry:
   4000 -----> PE2, swap 4000 with 3000 and then push 1234
   5000 -----> Drop


   PE2 has the following
   2000 -----> PE1, swap 2000 with 5000 and then push 4567
   3000 ------> Drop


   Because of the above routing entries, any traffic arriving from the
   core at PE1 and destined for 10.0.0/8,  is rerouted towards PE2
   using the repair VPN label 3000. PE2 will just drop it instead of
   looping it back towards PE1.

   After the link between PE1 and CE fails (CE did not crash)
   '''''''''''''''''''''''''''''''''''''''''''''''''''''''''
   PE1 has the following entry:
   4000 -----> PE2, swap 4000 with 3000 and then push 1234
   5000 -----> Drop

   PE2 has the following
   2000 -----> CE (unlabeled)
        -----> PE1, swap 2000 with 5000 and then push 4567

Bashandy                Expires April 1, 2012                  [Page 8]

Internet-Draft        BGP FRR using Repair Label           October 2011

   3000 ------> CE


   Because of the above routing entries, any traffic arriving from the
   core at PE1 and destined for 10.0.0/8 is rerouted towards PE2 using
   the repair VPN label 3000. PE2 will forward the traffic towards CE.

3. How to Disseminate Repair Label Information

   We propose to advertise the repair label as an optional path
   attribute. Advertising the repair label as an optional path
   attributes has some advantages:

  o  An egress PE can benefit from a scalable repair label allocation
     schemes such as per-CE repair label allocation

  o  Allows the repairing PE to share the same repair path among
     multiple protected prefixes. Since the repair path is shared by
     all labels sharing the path attribute, the repairing PE can
     optimize its RIB and FIB by sharing the same repair path data
     structure among a large number of protected prefixes.

  o  Reduces the BGP update message size. Instead of having to send
     additional labels per prefix, multiple prefixes can share the
     same repair label

  o  The number of labels used for traffic restoration does not depend
     on the number of protected prefixes

  o  Allows for incremental deployment because the attribute is
     optional

   The main disadvantage of sharing the same repair path among multiple
   primary paths is loss of fine grain control. It is not possible to
   manage, control, or provide differentiated handling to traffic on
   per prefix basis until the network re-converges. The loss of fine
   grain control is limited to the BGP re-convergence period.

   It is noteworthy to mention that per-CE and/or per next-hop repair
   label allocation has some advantages over per-prefix repair label
   allocation. First it results in using fewer labels. Second it allows
   for better packing in BGP messages. Third it does not require
   special handling in the forwarding plane at the repair PE. Fourth it
   simplifies the forwarding plane while maximizing the packet
   switching performance because the egress PE can take a forwarding
   decision with a single FIB lookup.




Bashandy                Expires April 1, 2012                  [Page 9]

Internet-Draft        BGP FRR using Repair Label           October 2011

3.1.1. Structure of the Repair Label Path Attribute

   This document defines the repair label attribute as an optional non-
   transitive path attribute [2] as follows:

      Attribute name: REPAIR_LABEL

      Type code: TBD

      Attribute Flags:

         Optional bit: 1

         Transitive bit: 0

         Partial bit: 0

         Extended Length bit: 0

      Length of the attribute: length in octets of the attribute

      Attribute Value: The attribute value contains a stack of one or
      more labels. The encoding of the labels is identical to encoding
      of the "label" field in [4]. The value of the bottom of stack
      (BOS) bit is determined at traffic restoration time as specified
      in Section 3.1.2. .

3.1.2. Semantics of the Repair Label Attribute

   This document specifies the semantics of the repair label attribute
   when the attribute carries one repair label only. The semantics of
   more than one repair label is beyond the scope of this document.

   Suppose a BGP speaker PE1 receives an update message with a repair
   label attribute containing the label "Lr2" from the IBGP peer PE2.
   Suppose the NLRI in the MP_REACH_NLRI attribute [3] contains the
   prefixes R1, R2,. . . , Rn each bound to a label L21, L22,. . . ,
   L2n, respectively. This means the following:

   1. PE2 will never attempt to repair a packet arriving with the label
      "Lr2". Hence PE2 will either forward the packet to an external CE
      or drop the packet

   2. PE2 expects the following from PE1:






Bashandy                Expires April 1, 2012                 [Page 10]

Internet-Draft        BGP FRR using Repair Label           October 2011

       a. Case a: The route Ri on PE1 is bound to a local label "L1i".
          Suppose PE1 receives a packet from the core with the label
          "L1i" at the top of the stack. If the PE1 loses the primary
          path for a prefix Ri or PE1 receives a packet from the core
          while not having an external path, and PE1 decides that PE2
          is the repair PE for the prefix Ri, then PE1 MUST swap the
          label "L1i" on the packet with the repair label "Lr2" and
          then tunnel the packet to PE2. The bottom of stack (BOS) bit
          MUST be copied from the label arriving on the packet to the
          label "Lr2"

       b. Case b: The route Ri on PE1 is bound to an aggregate label
          (e.g. per-vrf label). In that case, if PE1 receives a packet
          from the core, PE1 has to perform more than one route lookup
          to determine the primary path. Eventually, there will either
          be an IP lookup or a label lookup that points to the primary
          path:

           i. A label lookup points to the primary path: In that case,
               PE1 handles the packet as described in item 2.a above.

          ii. An IP lookup points to the primary path: In that case,
               if the PE1 loses the primary path for a prefix Ri or PE1
               receives a packet from the core while not having an
               external path, PE1 handles the packet as follows

                 1. PE1 pops all labels on the packet

                 2. PE1 MUST push the label "Lr2"

                 3. PE1 tunnels the packet to PE2. The bottom of stack
                    (BOS) bit in "Lr2" MUST be set as specified in [5].

3.1.3. Additional Rule when Forwarding Advertisements Containing the
   Repair Path Attribute

   As specified in Section 3.1.1. , the repair label attribute is a
   non-transitive attribute. However there may be cases, such as inter-
   AS option (b)[9], route reflectors [11], or confederation [12],
   where a router may replace the advertised next-hop with its own
   before forwarding an advertisement. If a BGP speaker replaces the
   next-hop attribute with its own and the advertisement contains a
   repair label attribute with label stack "Sr", there are two options

  o  Option 1: The BGP speaker MUST NOT advertise the repair label
     attribute




Bashandy                Expires April 1, 2012                 [Page 11]

Internet-Draft        BGP FRR using Repair Label           October 2011

  o  Option 2: The BGP speaker MUST replace the repair label stack
     "Sr" with a locally allocated label stack "Sr1" before
     advertising the route and then advertise the stack "Sr1" in the
     repair label attribute. For the forwarding plane, the BGP speaker
     MUST install a swap forwarding entry such that if the BGP speaker
     receives a packet with the label stack "Sr1", it swaps "Sr1" with
     the stack "Sr".

   Note that advertising the repair label attribute by the router
   depends on whether the router understands the semantics of and
   supports the repair label attribute at the time of receiving an
   advertisement containing the repair label attribute.

4. Security Considerations

   No additional security risk is introduced by using the mechanisms
   proposed in this document

5. IANA Considerations

   This document defines a new BGP path attribute. IANA maintains a
   list of the current BGP attribute typecodes in [6]. This document
   proposes defining a new typecode value of "TBD" for the REPAIR_LABEL
   path attribute

6. Conclusions

   This document proposes using a repair label to allow restoring
   traffic prior to BGP convergence while avoiding loops

7. References

   7.1. Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [2]   Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol
         4 (BGP-4), RFC 4271, January 2006

   [3]   Bates, T., Chandra, R., Katz, D., and Rekhter Y.,
         "Multiprotocol Extensions for BGP", RFC 4760, January 2007

   [4]   Rosen, E., Rekhter, Y., "Carrying Label Information in BGP-4",
         RFC 3107, May 2001

   [5]   Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci,
         D., Li, T. and A. Conta, "MPLS Label Stack Encoding", RFC
         3032, January 2001.

Bashandy                Expires April 1, 2012                 [Page 12]

Internet-Draft        BGP FRR using Repair Label           October 2011

   7.2. Informative References

   [6]   BGP Parameters, http://www.iana.org/assignments/bgp-
         parameters/bgp-parameters.xhtml

   [7]   Marques,P., Fernando, R., Chen, E, Mohapatra, P.,
         "Advertisement of the best external route in BGP", draft-ietf-
         idr-best-external-02.txt, April 2004.

   [8]   Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh
         Framework", RFC 5565, June 2009.

   [9]   Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
         Networks (VPNs)", RFC 4364, February 2006.

   [10]  De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F.,
         Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider
         Edge Routers (6PE)", RFC 4798, February 2007

   [11]  Bates, T., Chen, E., and Chandra, R., "BGP Route Reflection:
         An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456,
         April 2006

   [12]  Traina, P., McPherson, P., and Scudder, J., "Autonomous System
         Confederations for BGP", RFC 5065, August 2007

8. Acknowledgments

   Special thanks to Keyur Patel, Robert Raszuk, and Eric Rosen for the
   valuable comments

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Ahmed Bashandy
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bashandy@cisco.com

   Burjiz Pithawala
   Cisco Systems
   170 West Tasman Dr, San Jose, CA 95134
   Email: bpithaw@cisco.com

   Jakob Heitz
   Ericsson
   100 Headquarters Drive, San Jose, CA, 95134
   Email: jakob.heitz@ericsson.com

Bashandy                Expires April 1, 2012                 [Page 13]