Routing Area Working Group                                  S. Litkowski
Internet-Draft                                               B. Decraene
Intended status: Standards Track                                  Orange
Expires: April 18, 2013                                      C. Filsfils
                                                                 K. Raza
                                                           Cisco Systems
                                                        October 15, 2012


             Operational management of Loop Free Alternates
               draft-litkowski-rtgwg-lfa-manageability-00

Abstract

   Loop Free Alternates (LFA), as defined in RFC 5286 is an IP Fast
   ReRoute (IP FRR) mechanism enabling traffic protection for IP
   traffic.  Following first deployment experience, this document
   provides operational feedback on LFA, highlights some limitations and
   proposes a set of refinements to address those limitations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 18, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must


Litkowski, et al.        Expires April 18, 2013                 [Page 1]

Internet-Draft              LFA manageability               October 2012


   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Operational issues with default LFA tie breakers . . . . . . .  3
     2.1.  Case 1: Edge router protecting core failures . . . . . . .  3
     2.2.  Case 2: Edge router choosen to protect core failures
           while core LFA exists  . . . . . . . . . . . . . . . . . .  5
     2.3.  Case 3: suboptimal core alternate choice . . . . . . . . .  6
   3.  Configuration aspects  . . . . . . . . . . . . . . . . . . . .  6
     3.1.  LFA activation . . . . . . . . . . . . . . . . . . . . . .  7
     3.2.  Policy based LFA selection . . . . . . . . . . . . . . . .  7
       3.2.1.  Mandatory criteria . . . . . . . . . . . . . . . . . .  8
       3.2.2.  Enhanced criteria  . . . . . . . . . . . . . . . . . .  8
   4.  Operational aspects  . . . . . . . . . . . . . . . . . . . . . 12
     4.1.  Controlling LFA computation  . . . . . . . . . . . . . . . 12
     4.2.  Manual triggering of FRR . . . . . . . . . . . . . . . . . 13
     4.3.  Required local information . . . . . . . . . . . . . . . . 13
     4.4.  Coverage followup  . . . . . . . . . . . . . . . . . . . . 13
   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   6.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 14
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 14
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 14
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15


Litkowski, et al.        Expires April 18, 2013                 [Page 2]

Internet-Draft              LFA manageability               October 2012


1.  Introduction

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This document is being discussed on the rtgwg@ietf.org mailing list.


2.  Operational issues with default LFA tie breakers

   [RFC5286] introduces the notion of tie breakers when selecting the
   LFA among multiple candidate alternate next-hops.  Most
   implementations are using the following algorithm :

   o  Prefer node protection alternate over link protection alternate.

   o  If protection type is equal, choose alternate providing shortest
      path.

   First deployments have revealed that the above algorithm may be
   insufficent to reflect Service Provider preferences and could lead to
   negative side effects.

   The following sections details use cases highlighting the
   limitations.  Per-prefix LFA is assumed.

2.1.  Case 1: Edge router protecting core failures

           R1 --------- R2 ---------- R3 --------- R4
           |      1           100           1       |
           |                                        |
           | 100                                    | 100
           |                                        |
           |      1           100           1       |
           R5 --------- R6 ---------- R7 --------- R8 -- R9 - PE1
           |             |            |             |
           | 5k          | 5k         | 5k          | 5k
           |             |            |             |
           +--- n*PEx ---+            +---- PE2 ----+
                                             |
                                             |
                                            PEy

                                                   Figure 1

   Rx routers are core routers using n*10G links.  PEs are connected
   using links with lower bandwidth.


Litkowski, et al.        Expires April 18, 2013                 [Page 3]

Internet-Draft              LFA manageability               October 2012


   In figure 1, let's consider the traffic from PE1 to PEx.  Nominal
   path is R9-R8-R7-R6-PEx.  Let's consider the failure of link R7-R8.
   For R8, R4 is not an LFA and the only available LFA is PE2.

   When the core link R8-R7 fails, R8 switches all traffic destined to
   all the PEx towards the edge node PE2.  Hence a edge node and edge
   links are used to protect the failure of a core link.  Typically,
   edge links have less capacity than core links hence congestion will
   occur on PE2 links.  Note that altough PE2 was not directly affected
   by the failure, its links become congested and its traffic will
   suffer from the congestion.

   In summary, in case of failure, the impact on customer traffic is:

   o  From PE2 point of view :

      *  without LFA: no impact

      *  with LFA: traffic is partially dropped (but possibly
         prioritized by QoS mechanism).

   o  From R8 point of view:

      *  without LFA: traffic is totally dropped until convergence

      *  with LFA: traffic is partially dropped (but possibly
         prioritized by QoS mechanism).


Litkowski, et al.        Expires April 18, 2013                 [Page 4]

Internet-Draft              LFA manageability               October 2012


2.2.  Case 2: Edge router choosen to protect core failures while core
      LFA exists

           R1 --------- R2 ------------ R3 --------- R4
           |      1           100       |     1     |
           |                            |           |
           | 100                        | 30        | 30
           |                            |           |
           |     1        50        50  |    10     |
           R5 -------- R6 ---- R10 ---- R7 -------- R8 --- R9 - PE1
           |            |         \                 |
           | 5000       | 5000     \ 5000           | 5000
           |            |           \               |
           +--- n*PEx --+            +----- PE2 ----+
                                             |
                                             |
                                            PEy

                             Figure 2

   Rx routers are core routers meshed with n*10G links.  PEs are meshed
   using links with lower bandwidth.

   In figure 2, let's consider the traffic coming from PE1 to PEx.
   Nominal path is R9-R8-R7-R6-PEx.  Let's consider the failure of the
   link R7-R8.  For R8, R4 is a link-protecting LFA and PE2 is a node-
   protecting LFA.  PE2 is chosen as best LFA due to its better
   protection type.  Just like in case 1, this will probably lead to
   congestion on PE2 links.


Litkowski, et al.        Expires April 18, 2013                 [Page 5]

Internet-Draft              LFA manageability               October 2012


2.3.  Case 3: suboptimal core alternate choice

               +--- PE3 --+
              /            \
        1000 /              \ 1000
            /                \
    +----- R1 ---------------- R2 ----+
    |      |       500         |      |
    | 10   |                   |      | 10
    |      |                   |      |
    R5     | 10                | 10   R7
    |      |                   |      |
    | 10   |                   |      | 10
    |      |       500         |      |
    +---- R3 ---------------- R4 -----+
           |                   |
           | 1000              | 1000
           |                   |
           PE1 -------------- PE2
                   10k

               Figure 3

   Rx routers are core routers.  R1-R2 and R3-R4 links are 1G links.
   All others inter Rx links are 10G links.

   In the figure above, let's consider the failure of link R1-R3.  For
   destination PE3, R3 has two possible alternates:

   o  R4 is node-protecting

   o  R5 is link-protecting

   R4 is chosen as best LFA due to better protection type.  However, it
   may not be desirable to use R4 as prefered alternate due to bandwidth
   capacity reason.  Service provider may prefer to use high bandwidth
   link as prefered LFA.  In this example, prefering shortest path over
   protection type may achieve the expected behavior but in cases where
   metric are not reflecting bandwidth, it would not work and some other
   criteria would need to be involved when selecting the best LFA.


3.  Configuration aspects

   Controlling best alternate and LFA activation granularity is a
   requirement for Service Providers.  This section defines
   configuration requirements for LFA.


Litkowski, et al.        Expires April 18, 2013                 [Page 6]

Internet-Draft              LFA manageability               October 2012


3.1.  LFA activation

   Granularity of LFA activation is important to control scaling of
   boxes (programmed alternate nexthop consuming memory in forwarding
   plane) and to control what is protected and not protected.

   An implementation of LFA SHOULD allow activation:

   o  Per address-family : ipv4 unicast, ipv6 unicast, LDP IPv4 unicast,
      LDP IPv6 unicast ...

   o  Per routing context : VRF, virtual/logical router, global routing
      table, ...

   o  Per interface to control protected interfaces

   o  Per protocol instance, topology, area

   o  Per prefixes: prefix protection SHOULD have a better priority
      compared to interface protection.  This means that if a specific
      prefix must be protected due to configuration request, LFA must be
      computed and installed for this prefix even if the primary
      outgoing interface is not configured for protection.

3.2.  Policy based LFA selection

   When multiple alternates exists, LFA selection algorithm is based on
   tie breakers .  Current tie breakers do not provide sufficient
   control on how best alternate is chosen.  This document proposes an
   enhanced tie breaker allowing service providers to manage all
   specific cases:

   1.  An implementation of LFA SHOULD support policy based decision for
       determining best LFA.

   2.  Policy based decision SHOULD be based on multiple criterions,
       where each criteria having a level of preference.

   3.  If defined policy does not permit to determine a unique best LFA,
       the implementation MUST pick only one based on its own decision.

   4.  Policy SHOULD be applied to a protected interface or to a
       specific set of destinations.  In case of application on the
       protected interface, all destinations primarily routed on this
       interface SHOULD use the interface policy.

   5.  An implementation MAY support a behavior providing a non
       disruptive change compared to behavior described in [RFC5286].


Litkowski, et al.        Expires April 18, 2013                 [Page 7]

Internet-Draft              LFA manageability               October 2012


3.2.1.  Mandatory criteria

   An implementation of LFA MUST support following mandatory criteria:

   o  Non candidate link.  A link marked as "non candidate" it will
      never be used as LFA.

   o  A primary nexthop being protected by another primary nexthop of
      the same prefix (ECMP case).

   o  Type of protection provided by the alternate: link protection,
      node protection, downstream.

   o  Shortest path: lowest IGP metric used to reach the destination.

   o  Local SRLG.

3.2.2.  Enhanced criteria

   An implementation of LFA SHOULD support following enhanced criteria:

   o  Linecard disjointness for protected and protecting nexthop: this
      means that primary and alternate cannot be connected on the same
      linecard.

   o  Link coloring.

   o  Existing TE based informations:

      *  Link affinity

      *  Link speed

      *  Link bandwidth (available/residual)

      *  Link loss

      *  Link delay

   o  Router type: core, edge, core/edge...

   o  Alternate type: link or tunnel alternate.  This means that user
      may change preference between link alternate or tunnel alternate
      (tunnel prefered over link, link prefered over tunnel, or
      considered as equal).


Litkowski, et al.        Expires April 18, 2013                 [Page 8]

Internet-Draft              LFA manageability               October 2012


3.2.2.1.  Linecard disjointness

   Linecard disjointness criteria provides another level of SRLG on the
   node (automatic SRLG).  The SRLG beeing the node Line Card.

   Notion of linecard may be different depending on the hardware design.
   If applicable, multiple level of linecard disjointness may be
   proposed.

3.2.2.2.  Link coloring

   Link coloring is a powerful system to control alternates.  The idea
   is very similar to TE Link affinity but with a local significance
   only.  Protecting interfaces are tagged with colors.  Protected
   interface are configured to include some colors with a preference
   level and exclude others

   Example : P1 router is connected to three P routers and two PEs.

                  PE2
                  |   +---- P4
                  |  /
         PE1 ---- P1 --------- P2
                  |      10Gb
              1Gb |
                  |
                  P3

   P1 is configured to protect the P1-P4 link.  We assume that given the
   topology, all neighbors are LFA.  We would like to enforce a policy
   in the network where only a core router may protect against the
   failure of a core link, and where high bandwidth link are prefered.

   In this example, we can use our link coloring system by:

   o  Marking PEs links with color RED

   o  Marking 10Gb CORE link with color BLUE

   o  Marking 1Gb CORE link with color YELLOW

   o  Configured the protected interface P1->P4 with :

      *  Include BLUE, preference 200

      *  Include YELLOW, preference 100


Litkowski, et al.        Expires April 18, 2013                 [Page 9]

Internet-Draft              LFA manageability               October 2012


      *  Exclude RED

   Using this, PE links will never be used to protect against P1-P4 link
   failure and 10Gb link will be be prefered.

   The main advantage of this solution is that it could be reproduced
   easily on other interface and other nodes without specifities.  A
   Service provider has only to define color system (associate color
   with a significance) as it is done for TE affinity or BGP
   communities.

   Implementation of link coloring:

   o  SHOULD support multiple include and exclude colors an a single
      protected interface.

   o  SHOULD provide a level of preference between included colors.

   o  SHOULD support multiple colors configuration on a single
      protecting interface.

3.2.2.3.  TE based information

   It would be useful to be able to reuse already existing information
   provided by traffic engineering extensions ([RFC3630]/[RFC5305] and
   [I-D.previdi-isis-te-metric-extensions]) as tie-breakers for LFA.
   This would allow automatic and optimized decision when choosing best
   LFA while limiting the configuration overhead.  Existing IGP TE-
   extensions are "dedicated" to Traffic Engineering database and any
   change for LFA choice introduced in TE-LSDB may impact already
   existing tunnels.  It would be interesting to make traffic-
   engineering extensions available to other components than MPLS-TE.
   The mechanisms to achieve this is beyond the scope of this document.

   But basically, LFA as a purely local mechanim, only the local
   information is directly interesting for the alternate choice and
   there is no need to propagate it.

3.2.2.4.  Router type

   Rather than tagging interface on each node (using link color) to
   identify neighbor node type, it would be helpful if routers were
   announcing their role/function in the IGP.  Currently no IGP
   extension provides this information.  The mechanics for flooding this
   information is beyond the scope of this document.

   Consider following network:


Litkowski, et al.        Expires April 18, 2013                [Page 10]

Internet-Draft              LFA manageability               October 2012


                  PE3
                  |
                  |
                  PE2
                  |   +---- P4
                  |  /
         PE1 ---- P1 -------- P2
                  |      10Gb
              1Gb |
                  |
                  P3

   In the example above, each node is configured with its role, and the
   role is flooded through the IGP.

   o  PE1,PE3: edge.

   o  PE2: aggregation (edge/core).

   o  P1,P2,P3: core.

   A simple policy could be configured on P1 to choose best alternate
   for P1->P4 based on router function/role as follows :

   o  criteria 1 -> router type: exclude aggregation and edge.

   o  criteria 2 -> bandwidth.

3.2.2.5.  Link vs tunnel alternate

   In addition to LFA, tunnels (IP, LDP or RSVP-TE) to distant routers
   may be used to complement LFA coverage (tunnel tail used as virtual
   neighbor).  When a router has multiple alternate candidates for a
   specific destination, it may have direct alternates as well as tunnel
   alternates.  Direct alternates may not always provide an optimal
   routing path and it may be preferable to select a tunnel alternate
   over a direct alternate.

   In figure 1, there is no core alternate for R8 to reach PEs located
   behind R6, so R8 is using PE2 as alternate, which may generate
   congestion when FRR is activated.  Instead, we could have a tunnel
   core alternate for R8 to protect PEs destinations.  For example, a
   tunnel from R8 to R3 may enable to prefer R3 over PE2 as best
   alternate if policy permits.  For example :

   o  tunnel alternates must be prefered over link alternate.


Litkowski, et al.        Expires April 18, 2013                [Page 11]

Internet-Draft              LFA manageability               October 2012


   o  Consider tunnel and link alternate as same level and use another
      criteria to prefer R3 over PE2.


4.  Operational aspects

4.1.  Controlling LFA computation

   LFA computation may be CPU intensive depending on the scope of
   application.  On a network suffer from instability, it may be useful
   to control LFA SPF computation as it is done in most implementations
   for main SPF.

   An implementation MAY allow throttling of LFA computation and LFA
   computation SHOULD have his own throttling values.  The procedures
   for throttling is beyond the scope of this document:

   Consider a stable converged network and main SPF and LFA SPF
   configured with throttling.

   o  T0 : LSP is received advertising a topology change.

   o  T0 : Main SPF is scheduled at t0+x (x depending on throttling
      value of main SPF).

   o  T0+x : main SPF is computed.

   o  T1 : main SPF finishes and LFA computation is scheduled at T1+y (y
      depending on throttling value of LFA computation).

   o  T1+y : LFA SPF computation starts.

   o  T2 : LFA SPF finishes and alternates are computed and installed.

   If a new LSP is received in a short period of time after T2, values
   of x and y may increase based on implementation algorithm of
   throttling.

   It may happen that a network topology change is received while LFA
   SPF is computing or during LFA SPF throttling time.  Priority SHOULD
   be given to main SPF computation compared to LFA computation.  An
   implementation SHOULD abort any running or scheduled LFA computation
   if a topology change is received.

   Moreover when a topology change is received, current programmed
   alternates may not be loopfree anymore, so an implementation MAY drop
   programmed alternates when a topology change is received, and
   recompute and reinstall the alternates again later after completion


Litkowski, et al.        Expires April 18, 2013                [Page 12]

Internet-Draft              LFA manageability               October 2012


   of LFA SPF.

4.2.  Manual triggering of FRR

   Service providers often use using manual link shutdown (using router
   CLI) to perform some network changes.  An implementation MUST support
   triggering/activating LFA Fast Reroute for a given link when a manual
   shutdown is done.

4.3.  Required local information

   LFA introduction requires some enhancement in standard routing
   information provided by implementations.  Moreover, due to the non
   100% coverage, coverage informations also is required.

   Hence an implementation :

   o  MUST be able to display for every destination, the primary nexthop
      as well as the alternate nexthop information

   o  MUST provide coverage information per activation domain of LFA
      (area, level, topology, instance, virtual router ...)

   o  MUST provide total percentage of coverage

   o  SHOULD provide percentage of coverage per link

   o  MAY provide percentage of coverage per priority if implementation
      supports prefix-priority insertion in RIB/FIB

   o  SHOULD provide a reason for chosing an alternate (policy and
      criteria)

   o  MAY provide a mean to clear programmed backup and trigger backup
      recomputation

   o  MAY provide the list of non protected destinations and the reason
      why they are not protected (no protection required or no alternate
      available)

4.4.  Coverage followup

   It is pretty easy to evaluate coverage of a network in a nominal
   situation, but topology changes may change the coverage.  In some
   situations, network may not be able to provide the required level of
   protection.  Hence, it becomes very important for service providers
   to get alerted about changes of coverage.


Litkowski, et al.        Expires April 18, 2013                [Page 13]

Internet-Draft              LFA manageability               October 2012


   An implementation SHOULD :

   o  provide an alert system if total coverage is below a defined
      threshold or comes back to a normal situation.

   o  provide an alert system if coverage of a specific link is below a
      defined threshold or comes back to a normal situation

   An implementation MAY :

   o  provide an alert system if a specific destination is not protected
      anymore or when protection comes back up for this destination

   Although the procedures for providing alerts are beyond the scope of
   this document, we recommend that implementations should consider
   standard and well used mechanisms like syslog or SNMP traps.


5.  Security Considerations

   This document does not introduce any change in security consideration
   compared to [RFC5286].


6.  Acknowledgements


7.  IANA Considerations

   This document has no action for IANA.


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

8.2.  Informative References

   [I-D.previdi-isis-te-metric-extensions]
              Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas,
              A., and C. Filsfils, "IS-IS Traffic Engineering (TE)
              Metric Extensions",


Litkowski, et al.        Expires April 18, 2013                [Page 14]

Internet-Draft              LFA manageability               October 2012


              draft-previdi-isis-te-metric-extensions-02 (work in
              progress), October 2012.

   [I-D.shand-remote-lfa]
              Bryant, S., Filsfils, C., Shand, M., and N. So, "Remote
              LFA FRR", draft-shand-remote-lfa-01 (work in progress),
              June 2012.

   [RFC3630]  Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
              (TE) Extensions to OSPF Version 2", RFC 3630,
              September 2003.

   [RFC3906]  Shen, N. and H. Smit, "Calculating Interior Gateway
              Protocol (IGP) Routes Over Traffic Engineering Tunnels",
              RFC 3906, October 2004.

   [RFC4090]  Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
              Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              May 2005.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

   [RFC5715]  Shand, M. and S. Bryant, "A Framework for Loop-Free
              Convergence", RFC 5715, January 2010.

   [RFC6571]  Filsfils, C., Francois, P., Shand, M., Decraene, B.,
              Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free
              Alternate (LFA) Applicability in Service Provider (SP)
              Networks", RFC 6571, June 2012.


Authors' Addresses

   Stephane Litkowski
   Orange

   Email: stephane.litkowski@orange.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


Litkowski, et al.        Expires April 18, 2013                [Page 15]

Internet-Draft              LFA manageability               October 2012


   Clarence Filsfils
   Cisco Systems

   Email: cfilsfil@cisco.com


   Kamran Raza
   Cisco Systems

   Email: skraza@cisco.com


Litkowski, et al.        Expires April 18, 2013                [Page 16]