Network Working Group                                    Seisho Yasukawa
Category: Informational                                              NTT
Expires: December 2006
                                                           Adrian Farrel
                                                      Old Dog Consulting

                                                               June 2006

       An analysis of scaling issues in MPLS-TE backbone networks

             draft-yasukawa-mpls-scaling-analysis-01.txt

Status of this Memo


   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be
   accessed at http://www.ietf.org/shadow.html.

Abstract

   Traffic engineered Multiprotocol Label Switching (MPLS-TE) is being
   deployed in providers' backbone networks. The providers wish to grow
   these networks, and need to discover whether existing protocols and
   implementations can support the network sizes that they are planning.

   This document presents an analysis of some of the scaling concerns
   for MPLS-TE backbone networks, and examines the value of two
   techniques for improving scaling. The intention is to motivate the
   development of appropriate deployment techniques and protocol
   extensions to enable the applicaiton of MPLS-TE in large networks.

   This document considers only scalability for point-to-point MPLS-TE.
   Point-to-multipoint MPLS-TE is for future study.


Yasukawa and Farrel                                             [Page 1]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

Contents

   1. Introduction ................................................... 2
   2. Network configurations ......................................... 4
   2.1 Commercial drivers for selected configurations ................ 6
   2.2 Other network topologies ...................................... 6
   3. Required network sizes ......................................... 7
   4. Issues of concern for scaling .................................. 7
   4.1 LSP state ..................................................... 7
   4.2 Processing overhead ........................................... 8
   4.3 RSVP-TE implications .......................................... 8
   4.4 Management .................................................... 9
   4.5 Practical Numbers .... ........................................ 9
   5. Scaling in flat networks ...................................... 10
   6. Scaling improvements through Forwarding Adjacencies ........... 11
   6.1 Two layer hierarchy .......................................... 11
   6.1.1 Tuning the network topology to suit the two layer hierarchy. 12
   6.2 Alternative two layer hierarchy .............................. 13
   6.3 Three layer hierarchy ........................................ 13
   6.4 Issues with hierarchical LSPs ................................ 14
   7. Scaling improvements through multipoint-to-point LSPs ......... 15
   7.1 Overview of MP2P LSPs ........................................ 15
   7.2 Scaling improvements ......................................... 16
   7.2.1 Comparison with other scenarios ............................ 17
   7.3 Issues with MP2P LSPs ........................................ 19
   8. Combined models ............................................... 20
   9. Management Considerations ..................................... 20
   10. Security Considerations ...................................... 20
   11. Recommendations .............................................. 21
   12. IANA Considerations .......................................... 21
   13. Acknowledgements ............................................. 21
   14. Intellectual Property Consideration .......................... 21
   15. Normative References ......................................... 22
   16. Informational References ..................................... 22
   17. Authors' Addresses ........................................... 22
   18. Disclaimer of Validity ....................................... 23
   19. Full Copyright Statement ..................................... 23


1. Introduction

   As traffic engineered Multiprotocol Label Switching (MPLS-TE) grows
   in popularity, providers are looking at how they could deploy it in
   networks that are larger than the more simple and experimental
   networks that have been seen so far. This is leading them to examine
   the number of LSPs that may need to be supported at various points
   within the network, and to look at the consequent impact on control
   plane and management plane resources as well as on the constraints
   placed by the physical limitations of the routers in the network.


Yasukawa and Farrel                                             [Page 2]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   Physical topology scaling concerns are addressed by building networks
   that are not fully meshed. Network topologies tend to be meshed in
   the core, but tree-shaped at the edges giving rise to a snow-flake
   design.

   MPLS-TE, however, establishes a logical full mesh between all edge
   points in the network, and this is where the scaling problems arise
   since the tree structure of the network tends to focus a large number
   of LSPs within the core of the network.

   This document presents a generic network topology and introduces
   terminology for the different scaling parameters. It then examines
   how many LSPs might be carried within the core of a network.

   Two techniques (hierarchical and multipoint-to-point LSPs) are
   introduced and an examination is made of the scaling benefits that
   they offer as well as of some of the concerns with using these
   techniques.

   Of necessity, this document makes very many generalizations. Not
   least among these are a set of assumptions about the symmetry and
   connectedness of the physical network. It is hoped that these
   generalizations will not impinge on the usefulness of the overview of
   the scaling properties that this document attmepts to give. In deed,
   the symmetry of the example topologies tends to high-light the
   scaling issues of the different solution models, and this may be
   useful in exposing the worst case scenarios.

   Although protection mechanisms like Fast Re-route (FRR) [RFC4090] are
   briefly discussed, the main body of this document considers stable
   network cases. It should be noted that make-before-break
   re-optimisation after link failure may result in a significant number
   of 'duplicate' LSPs. This issue is not addressed in this document.

   The intention of this document is to help service providers discover
   whether existing protocols and implementations can support the
   network sizes that they are planning. To do this, it presents an
   analysis of some of the scaling concerns for MPLS-TE backbone
   networks, and examines the value of two techniques for improving
   scaling. This should motivate the development of appropriate
   deployment techniques and protocol extensions to enable the
   applicaiton of MPLS-TE in large networks.

   This document considers only scalability for point-to-point MPLS-TE.
   Point-to-multipoint MPLS-TE is for future study.


Yasukawa and Farrel                                             [Page 3]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

2. Network configurations

   The network configurations considered in this document are based on a
   hierarchy of connectivity within the core network. PE nodes have
   connectivity to P nodes as shown in figure 1. There may be
   interconnection between the PEs that are connected to a single P
   node, but there is no direct connectivity between PEs that are
   connected to different P-nodes. Dual homing of PEs to multiple P
   nodes is not considered in this document although it may be a
   valuable addition to a network configuration.

         P
        /|\
       / | \
      /  |  \
     /   |   \
   PE    PE   PE

   Figure 1 : PE to P node connectivity

   The relationship between P-nodes is also structured in a hierarchical
   way. Thus, as shown in figure 2, multiple P-nodes at one level are
   connected to a P-node at a higher level. We number the levels such
   that level 1 is the top level and level (n) is immediately above
   level (n+1), and we denote a P-node at level n as a P(n). There may
   be interconnection between P(n+1) nodes connected to a single P(n),
   but there is no direct connectivity between P(n+1) nodes connected to
   different P(n) nodes. Again, dual homing of P(n+1) nodes to multiple
   P(n) nodes is not considered in this document although it may be a
   valuable addition to a network configuration.

           P(n)
           /|\
          / | \
         /  |  \
        /   |   \
   P(n+1) P(n+1) P(n+1)

   Figure 2 : Relationship between P-nodes


Yasukawa and Farrel                                             [Page 4]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   At the top level, P(1) nodes are connected in a full mesh. In
   reality, the level 1 part of the network may be slightly less well
   connected than this, but assuming a full mesh provides for
   generality.

   The key multipliers for scalability are the number of P(1) nodes, and
   the multiplier relationship between P(n) and P(n+1) at each level
   including down to PEs.

   We define the multiplier M(n) as the number of P(n+1) nodes at level
   (n+1) attached to any one P(n).

   We define S(n) as the number of nodes at level (n).

   Thus:  S(n) = S(1)*M(1)*M(2)*...*M(n-1)

   So the number of PEs can be expressed as:

      S(PE) = S(1)*M(1)*M(2)*...*M(n)

   where the network has (n) layers of P-nodes.

   Thus we may depict an example network as shown in figure 3. In this
   case:
     S(1) = 3
     M(1) = 3
     S(2) = S(1)*M(1) = 9
     M(2) = 2
     S(PE) = S(1)*M(1)*M(2) = 18

     PE      PE  PE     PE  PE      PE
        \      \/         \/       /
     PE--P(2)  P(2)      P(2)  P(2)--PE
             \ |            | /
              \|            |/
    PE--P(2)---P(1)------P(1)---P(2)--PE
       /           \    /           \
     PE             \  /             PE
                     \/
                     P(1)
                     /|\
                    / | \
                   /  |  \
           PE--P(2)  P(2) P(2)--PE
               /      /\      \
             PE     PE  PE     PE

   Figure 3 : An example network


Yasukawa and Farrel                                             [Page 5]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

2.1 Commercial drivers for selected configurations

   It is reasonable to ask why this particular network connectivity
   structure has been chosen.

   The consideration must be physical scalability. Each LSR is only able
   to support a limited number of physical interfaces. This necessarily
   reduces the ability to fully mesh a network and leads to the
   tree-like structure of the network towards the PEs.

   A realistic commercial consideration for an operator is the fact that
   the only revenue-generating nodes in the network are the PEs. Other
   nodes are needed only to support connectivity and scalability.
   Therefore, there is a desire to maximize S(PE) while minimizing the
   sum of S(n) for all values of (n). This could be achieved by
   minimizing the number of levels, and by maximizing the connectivity
   at each layer, M(n). Ultimately, however, this would produce a
   network of just interconnected PEs, which is clearly in conflict with
   the physical scaling situation.

   Therefore, the solution calls for a "few" levels with "relatively
   large" connectivity at each level. We might say that the
   cost-effectiveness of the network can be stated as:

   K = S(PE)/(S(1)+S(2) + ... + S(n)) where n is the level above the PEs

   We should observe, however, that this equation may be naive in that
   the cost of a network is not actually a function of the number of
   routers (since a router chasis is often free or low cost), but is
   really a function of the cost of the line cards which is, itself, a
   product of the capacity of the line cards. Thus, the relatively high
   connectivity decreases the cost-effectiveness, while a topology that
   tends to channel data through a network core tends to demand higher
   capacity (so more expensive) line cards.

   This document examines the implications for control plane and data
   plane scalability of this type of network when MPLS-TE LSPs are used
   to provide full connectivity between all PEs.

2.2 Other network topologies

   As explained in Section 1, this document is using a symmetrical
   network topology for simplicity of modelling. In practice there are
   three other topological considerations.

   a. Ladder networks
      Not all networks have an even (circular) basis. Many are long and
      thin, giving rise to a topology know as a 'ladder network'. That
      is, the core of the network consists of 'parallel' rails
      connecting P nodes, with 'rungs' connecting the P nodes on the
      rails.

   b. Multihoming
      It is relatively common for a node at level (n) to be attached to
      mode than one node at level (n-1). This is particularly common at
      PEs that may be connected to more than one P(n).

   c. Meshing within a level
      A level in the network will often include links between P nodes at
      the same level (for example, links between P(n) nodes.

Yasukawa and Farrel                                             [Page 6]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   Each of these features is likely to have some impact on the scaling
   of the network. However, for the purposes of establishing the ground
   rules for scaling, this document restricts itself to the
   consideration of the symmetrical network described in Section 2.1.
   Discussion of other network formats may be intriduced in a future
   revision of this document.

3. Required network sizes

   An important question for this evaluation and analysis is the size of
   the network that operators require. How many PEs are required? What
   ratio of P to PE is acceptable. How many ports do devices have for
   physical connectivity? What type of MPLS-TE connectivity between PEs
   is required?

   Although presentation of figures for desired network sizes is
   ultimately pointless because history shows that networks grow beyond
   all projections, it is useful to set some acceptable lower bounds.
   That is, we can state that we already know that networks will be at
   least of a certain size.

   The most important features are:

   - The network should have at least 1000 PEs.
   - Each pair of PEs should be connected by at least one LSP in each
     direction.

4. Issues of concern for scaling

   This section presents some of the issues associated with the support
   of LSPs at an LSR or within the network that may mean that there is a
   a limit to the number LSPs that can be supported.

4.1 LSP state

   LSP state is the data (information) that must be stored at an LSR in
   order to maintain an LSP. While the size of the LSP state is
   implementation-dependent, it is clear that any implementation will
   require some data in order to maintain LSP state.

   Thus LSP state becomes a scaling concern because as the number of
   LSPs at an LSR increases, so the amount of memory required to
   maintain the LSPs increases in direct proportion. Since the memory
   capacity of an LSR is limited, there is a related limit placed on the
   number LSPs that can be supported.

   Note that techniques to reduce the memory requirements (such as data
   compression) may serve to increase the number of LSPs that can be
   supported, but this will only achieve a moderate multiplier and may
   significantly decrease the ability to process the state rapidly.

Yasukawa and Farrel                                             [Page 7]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

4.2 Processing overhead

   Depending largely on implementation issues, the number of LSPs
   supported by an LSR may impact the processing speed for each LSP. For
   example, control block search times can increase with the number of
   control blocks, and even excellent implementations cannot completely
   mitigate this fact. Thus, since CPU power is constrained in any LSR,
   there may be a practical limit to the number of LSPs that can be
   supported.

   Further processing overhead considerations depend on issues specific
   to the control plane protocols, and are discussed in the next
   section.

4.3 RSVP-TE implications

   Like many connection-oriented signaling protocols, RSVP-TE requires
   that state is held within the network in order to maintain LSPs. The
   impact of this is described in section 4.1. Note that RSVP-TE
   requires that separate information is maintained for upstream and
   downstream relationships, but does not require any specific
   implementation of that state.

   RSVP-TE is a soft-state protocol which means that protocol messages
   (refresh messages) must be regularly exchanged between signaling
   neighbors in order to maintain the state for each LSP that runs
   between the neighbors. A common period for the transmission (and
   receipt) of refresh messages is 30 seconds meaning that each LSR must
   send and receive one message in each direction (upstream and
   downstream) for every LSP it supports every 30 seconds. This has the
   potential to be a significant constraint on the scaling of the
   network, but various improvements [RFC2961] mean that this refresh
   processing can be significantly reduced allowing an implementation to
   be optimized to nearly remove all concerns about soft state scaling
   in a stable network.

   Observations of existing implementations indicate that there may be a
   threshold of around 50,000 LSPs above which an LSR struggles to
   achieve sufficient processing to maintain LSP state. Although refresh
   reduction [RFC2961] may substantially improve this state of affairs,
   it has also been observed that under these circumstances the size of
   the Srefresh may become very large and the processing required may
   still cause significant disruption to an LSR.

   An alternative approach is to increase the refresh time. There is a
   linear relationship between the percentage increase in refresh time
   and the improvement in performance for the LSR. However, it should be
   noted that RSVP-TE's soft-state nature depends on regular refresh
   messages, thus a degree of functionality is lost by increasing the
   refresh time. This loss may be partially mitigated by the use of the

Yasukawa and Farrel                                             [Page 8]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   RSVP-TE Hello message, and can also be reduced by the use of various
   GMPLS extensions [RFC3473] such as the use of [RFC2961] message
   acknowledgements on all messages.

   RSVP-TE also requires that signaling adjacencies are maintained
   through the use of Hello message exchanges. Although [RFC3209]
   suggests that Hello messages should be retransmitted every 5ms, in
   practice values of around 3 seconds are more common. Nevertheless,
   the support of Hello messages can represent a scaling limitation on
   an RSVP-TE implementation since one message must be sent and received
   to/from each signaling adjacency every time period. This can impose
   limits on the number of neighbors (physical or logical) that an LSR
   supports.

4.4 Management

   Another practical concern for the scalability of large MPLS-TE
   networks is the ability to manage the network. This may be
   constrained by the available tools, the practicality of managing
   large numbers of LSPs, and the management protocols in use.

   Management tools are software implementations. Although such
   implementations should not constrain the control plane protocols, it
   is realistic to appreciate that network deployments will be limited
   by the scalability of the available tools. In practice, most existing
   tools have a limit to the number of LSPs that they can support. While
   an NMS may be able to support a large number of LSPs, the number that
   can be supported by an EMS (or the number supported by an NMS
   per-LSR) is more likely to be limited.

   Similarly, practical constraints may be imposed by the operation of
   management protocols. For example, an LSR may be swamped by
   management protocol requests to read information about the LSPs that
   it supports and this might impact its ability to sustain those LSPs
   in the control plane. OAM, alarms and notifications can further add
   to the burden placed on an LSR and limit the number of LSPs it can
   support.

   All of these consideration encourage a reduction in the number of
   LSPs supported within the network and at any particular LSR.

4.5 Practical Numbers

   In practice, reasonable target numbers are as follows.

   S(PE) >= 1000
   Number of levels is 3. That is: 1, 2 and PE.
   M(2) <= 20
   M(1) <= 20
   S(1) <= 100

Yasukawa and Farrel                                             [Page 9]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

5. Scaling in flat networks

   Before proceeding to examine potential scaling improvements, we need
   to examine how well the flat network described in figure 3 scales.

   Consider the requirement for a full mesh of LSPs linking all PEs.
   That is, each PE has an LSP to and from each other LSP. Thus, if
   there are S(PE) PEs in the network, there are S(PE)*(S(PE) - 1) LSPs.

   Define L(n) as the number of LSPs handled by a level (n) LSR.

   L(PE) = 2*(S(PE) - 1)

   L(2) can be computed as the sum of all LSPs for all attached PEs
   including the LSPs between the attached PEs (but being careful to
   only count those LSPs once). Thus, since the number of attached PEs
   is M(2), we have:

   L(2) = 2*M(2)*(S(PE) - 1) - M(2)*(M(2) - 1)

   L(1) can be computed as the sum of the number of LSPs for all
   downstream PEs less those that can be served exclusively by the
   attached P(2) nodes, and only counting the LSPs routed through two
   attached P(2) nodes once. So:

   L(1) = 2*M(1)*M(2)*(S(PE) - 1) -
          2*M(1)*M(2) -
          M(2)*(M(1) - 1)
        = 2*M(1)*M(2)*[S(PE) - 2.5] + M(2)

   So, for example, with S(1) = 5, M(1) = 10, and M(2) = 20, we see:
      S(PE) = 1000
      L(PE) = 1998
      L(2)  = 39580
      L(1)  = 399020

   Alternatively, with S(1) = 10, M(1) = 10, and M(2) = 20, we see:
      S(PE) = 2000
      L(PE) = 3998
      L(2)  = 79580
      L(1)  = 799020

   In both examples, the number of LSPs at the core (P(1)) nodes is
   probably unacceptably large even though there are only a relatively
   modest number of PEs. In fact, L(2) may even be too large in the
   second example.


Yasukawa and Farrel                                            [Page 10]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

6. Scaling improvements through Forwarding Adjacencies

   One of the purposes of LSP hierarchies [RFC4206] is to improve the
   scaling properties of MPLS-TE networks. LSP tunnels (sometimes known
   as Forwarding Adjacencies (FAs)) may be established to provide
   connectivity over the core of the network and multiple edge-to-edge
   LSPs may be tunneled down a single FA LSP.

   In our network it is natural to consider a mesh of FA LSPs between
   all core nodes at the same level. We consider two possibilities here.
   In the first all P(2) nodes are connected to all other P(2) nodes,
   and the PE-to-PE LSPs are tunneled across the core of the network. In
   the second, an extra layer of hierarchy is introduced by connecting
   all P(1) nodes in a mesh and tunneling the P(2)-to-P(2) tunnels
   through these.

6.1 Two layer hierarchy

   In this hierarchy model, the P(2) nodes are connected by a mesh of
   tunnels. This means that the P(1) nodes do not see the PE-to-PE LSPs.

   It remains the case that:

   L(PE) = 2*(S(PE) - 1)

   L(2) is slightly increased. It can be computed as the sum of all LSPs
   for all attached PEs including the LSPs between the attached PE plus
   the number of FA LSPs providing a mesh to the other P(2) nodes. Thus,
   since the number of attached PEs is M(2) and the number of P(2) nodes
   is S(2), we have:

   L(2) = 2*M(2)*(S(PE) - 1) - M(2)*(M(2) - 1) + 2*(S(2) - 1)

   L(1), however, is significantly reduced and can be computed as the
   sum of the number of FA LSPs to and from each attached P(2) to each
   other P(2) in the network, including (but counting only once) the FA
   LSPs between attached P(2) nodes. So:

   L(1) = 2*M(1)*(S(2) - 1) - M(1)*(M(1) - 1)

   So, for example, with S(1) = 5, M(1) = 10, and M(2) = 20, we see:
      S(PE) = 1000
      S(2)  = 50
      L(PE) = 1998
      L(2)  = 39678
      L(1)  = 890


Yasukawa and Farrel                                            [Page 11]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   Alternatively, with S(1) = 10, M(1) = 10, and M(2) = 20, we see:
      S(PE) = 2000
      S(2)  = 100
      L(PE) = 3998
      L(2)  = 79778
      L(1)  = 1890

   So, in both examples, the problem of the number of LSPs at the core
   (P(1)) nodes is solved, but any problem with L(2) is made slightly
   worse.

6.1.1 Tuning the network topology to suit the two layer hierarchy

   Clearly we can reduce L(2) by selecting appropriate values of S(1),
   M(1) and M(2). We can do this pretty much with immunity since no
   change will affect L(PE), and since L(1) is now so small.

   Observe that:
      L(2) = 2*M(2)*(S(PE) - 1) - M(2)*(M(2) - 1) + 2*(S(2) - 1)
   where S(PE) = S(1)*M(1)*M(2) and S(2) = S(1)*M(1).
   So L(2) scales with M(2)^2 and we can have the most impact by
   reducing M(2).

   For example, with S(1) = 10, M(1) = 10, and M(2) = 10, we see:
      S(PE) = 1000
      S(2)  = 100
      L(PE) = 1998
      L(2)  = 20088
      L(1)  = 1890

   And similarly, with S(1) = 20, M(1) = 20, and M(2) = 5, we see:
      S(PE) = 2000
      S(2)  = 400
      L(PE) = 3998
      L(2)  = 20768
      L(1)  = 15580

   These considerable scaling benefits must be offset against the cost
   effectiveness of the network. Recall that
      K = S(PE)/(S(1)+S(2) ... + S(n))
   where n is the level above the PEs, so that for our network:
      K = S(PE) / (S(1) + S(2))

   Thus, in the first example the cost-effectiveness has been halved
   from 1000/55 to 1000/110. And in the second example it has been
   reduced to roughly one quarter, changing from 2000/110 to 2000/420.

   So, although the tuning changes may be necessary to reach the desired
   network size, they come at a considerable cost to the operator.


Yasukawa and Farrel                                            [Page 12]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

6.2 Alternative two layer hierarchy

   An alternative to the two layer hierarchy presented in section 6.1 is
   to provide a full mesh of FA LSPs between P(1) nodes. This technique
   is only of benefit to any nodes in the core of the level 1 network.
   Otherwise it makes no difference to the PE and P(2) nodes and
   increases the burden at the P(1) nodes since they have to support all
   of the PE-to-PE LSPs as in the flat model, plus the additional
   2*(S(1) - 1) P(1)-to-P(1) FA LSPs. Thus, this approach should only be
   considered where there is a mesh of P-nodes wihtin the ring of P(1)
   nodes, and it is not considered further in this document.

6.3 Three layer hierarchy

   As demonstrated by section 6.2, introducing a mesh of FA LSPs at the
   top level (P(1)) has no benefit, but if we introduce an additional
   level in the network (P(3) between P(2) and PE) we can introduce a
   new layer of FA LSPs so that we have a full mesh of FA LSPs between
   all P(3) nodes to carry the PE-to-PE LSPs, and a full mesh of FA LSPs
   between all P(2) nodes to carry the P(3)-to-P(3) LSPs.

   The math starts to get a little less pretty!

   The number of PEs S(PE) = S(1)*M(1)*M(2)*M(3) and the number of
   PE-to-PE LSPs at a PE remains as L(PE) = 2*(S(PE) - 1).

   The number of LSPs at a P(3) can be deduced from section 6.1. It is
   the sum of all LSPs for all attached PEs including the LSPs between
   the attached PE plus the number of FA LSPs providing a mesh to the
   other P(3) nodes.

   L(3) = 2*M(3)*(S(PE) - 1) - M(3)*(M(3) - 1) + 2*(S(3) - 1)

   The number of LSPs at P(2) can also be deduced from section 6.1 since
   it is the sum of all LSPs for all attached P(3) nodes including the
   LSPs between the attached PE plus the number of FA LSPs providing a
   mesh to the other P(2) nodes.

   L(2) = 2*M(2)*(S(3) - 1) - M(2)*(M(2) - 1) + 2*(S(2) - 1)

   Finally, L(1) can be copied straight from 6.1.

   L(1) = 2*M(1)*(S(2) - 1) - M(1)*(M(1) - 1)


Yasukawa and Farrel                                            [Page 13]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   For example, with S(1) = 5, M(1) = 5, M(2) = 5, and M(3) = 8 we see:
      S(PE) = 1000
      S(3)  = 125
      S(2)  = 25
      L(PE) = 1998
      L(3)  = 16176
      L(2)  = 1268
      L(1)  = 220

   Similarly, with S(1) = 5, M(1) = 5, M(2) = 8, and M(3) = 10 we see:
      S(PE) = 2000
      S(3)  = 200
      S(2)  = 25
      L(PE) = 3998
      L(3)  = 40038
      L(2)  = 3184
      L(1)  = 220

   Of course, the extra level in the network tends to reduce the cost
   effectiveness of the networks with values of K = 1000/155 and
   K = 2000/230 (from 1000/55 and 2000/110) for the examples above.

6.4 Issues with hierarchical LSPs

   A basic observation for hierarchical scaling techniques is that it is
   hard to have any impact on the number of LSPs that must be supported
   by the level of P(n) nodes adjacent to the PEs (for example, it is
   hard to reduce L(3) in section 6.3). In fact, the only way we can
   change the number of LSPs supported by these nodes is to change the
   scaling ratio M(n) in the network, in other words to change the
   number of PEs subtended to any P(n). But such a change has a direct
   effect on the number of PEs in the network and so the
   cost-effectiveness is impacted.

   Another concern with the hierarchical approach is that it must be
   configured and managed. This may not seem like a large burden, but it
   must be recalled that the P(n) nodes are not at the edge of the
   network - they are a set of nodes that must be identified so that the
   FA LSPs can be configured and provisioned. Effectively, the operator
   must plan and construct a layered network with a ring of P(n) nodes
   giving access to the level (n) network. This design activity is open
   to considerable risk as failing to close the ring (i.e. allowing a
   node to be at both level (n+1) and at level (n)) may cause
   operational confusion.

   Protocol techniques (such as IGP auto-mesh [AUTO-MESH]) have been
   developed to reduce the configuration necessary to build this type of
   multi-level network. In the case of auto-mesh, the routing protocol
   is used to advertise the membership of a 'mesh group', and all
   members of the mesh can discover each other and connect with LSP

Yasukawa and Farrel                                            [Page 14]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   tunnels. Thus the P(n) nodes giving access to level (n) can advertise
   their existence to each other, and it is not necessary to configure
   each with information about all of the others. Although this process
   can help to reduce the configuration overhead, it does not eliminate
   it as each member of the mesh group must still be planned and
   configured for membership.

   An important consideration for the use of hierarchical LSPs is how
   they can be protected using MPLS Fast Re-Route (FRR) [RFC4090]. FRR
   may provide link protection either by protecting the tunnels as they
   traverse a broken link, or by treating each level (n) tunnel LSP as a
   link in level (n+1), and providing protection for the level (n+1)
   LSPs (although in this model fault detection and propagation time may
   be an issue). Node protection may be performed in a similar way, but
   protection of the first and last nodes of a hierarchical LSP is
   particularly difficult. Additionally, the whole notion of scaling
   with regard to FRR gives rise to separate concerns that are outside
   the scope of this document as currently formulated.

   Finally, observe that we have been explaining these techniques using
   conveniently symmetrical networks. Consider how we would arrange the
   hierarchical LSPs in a network where some PEs are connected closer to
   the center of the network than others.

7. Scaling improvements through multipoint-to-point LSPs

   An alternative (or complementary) scaling technique has been proposed
   using multipoint-to-point (MP2P) LSPs. The fundamental improvement in
   this case is achieved by reducing the number of LSPs towards the
   destination as LSPs towards the same destination are merged.

   This section presents an overview of MP2P LSPs and describes their
   applicability and scaling benefits.

7.1 Overview of MP2P LSPs

   Note that the MP2P LSPs discussed here are for MPLS-TE and are not
   the same concept familiar in the Label Distribution Protocol (LDP)
   described in [RFC3036].

   Traffic flows generally converge toward their destination and this
   can be utilized by MPLS in constructing an MP2P LSP. With such an
   LSP, the LFIB mappings at each LSR are many-to-one so that multiple
   pairs {incoming interface, incoming label} are mapped to a single
   pair {outgoing interface, outgoing label}. Obviously, if per-platform
   labels are used, this mapping may be optimized within an
   implementation.

   It is important to note that with MP2P MPLS-TE LSPs the traffic flows
   are merged. That is, some additional form of identifier is required

Yasukawa and Farrel                                            [Page 15]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   if demerging is required. For example, if the payload is IP traffic
   belonging to the same client network, no additional demerging
   information is required since the IP packet contains sufficient data.
   On the other hand, if the data comes, for example, from a variety of
   VPN client networks, then the flows will need to be labeled in their
   own right as point-to-point (P2P) flows, so that traffic can be
   disambiguated at the egress of the MP2P LSPs.

   Techniques for establishing MP2P MPLS-TE LSPs and for assigning the
   correct bandwidth downstream of LSP merge points are out of the scope
   of this document.

7.2 Scaling improvements

   Consider the network topology shown in figure 3. Suppose that we
   establish MP2P LSP tunnels such that there is one tunnel terminating
   at each PE, and that that tunnel has every other PE as an ingress.
   Thus, a PE-to-PE MP2P LSP tunnel would have S(PE)-1 ingresses and one
   egress, and there would be S(PE) such tunnels.

   Note that there still remain 2*(S(PE) - 1) PE-to-PE P2P LSPs that are
   carried through these tunnels.

   Let's consider the number of LSPs handled at each node in the
   network.

   The PEs continue to handle the same number of PE-to-PE P2P LSPs, and
   must also handle the MP2P LSPs. So:

   L(PE) = 2*(S(PE) - 1) + S(PE)

   But all P(n) nodes in the network only handle the MP2P LSP tunnels.
   Nominally, this means that L(n) = S(PE) for all values of n, but life
   is not really that simple. The number of LSPs is no longer the only
   issue (although it may have some impact for some of the scaling
   concerns listed in section 4). We are now interested more directly in
   the amount of LSP state that is maintained by an LSR. We can quantify
   this according to the number of LSP segments managed by an LSR. So,
   in the case of a P2P LSP, an ingress or egress has one segment to
   maintain, while a transit has two segments. Similarly, for an MP2P
   LSP, an LSR must maintain one state for each upstream segment (which,
   we can assume, is in a one-to-one relationship with the number of
   upstream neighbors) and exactly one downstream segment - ingresses
   obviously have no upstream neighbors, and egresses have no downstream
   segments.

   So we can start again on our examination of the scaling properties,
   using X(n) to represent the amount of state held at each P(n).


Yasukawa and Farrel                                            [Page 16]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   At the PEs, there is only connectivity to one other network node, the
   P(2) node. But note that if P2P LSPs need to be used to allow
   disambiguation of data at the MP2P LSP egresses, then these P2P LSPs
   are tunneled within the MP2P LSPs . So X(PE) is:

   X(PE) = 2*(S(PE) - 1) if no disambiguation is required

   and

   X(PE) = 4*(S(PE) - 1) if disambiguation is required

   Each P(2) node has M(2) downstream PEs to each of which a single MP2P
   LSP must be delivered coming from the one upstream P(1), and from
   which one LSP segment converges towards each other PE in the network.
   Note that not all MP2P LSPs initiated at subtended PEs are routed out
   through the parent P(1) since some terminate at local PEs. Thus:

   X(2) = 2*M(2) + M(2)*(S(PE) - 1) + S(PE) - M(2)
        = S(PE)*(M(2) + 1)

   Similarly, at each P(1) node there are M(1) downstream P(2) nodes and
   so a total of M(1)*M(2) downstream PEs. Each P(1) is connected in a
   full mesh with the other P(1) nodes so has (S(1) - 1) neighbors. So
   each P(1) must handle an LSP segment coming from each downstream P(2)
   headed to each non-downstream PE and an LSP segment to another P(1)
   for each non-downstream PE. At the same time, it must handle an LSP
   segment coming from each other P(1) headed to each downstream PE, and
   an LSP segment towards each downstream PE. Thus:

   X(1) = (M(1) + 1)*(S(PE) - M(1)*M(2)) +
          (S(1) - 1)*(S(PE) - M(1)*M(2)) + M(1)(M(2)
        = M(1)*(S(PE) + M(2)*(M(1) - S(1) + 1)) +S(1)*S(PE)

   So, for example, with

   For example, with S(1) = 10, M(1) = 10, and M(2) = 10, we see:
      S(PE) = 1000
      S(2)  = 100
      X(PE) = 3996   (or 1998)
      X(2)  = 11000
      X(1)  = 20100

   And similarly, with S(1) = 20, M(1) = 20, and M(2) = 5, we see:
      S(PE) = 2000
      S(2)  = 400
      X(PE) = 5996   (or 2998)
      X(2)  = 12000
      X(1)  = 80100


Yasukawa and Farrel                                            [Page 17]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

7.2.1 Comparison with other scenarios

   For comparison with the examples in sections 5 and 6, we need to
   convert those LSP-based figures to our new measure of LSP state.

   Observe that each LSP in sections 5 and 6 generates two state units
   at a transit LSR and one at an ingress or egress. So we can provide
   conversions as follows:

   Section 5
     L(PE) = 2*(S(PE) - 1)
     L(2)  = 2*M(2)*(S(PE) - 1) - M(2)*(M(2) - 1)
     L(1)  = 2*M(1)*M(2)*[S(PE) - 2.5] - M(2)
     X(PE) = 2*(S(PE) - 1)
     X(2)  = 4*M(2)*(S(PE) - 1) - 2*M(2)*(M(2) - 1)
     X(1)  = 4*M(1)*M(2)*[S(PE) - 2.5] - 2*M(2)

     So that using the example figures in section 7.1 (S(1) = 10,
     M(1) = 10, and M(2) = 10) we see:
       X(PE) = 1998
       X(2)  = 39780
       X(1)  = 398980

     Clearly this technique is a significant improvement over the flat
     network within the network, although the PEs are more heavily
     stressed.

   Section 6.1
     L(PE) = 2*(S(PE) - 1)
     L(2) = 2*M(2)*(S(PE) - 1) - M(2)*(M(2) - 1) + 2*(S(2) - 1)
     L(1) = 2*M(1)*(S(2) - 1) - M(1)*(M(1) - 1)
     S(2) = S(1)*M(1)
     X(PE) = 2*(S(PE) - 1)
     X(2)  = 4*M(2)*(S(PE) - 1) - 2*M(2)*(M(2) - 1) + 2*(S(2) - 1)
     X(1)  = 4*M(1)*(S(2) - 1) - 2*M(1)*(M(1) - 1)

     So that using the example figures in section 7.1 (S(1) = 10,
     M(1) = 10, and M(2) = 10) we see:
       X(PE) = 1998
       X(2)  = 39881
       X(1)  = 3780

     And we can observe that the MP2P model is better at P(2), but the
     hierarchical model is better at P(1).

   In fact, this can be generalized to observe that the MP2P model
   produces best effects towards the edge of the network, while the
   hierarchical model makes most impression at the core. However, the
   requirement for P2P LSPs tunneled within the MP2P LSPs does cause a
   double burden at the PEs.

Yasukawa and Farrel                                            [Page 18]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

7.3 Issues with MP2P LSPs

   The biggest challenges for MP2P LSPs are the provision of support in
   the control and data planes. To some extent support must also be
   provided in the management plane.

   Control plane support is just a matter of defining the protocols and
   procedures [MP2P-RSVP], although it must be clearly understood that
   this will introduce some complexity to the control plane.

   Hardware issues may be a little more tricky. For example, the
   capacity of the upstream segments must never (allowing for
   statistical over-subscription) exceed the capacity of the downstream
   segment. Similarly, data planes must be equipped with sufficient
   buffers to handle incoming packet collisions.

   The management plane will be impacted in several ways. Firstly the
   management applications will need to handle LSPs with multiple
   senders. This means that, although the applications need to process
   fewer LSPs, they will be more complicated and will, in fact, need to
   process the same number of ingresses and egresses. Other issues like
   diagnostics and OAM would also need to be enhanced to support MP2P,
   but might be borrowed heavily from LDP networks.

   Lastly, note that the MP2P solution requires support of twice as many
   LSPs at the PEs and requires tunneling to be introduced at the PEs.
   Since PEs are not usually as fully specified as P-routers, this may
   cause some concern although the use of previous hop popping on the
   MP2P LSPs might help to reduce this issue.

   In all cases, care must be taken not to confuse the reduction in the
   number of LSPs with a reduction in the LSP state that is required. In
   fact, the discussion in section 7.1 is slightly over-optimistic since
   LSP state towards the destination will probably need to include
   sender information and so will increase depending on the number of
   senders for the MP2P LSP. This issue is clearly dependent on the
   protocol solution for MP2P RSVP-TE which is out of scope for this
   document. The calculations do not take this fact into account, and
   assume 'perfect' state reduction.

   MPLS Fast Re-route (FRR) [RFC4090] is an attractive scheme for
   providing rapid local protection from node or link failures. Such a
   scheme has, however, not been designed for MP2P at the time of
   writing, so it remains to be seen how practical it would be,
   especially in the case of the failure of a merge node. Initial
   examination of this case suggests that FRR would not be a problem
   for MP2P given that each flow can be handled separately.

   As a final note, observe that the MP2P scenario presented in this
   section may be optimistic. MP2P LSP merging may be hard to achieve

Yasukawa and Farrel                                            [Page 19]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

   between LSPs with significantly different traffic and QoS parameters.
   Therefore, it may be necessary to increase the number of MP2P LSPs
   arriving at an egress.

8. Combined models

   There is nothing to prevent the combination of hierarchical and MP2P
   solutions within a network.

   Note that if MP2P LSPs are tunneled through P2P FA LSPs across the
   core, none of the benefit of LSP merging is seen for the hops during
   which the MP2P LSPs are tunneled.

   On the other hand, it is possible to construct solutions where MP2P
   FA LSPs are constructed within the network resulting in savings from
   both modes of operation.

9. Management Considerations

   The management issues of the two models presented in this document
   have been discussed in line. Neither solution is without its
   management overhead.

   Note, however, that scalability of management tools is one of the
   motivators for this work and that network scaling solutions that
   reduce the active management of LSPs at the cost of additional effort
   to manage the more static elements of the network represent a
   benefit. That is, it is worth the additional effort to set up MP2P or

   FA LSPs if it means that the network can be scaled to a larger size
   without being constrained by the management tools.

10. Security Considerations

   The techniques described in this document use existing or
   yet-to-be-defined signaling protocol extensions and are subject to
   the security provided by those extensions. Note that we are talking
   about tunneling techniques used within the network and both
   approaches are vulnerable to the creation of bogus tunnels that
   deliver data to an egress or consume network resources.

   The MP2P technique may prove harder to debug through OAM methods than
   the FA LSP approach.


Yasukawa and Farrel                                            [Page 20]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

11. Recommendations

   At this stage, no recommendations are made, but it would be valuable
   to consult more widely to discover:

   - The concerns of other service providers with respect to network
     scalability.
   - More opinions on the realistic constraints to the network
     parameters listed in section 4.
   - Desirable values for the cost-effectiveness of the network
     (parameter K)
   - The applicability, manageability and support for the two techniques
     described
   - The feasibility of combining the two techniques as discussed in
     section 8.

12. IANA Considerations

   This document makes no requests for IANA action.

13. Acknowledgements

   The authors are grateful to Jean-Louis Le Roux for discussions and
   review input. Thanks to Ben Niven-Jenkins for his comments.

14. Intellectual Property Consideration

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.


Yasukawa and Farrel                                            [Page 21]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

15. Normative References

   [RFC4206]    Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
                Hierarchy with Generalized Multi-Protocol Label
                Switching (GMPLS) Traffic Engineering (TE)", RFC 4206,
                October 2005.

16. Informational References

   [RFC2961]        Berger, L., Gan, D., Swallow, G., Pan, P., Tommasi,
                    F. and S. Molendini, "RSVP Refresh Overhead
                    Reduction Extensions", RFC 2961, April 2001.

   [RFC3036]        Andersson, L., Doolan, P., Feldman, N., Fredette, A.
                    and B. Thomas, "LDP Specification", RFC 3036,
                    January 2001.

   [RFC3209]        Awduche, D., Berger, L., Gan, D., Li, T.,
                    Srinivasan, V. and G. Swallow, "RSVP-TE: Extensions
                    to RSVP for LSP Tunnels", RFC 3209, December 2001.

   [RFC4090]        P. Pan, G. Swallow, A. Atlas (Editors), "Fast
                    Reroute Extensions to RSVP-TE for LSP Tunnels", work
                    in progress.

   [AUTO-MESH]      Vasseur, JP., and Le Roux, JL., "Routing extensions
                    for discovery of Multiprotocol (MPLS) Label Switch
                    Router (LSR) Traffic Engineering (TE) mesh
                    membership", draft-ietf-ccamp-automesh, work in
                    progress.

   [MP2P-RSVP]      Yasukawa, Y., "Supporting Multipoint-to-Point Label
                    Switched Paths in Multiprotocol Label Switching
                    Traffic Engineering",
                    draft-yasukawa-mpls-mp2p-rsvpte, work in progress.

17. Authors' Addresses

   Seisho Yasukawa
   NTT Corporation
   9-11, Midori-Cho 3-Chome
   Musashino-Shi, Tokyo 180-8585 Japan
   Phone: +81 422 59 4769
   EMail: yasukawa.seisho@lab.ntt.co.jp

   Adrian Farrel
   Old Dog Consulting
   EMail:  adrian@olddog.co.uk


Yasukawa and Farrel                                            [Page 22]

draft-yasukawa-mpls-scaling-analysis-01.txt                    June 2006

18. Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

19. Full Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Yasukawa and Farrel                                            [Page 23]