Network Working Group                                          A. Farrel
Internet Draft                                          Juniper Networks
Category: Informational                                          D. King
Expires: 22 August 2013                               Old Dog Consulting
                                                        22 February 2013


   Unanswered Questions in the Path Computation Element Architecture

                 draft-farrkingel-pce-questions-01.txt


Abstract

   The Path Computation Element (PCE) architecture is set out in RFC
   4655. The architecture is extended for multi-layer networking with
   the introduction of the Virtual Network Topology Manager in RFC
   5623, and generalized to Hierarchical PCE in RFC 6805.

   These three architectural views of PCE deliberately leave some key
   questions unanswered, especially with respect to the interactions
   between architectural components.  This document draws out those
   questions and discusses them in an architectural context with
   reference to other architectural components, existing protocols, and
   recent IETF work efforts.

   This document does not update the architecture documents and does not
   define how protocols or components must be used.  It does, however,
   suggest how the architectural components might be combined to provide
   advanced PCE function.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html



Farrel and King           Expires August 2013                  [Page 1]

draft-farrkingel-pce-questions-01.txt                     February 2013


Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction .................................................. 3
   1.1.  Terminology ................................................. 3
   2.  What Is Topology Information? ................................. 3
   3.  How Is Topology Information Gathered? ......................... 4
   4.  How Do I Find My PCE? ......................................... 5
   5.  How Do I Select Between PCEs? ................................. 6
   6.  How Do Redundant PCEs Synchronize TEDs? ....................... 7
   7.  Where Is the Destination? ..................................... 8
   8.  Who Runs Or Owns a Parent PCE? ................................ 9
   9.  How Do I Find My Parent PCE? ................................. 10
   10.  How Do I Find My Child PCEs? ................................ 10
   11.  How Is The Parent PCE Domain Topology Built? ................ 10
   12.  Does H-PCE Solve The Internet? .............................. 11
   13.  What are Sticky Resources? .................................. 11
   14.  What Is A Stateful PCE For? ................................. 12
   15.  How Does a Stateful PCE Learn LSP State From The Network? ... 13
   16.  How Do Redundant Stateful PCEs Synchronize State? ........... 14
   17.  What Is An Active PCE? What is a Passive PCE? ............... 14
   18.  What is LSP Delegation? ..................................... 15
   19.  Is An Active PCE with LSP Delegation Just a Fancy NMS? .....  15
   20.  How Does a PCE Work With A Virtual Network Topology? ........ 16
   21.  How Does PCE Communicate With VNTM .......................... 17
   22.  How Does Service Scheduling and Calendering Work? ........... 17
   23.  Where Does Policy Fit In? ................................... 18
   24.  What is a Path Computation Elephant? ........................ 19
   25.  Security Considerations ..................................... 19
   26.  IANA Considerations ......................................... 20
   27.  Acknowledgements ............................................ 20
   28.  References .................................................. 20
   28.1.  Normative References ...................................... 20
   28.2.  Informative References .................................... 20
   Authors' Addresses ............................................... 23

1.  Introduction

   Over the years since the architecture for the Path Computation
   Element (PCE) was documented in [RFC4655], many new people have
   become involved in the work of the PCE working group and wish to use
   or understand the PCE architecture.  These people often missed out on
   early discussions within the working group and are unfamiliar with
   questions that were raised during the development of the
   documentation.

   Furthermore, the base architecture has been extended to handle other
   situations and requirements.  For example, the architecture was
   extended for multi-layer networking with the introduction of the
   Virtual Network Topology Manager [RFC5623] and was generalized to
   include Hierarchical PCE (H-PCE) [RFC6805].

   These three architectural views of PCE deliberately leave some key
   questions unanswered, especially with respect to the interactions
   between architectural components.  This document draws out those
   questions and discusses them in an architectural context with
   reference to other architectural components, existing protocols, and
   recent IETF work efforts.

   This document does not update the architecture documents and does not
   define how protocols or components must be used.  It does, however,
   suggest how the architectural components might be combined to provide
   advanced PCE function.

1.1.  Terminology

   Readers are assumed to be thoroughly familiar with the terminology
   defined in [RFC4655], [RFC4726], [RFC5440], and [RFC5623].

   Throughout this document the term "area" is used to refer equally to
   an OSPF area and an IS-IS level.  It is assumed that the reader is
   able to map the small differences between these two use cases.

2.  What Is Topology Information?

   [RFC4655] states that a PCE performs path computations based on a
   view of the available network resources and network topology.  This
   information is collected into a Traffic Engineering Database (TED).

   However, [RFC4655] does not provide a detailed description of what
   information is present in the TED.  It simply says that the TED
   "contains the topology and resource information of the domain."

   The precise information that needs to be held in a TED depends on the
   type of network and nature of the computation that has to be
   performed.  As a basic minimum, the TED must contain the nodes and
   links that form the domain, and must identify the connectivity in the
   domain.

   For most traffic engineering needs (for example, MPLS traffic
   engineering - MPLS-TE) the TED would additionally contain a basic
   metric for each link and knowledge of the available (unallocated)
   resources on each link.

   More advanced use cases might require that the TED contains
   additional data that represents qualitative information such as:

   - link delay
   - link jitter
   - node throughput capabilities
   - optical impairments
   - limited node cross-connect capabilities

   Additionally, an important information element for computing paths,
   especially for protected services, is the Shared Risk Group (SRG).
   This is an indication of resources in the TED that have a common
   risk of failure.  That is, they have a shared risk of failure from a
   single event.

   In short, the TED needs to contain as much information as is needed
   to satisfy the path computation requests subject to the objective
   functions.  This, in itself, may not be a trivial issue in some
   network technologies.  For example, in some optical networks, the
   path computation for a new LSP may need to consider the impact that
   turning up a new laser would have on the optical signals already
   being carried by fibers.  It may be possible to abstract this
   information as parameters of the optical links and nodes in the TED,
   but it may be easier to capture this information through a database
   of existing LSPs (see Sections 14 and 15).
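
   The minimum TED content described above (nodes, links, a metric,
   unallocated resources, and Shared Risk Groups) can be illustrated
   with a small sketch.  This is not a definitive data model: the names
   Link, TED, and cspf, and the choice of a single bandwidth value per
   unidirectional link, are assumptions made purely for illustration.
   The computation is a plain Dijkstra search that first prunes links
   that lack bandwidth or that belong to an excluded SRG.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Link:
    src: str
    dst: str
    metric: int                      # basic TE metric
    avail_bw: float                  # unallocated bandwidth
    srgs: frozenset = frozenset()    # Shared Risk Groups (hypothetical labels)


@dataclass
class TED:
    links: list = field(default_factory=list)

    def add_link(self, *args, **kwargs):
        # Links are unidirectional in this sketch.
        self.links.append(Link(*args, **kwargs))


def cspf(ted, src, dst, bw, avoid_srgs=frozenset()):
    """Return (cost, path) for the least-metric feasible path, or None."""
    adj = {}
    for link in ted.links:
        # Prune links that cannot carry the request or share an excluded risk.
        if link.avail_bw >= bw and not (link.srgs & avoid_srgs):
            adj.setdefault(link.src, []).append(link)
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for link in adj.get(node, []):
            if link.dst not in seen:
                heapq.heappush(
                    heap, (cost + link.metric, link.dst, path + [link.dst]))
    return None
```

   Passing the SRGs of a working path as avoid_srgs is one way such a
   TED could support computation of a risk-diverse protection path.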

3.  How Is Topology Information Gathered?

   Clearly, the information in the TED discussed in Section 2 needs to
   be gathered and maintained somehow.  [RFC4655] simply says "The TED
   may be fed by Interior Gateway Protocol (IGP) extensions or
   potentially by other means."  In this context, "fed" means built and
   maintained.

   Thus, one way that the PCE may operate its TED is by participating in
   the IGP running in the network.  In an MPLS-TE network, this would
   depend on OSPF-TE [RFC3630] and IS-IS-TE [RFC5305].  In a GMPLS
   network it would utilize the GMPLS extensions to OSPF and IS-IS
   [RFC4203] and [RFC5307].

   However, participating in an IGP, even as a passive receiver of IGP
   information, can place a significant load on the PCE.  The IGP can be
   quite "chatty" when there are frequent updates to the use of the
   network, meaning that the PCE must dedicate significant processing to
   parsing protocol messages and updating the TED.  Furthermore, to be
   truly useful, a PCE implementation would need to support both OSPF
   and IS-IS.

   An alternative feed from the network to the PCE's TED is offered by
   BGP-LS [I-D.ietf-idr-ls-distribution].  This approach leverages an
   in-network BGP speaker (such as an ASBR or a Route Reflector) that
   already has to participate in the IGP and that is specifically
   designed to apply filters to IGP advertisements.
   In this usage, the BGP speaker filters and aggregates topology
   information according to configured policy before advertising it
   "north-bound" to the PCE to update the TED.  The PCE implementation
   has to support just a simplified subset of BGP rather than two IGPs.

   But BGP might not be convenient in all networks (for example, where
   BGP is not run, such as in an optical network or a BGP-free core).
   Furthermore, not all relevant information is made available through
   standard TE extensions to the IGPs.  In these cases, the TED must be
   built or supplemented from other sources such as the NMS, inventory
   management systems, and directly configured data.

4.  How Do I Find My PCE?

   A Path Computation Client (PCC) needs to know the identity / location
   of a PCE in order to be able to make computation requests.  This is
   because the Path Computation Element Communication Protocol (PCEP) is
   a transaction-based protocol carried over TCP, and because of the
   architectural decision made in Section 6.4 of RFC 4655 to require
   targeted PCC-PCE communications.

   As described in [RFC4655], a PCC could be configured with the
   knowledge of the IP address of its PCE.  This is a relatively
   lightweight option considering all of the other configuration that a
   router may require, but it is open to configuration errors and does
   not meet the need for minimal-configuration operation.  Furthermore,
   configuration of multiple PCEs could become onerous, while handling
   changes in PCE identities and coping with failure events would be an
   issue for a configured system.

   [RFC4655] notes the possibility that PCEs advertise themselves in the
   IGP, and this requirement is developed in [RFC4674] and made possible
   in OSPF and IS-IS through [RFC5088] and [RFC5089].  In general these
   mechanisms should be sufficient for PCCs in a network where an IGP is
   used and where the PCE participates in the IGP.

   Note, however, that not all PCEs will participate in the IGP (see
   Section 3).  In these cases, assuming configuration is not
   appropriate as a discovery mechanism, some other server
   announcement/discovery function may be needed, such as DNS
   [RFC4848] as used in the Application-Layer Traffic Optimization
   (ALTO) discovery function [I-D.ietf-alto-server-discovery].
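
   A PCC that supports several of the discovery options discussed in
   this section might simply try them in a preferred order.  The sketch
   below assumes that each option is wrapped in a callable that returns
   PCE addresses; the function name discover_pces, the source names,
   and the preference order are hypothetical and are not taken from any
   specification.

```python
def discover_pces(sources):
    """Return (source_name, addresses) for the first source yielding a PCE.

    sources: ordered list of (name, lookup) pairs, where lookup() returns
    an iterable of PCE addresses and may raise OSError on failure.
    """
    for name, lookup in sources:
        try:
            addresses = list(lookup())
        except OSError:
            continue            # source unavailable; try the next one
        if addresses:
            return name, addresses
    return None, []
```

   For example, a PCC might list IGP-learned PCEs first, a directory
   lookup second, and statically configured addresses last.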

5.  How Do I Select Between PCEs?

   When more than one PCE is discovered or configured, a PCC will need
   to select which PCE to use.  It may make this decision using any
   arbitrary algorithm (for example, first-listed or round-robin), but
   it may also be the case that different PCEs have different
   capabilities, in which case the PCC will want to select the PCE most
   likely to be able to satisfy any one request.

   PCE advertisement in OSPF or IS-IS per [RFC5088] and [RFC5089] allows
   a PCE to announce its capabilities as required in [RFC4657].  A PCC
   can select between PCEs based on the capabilities that they have
   announced.  However, these capabilities are expressed as flags in the
   PCE advertisement, so only the core capabilities are presented, and
   there is no scope for including detailed information (such as
   support for specific objective functions) in the advertisement.

   Additional and more complex PCE capabilities, including the
   capability to perform point-to-multipoint (P2MP) path computations
   [RFC6006], may be announced by the PCE as optional PCEP
   type-length-value (TLV) Type Indicators in the Open message described
   in [RFC5440].  This mechanism is not limited to just a set of
   flags, and detailed capability information may be presented in
   sub-TLVs.

   Note that this exchange of PCE capabilities is in the form of an
   announcement, not a negotiation.  That is, a PCC that wants specific
   function from a PCE must examine the advertised capabilities and
   select which PCE to use for a specific request.  There is no scope
   for a PCC to request a PCE to support features or functions that it
   does not offer or announce.

   A PCC may also vary which PCE it uses according to congestion
   information reported by the PCEs [RFC5440] using the Notification
   Object and Notification Type. Note that in a heavily overloaded PCE
   system, reports from one PCE that it is overloaded may simply
   result in all PCCs switching to another PCE which will, itself,
   immediately become overloaded.  Thus, PCCs should exercise a
   certain amount of discretion and queueing theory before selecting
   a PCE purely based on reported load.

   Note that a PCC could send all requests to all PCEs that it knows
   about.  It can then elect between the results, perhaps choosing the
   first result it receives, but this approach is very likely to
   overload all the PCEs in the network considering that one of the
   reasons for multiple PCEs is to share the load.
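
   The selection considerations in this section can be combined in a
   short sketch: filter on announced capabilities, prefer PCEs that
   have not reported overload, and pick randomly among the remainder so
   that PCCs do not all move to the same alternative PCE at once.  The
   dictionary keys and capability names used here are hypothetical, and
   random choice is only one possible way to spread load.

```python
import random


def select_pce(pces, required, rng=random):
    """Pick a PCE name, or None if no PCE announces the required capabilities.

    pces: list of dicts with keys 'name', 'caps' (set), 'overloaded' (bool).
    required: set of capability names needed for this request.
    """
    capable = [p for p in pces if required <= p["caps"]]
    if not capable:
        return None
    # Prefer PCEs that have not reported overload, but fall back to an
    # overloaded one rather than failing the request outright.
    healthy = [p for p in capable if not p["overloaded"]]
    pool = healthy or capable
    return rng.choice(pool)["name"]
```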

6.  How Do Redundant PCEs Synchronize TEDs?

   A network may have more than one PCE as discussed in the previous
   sections.  These PCEs may provide redundancy for load-sharing,
   resilience, or partitioning of computation features.

   In order to achieve some consistency between the results of different
   PCEs, it is desirable that they operate on the same TE information.

   The TED reflects the actual state of the network and is not a
   resource reservation or booking scheme.  Therefore, a PCE-based
   system does not prevent competition for network resources during the
   provisioning phase, although a process of "sticky resources" that are
   temporarily reduced in the TED after a computation may be applied
   purely as a local implementation issue.

   One option for ensuring that multiple PCEs use the same TE
   information is simply to have the PCEs driven from the same TED.
   This could be achieved in an implementation by utilizing a shared
   database, but it is unlikely to be efficient.

   More likely is that each PCE is responsible for building its own TED
   independently using the techniques described in Section 3.  If the
   PCEs participate in the IGP it is likely that they will attach at
   different points in the network and so there may be minor and
   temporary inconsistencies between their TEDs caused by IGP
   convergence issues.  If the PCEs gather TE information via BGP-LS
   [I-D.ietf-idr-ls-distribution] from different sources, the same
   inconsistencies may arise, but if the PCEs attach to the same BGP
   speaker it may be possible to achieve consistency between TEDs modulo
   the BGP-LS process itself.
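
   One way to judge how much independently built TEDs differ is to
   compare them link by link.  The sketch below assumes that each TED
   can be reduced to a mapping from a link identifier to its available
   bandwidth; that representation and the name ted_diff are
   illustrative assumptions only.

```python
def ted_diff(ted_a, ted_b):
    """Return {link: (value_in_a, value_in_b)} for links that differ.

    Each TED is a dict mapping a link identifier, such as (src, dst),
    to available bandwidth.  A link missing from one TED shows as None.
    """
    diffs = {}
    for link in ted_a.keys() | ted_b.keys():
        a, b = ted_a.get(link), ted_b.get(link)
        if a != b:
            diffs[link] = (a, b)
    return diffs
```

   An empty result indicates that the two views agree at the moment of
   comparison; it says nothing about how long they will remain so.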

   A final option is to provide an explicit synchronization process
   between the TED of a "master" PCE and the TEDs of other PCEs.  Such a
   process could be achieved using BGP-LS or a database synchronization
   protocol (which would allow check-pointing and sequential updates).
   This approach is fraught with issues around selection of the master
   PCE and handling failures.  It is, in fact, a mirrored database
   scenario: a problem that is well known and the subject of plenty of
   work.

   Noting that the provisioning protocols handle contention for
   resources, that the differences between TEDs are likely to be
   relatively small with moderate arrival rates for new services, and
   that contention in all but the most busy networks is relatively
   unlikely, there may be no value in any attempt to synchronize TEDs.

   However, see Section 16 for a discussion of synchronizing other state
   between redundant PCEs.

7.  Where Is the Destination?

   Path computation provides an end-to-end path between a source and a
   destination.  If the destination lies in the source domain, then its
   location will be known to the PCE and there are no issues to be
   solved.  However, in a multi-domain system a path must be found to
   a remote domain that contains the destination, and that can only be
   achieved by acquiring some knowledge of the location of the
   destination or at least knowing the next domain in the path toward
   the domain that contains the destination.

   The simplest solution here is achieved where a PCE has visibility
   into multiple domains.  Such may be the case in a multi-area
   network where the PCE is aware of the contents of all of the IGP
   areas.  This approach is only likely to be appropriate where the
   number of nodes is manageable, and is unlikely to extend over
   administrative boundaries.

   The per-domain path computation method for establishing inter-domain
   traffic engineering LSPs [RFC5152] simply requires a PCE to compute a
   path to the next domain toward the destination.  As the LSP setup
   (through signaling) progresses domain by domain, the Label Switching
   Router (LSR) at the entry point to each domain simply requests its
   local PCE to compute the next segment of the path to the next domain
   toward the destination.  Thus, it is not necessary for any PCE
   (except the last) to know in which domain the destination exists.
   But, in this approach, each PCE must somehow determine the next
   domain toward the destination, and it is not obvious how this is
   achieved.

   [RFC5152] suggests that in an IP/MPLS network it is good enough to
   leverage the IP reachability information distributed by BGP and
   assume that TE reachability can follow the same AS path.  This
   approach might not guarantee the optimal TE path and, of course,
   might result in no path being found in degenerate cases.
   Furthermore, in many network technologies (such as optical
   networks operated by GMPLS) there may be limited or no
   end-to-end IP connectivity.
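
   The idea of borrowing IP reachability to pick the next domain can be
   sketched as a longest-prefix match over BGP-learned routes, taking
   the first AS in the AS path as the next domain.  The rib structure
   and the function name next_domain are assumptions for illustration;
   a real implementation would consult the router's BGP table.  The
   prefixes and AS numbers in the usage note are documentation values.

```python
import ipaddress


def next_domain(rib, destination):
    """Return the next AS toward destination, "local", or None.

    rib: dict mapping a prefix string to its AS path, listed from the
    neighboring AS to the origin AS.  An empty AS path means the
    prefix is originated in the local domain.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, as_path in rib.items():
        net = ipaddress.ip_network(prefix)
        # Keep the most specific matching prefix.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, as_path)
    if best is None:
        return None             # no IP reachability, so no hint for TE
    return best[1][0] if best[1] else "local"
```

   With routes for 198.51.100.0/24 via AS 64500 and 198.51.100.128/25
   via AS 64501, a destination of 198.51.100.200 selects AS 64501.  As
   noted above, the result is only a hint: TE reachability is assumed,
   not verified.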

   The Backward Recursive PCE-based Computation (BRPC) procedure
   [RFC5441] is able to achieve a more optimal end-to-end path than
   the per-domain method, but depends on the knowledge of both the
   domain in which the destination is located and the sequence of
   domains toward the destination.  This information is described in
   [RFC5441] as being known a priori.  Clearly, however, information is
   not known a priori, and it may be hard for the PCE that serves the
   source PCC to discover the necessary details.  While there are
   several approaches to solving the question of establishing the
   domain sequence (for example, BRPC trial and error or Hierarchical
   PCE [RFC6805]) none of them address the issue of determining
   where the destination lies.
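
   The dependency of BRPC on a known domain sequence is visible in the
   following simplified sketch.  It is not the procedure of [RFC5441]
   in full: it tracks only costs (not paths), models each domain as a
   small weighted graph that includes its inter-domain links toward the
   next domain, and assumes the ordered list of domains is supplied by
   the caller.  All names are hypothetical.

```python
import heapq


def shortest(graph, start):
    """Plain Dijkstra: cost from start to every node reachable in graph."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def brpc(domains, source, destination):
    """Return the end-to-end cost, or None if no path is found.

    domains: list of (entry_nodes, graph) ordered from the source domain
    to the destination domain.  For the first domain, entry_nodes is
    [source].  Each graph maps node -> {neighbor: cost}.
    """
    # Cost to the destination from each leaf of the downstream tree.
    vspt = {destination: 0}
    for entry_nodes, graph in reversed(domains):
        new_vspt = {}
        for entry in entry_nodes:
            dist = shortest(graph, entry)
            costs = [dist[leaf] + c for leaf, c in vspt.items() if leaf in dist]
            if costs:
                new_vspt[entry] = min(costs)
        vspt = new_vspt         # passed backward to the upstream PCE
    return vspt.get(source)
```

   Working backward from the destination lets the source domain choose
   its exit point using downstream costs, which the per-domain method
   cannot do.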

   One argument that is often made is that an end-to-end connection
   expressed as an LSP is a feature of a service agreement between
   source and destination.  If that is the case, it is argued, it
   stands to reason that the location of the destination must be
   known to the source node in the same way that the source has
   determined the IP address of the destination.  Presumably this
   would be through a commercial process or an administrative
   protocol.

   [RFC4974] introduced the concept of Calls and Connections (LSPs). A
   Call does not provide the actual connectivity for transmitting
   user traffic, but builds a relationship that will allow subsequent
   Connections to be made.  A Call might be considered an agreement to
   support an end-to-end LSP that is made between the endpoint nodes.
   Call messages are sent and routed as normal IP messages, so the
   sender does not need to know the location of the destination.
   Furthermore, Call requests are responded to, and the Call Response
   can carry information (such as the identity of the domain containing
   the destination) for use by the Call initiator.  Thus, the use of
   GMPLS Calls might provide a mechanism to discover the location of
   the destination.

8.  Who Runs Or Owns a Parent PCE?

   In the case of multiple domains (e.g., IGP areas or multiple ASes)
   within a single service provider network, the management
   responsibility for the parent PCE would most likely be handled by
   the service provider.

   In the case of multiple ASes within different service provider
   networks, it may be necessary for a third party to manage the parent
   PCEs according to commercial and policy agreements from each of the
   participating service providers.  Note that the H-PCE architecture
   does not require disclosure of internals of a child domain to the
   parent PCE.  Thus, there is ample scope for a parent PCE to be run by
   one of the connected service providers or by a third party without
   compromising commercial issues.

9.  How Do I Find My Parent PCE?

   [RFC6805] specifies that a child PCE must be configured with the
   address of its parent PCE in order for it to interact with its
   parent PCE.  There is no scope for parent PCEs to advertise their
   presence; however, there is potential for directory systems (such as
   DNS [RFC4848] as used in the ALTO discovery function
   [I-D.ietf-alto-server-discovery]) to be used as described in Section
   4.

   Note that according to [RFC6805] the child PCE must also be
   authorized to peer with the parent PCE.  This is discussed from the
   viewpoint of the parent PCE in Section 10.  The child PCE may need to
   participate in a key distribution protocol in order to properly
   authenticate its identity to the parent PCE.

10.  How Do I Find My Child PCEs?

   Within the hierarchical PCE framework [RFC6805] the parent PCE must
   only accept path computation requests from authorized child PCEs.
   If a parent PCE receives a request from an unauthorized child PCE,
   the request should be dropped.

   This would require a parent PCE to be configured with the
   identities and security credentials of all of its child PCEs, or
   there must be some form of shared secret that allows an unknown
   child PCE to be authorized by the parent PCE.
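
   The two options above (per-child credentials, or a shared secret
   that lets a previously unknown child prove itself) can be contrasted
   in a toy sketch.  This is an illustration of the decision logic
   only and is not a security mechanism: the token format is invented
   here, it has no replay protection, and a deployment would rely on
   the security features of the transport or of PCEP itself.  The
   function names are hypothetical.

```python
import hashlib
import hmac


def child_token(secret: bytes, child_id: str) -> str:
    """Derive a token for child_id from a secret shared with the parent."""
    return hmac.new(secret, child_id.encode(), hashlib.sha256).hexdigest()


def authorize(child_id, token, configured, secret=None):
    """Return True if the parent should accept requests from child_id.

    configured: dict of child_id -> expected token (ASCII strings) for
    explicitly configured children.
    secret: optional shared secret that admits unknown children.
    """
    if child_id in configured:
        return hmac.compare_digest(configured[child_id], token)
    if secret is not None:
        return hmac.compare_digest(child_token(secret, child_id), token)
    return False                # unknown child and no shared secret
```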

11.  How Is The Parent PCE Domain Topology Built?

   The parent PCE maintains a domain topology map of the child domains
   and their interconnectivity.  Where inter-domain connectivity is
   provided by TE links, the capabilities of those links may also be
   known to the parent PCE.

   The parent PCE maintains a TED for the parent domain in the same way
   that any PCE does.  The nodes in the parent domain will be
   abstractions of the child domains (connected by real or virtual TE
   links), but the parent domain may also include real nodes and links.

   The mechanism for building the parent TED is likely to rely heavily
   on administrative configuration and commercial issues because the
   network was probably partitioned into domains specifically to address
   these issues.  However, note that in some configurations (for
   example, collections of small optical domains) a separate instance of
   a routing protocol (probably an IGP) may be run within the parent
   domain to advertise the domain interconnectivity.  Additionally,
   since inter-domain TE links can be advertised by the IGPs operating
   in the child domains, this information could be exported to the
   parent PCE either by the child PCEs or using a north-bound export
   mechanism such as BGP-LS [I-D.ietf-idr-ls-distribution].
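
   Once the parent PCE holds a domain topology map, selecting a
   candidate domain sequence is an ordinary graph search in which each
   child domain is a single abstract node.  The sketch below uses a
   breadth-first search that minimizes the number of domains crossed;
   that metric, the bidirectional treatment of inter-domain links, and
   the name domain_sequence are assumptions for illustration.

```python
from collections import deque


def domain_sequence(domain_links, src, dst):
    """Return the shortest list of domains from src to dst, or None.

    domain_links: iterable of (domain_a, domain_b) pairs, one per known
    inter-domain adjacency.
    """
    adj = {}
    for a, b in domain_links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

   A parent PCE would then typically ask the child PCEs along the
   chosen sequence to compute the path segments within their own
   domains.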

12.  Does H-PCE Solve The Internet?

   The model described in [RFC6805] introduced a hierarchical
   relationship between domains.  It is applicable to environments with
   small groups of domains where visibility from the ingress LSRs is
   limited.  Applying the hierarchical PCE model to large groups of
   domains, such as the Internet, is not considered feasible or
   desirable.

   This does open up a harder question: how many domains can be handled
   by an H-PCE system?  In other words: what is a small group of
   domains?  The answer is not clear and might be "I know it when I see
   it."  At the moment, a rough guide might be around 20 domains as a
   maximum.

   An associated question would be: how many hierarchy levels can be
   handled by H-PCE?  Architecturally, the answer is that there is no
   limit, but it is hard to construct practical examples where more than
   two or possibly three levels are needed.

13.  What are Sticky Resources?

   When a PCE computes a path it has a reasonable idea that an LSP will
   be set up and that resources will be allocated within the network.
   If the arrival rate of computation requests is faster than the LSP
   setup rate combined with the IGP convergence time, it is quite
   possible that the PCE will perform its next computation before the
   TED has been updated to reflect the setup of the previous LSP.  This
   can result in LSP setup failures as there is contention for
   resources.  The likelihood of this problem is particularly high
   during recovery from network failures when a large number of LSPs
   might need new paths.

   A PCE may choose to make a provisional assignment of the resources
   that would be needed for an LSP, and reduce the available resources
   in its TED so that the problem is mitigated.  Such resources are
   informally known as "sticky resources".
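
   A minimal sketch of sticky resources follows.  It assumes a
   timer-based release, which is only one possible local policy, and it
   does not attempt to detect when an IGP update already reflects the
   LSP for which resources are held (in which case the bandwidth is
   briefly counted twice).  The class name StickyTED and its methods
   are hypothetical.  Time is passed in explicitly to keep the example
   deterministic.

```python
class StickyTED:
    def __init__(self, hold_time=30.0):
        self.avail = {}          # link -> bandwidth last reported by the IGP
        self.holds = []          # (expiry_time, link, bandwidth)
        self.hold_time = hold_time

    def igp_update(self, link, bw):
        """Record the available bandwidth advertised for a link."""
        self.avail[link] = bw

    def hold(self, path_links, bw, now):
        """Provisionally reserve bw on every link of a computed path."""
        for link in path_links:
            self.holds.append((now + self.hold_time, link, bw))

    def usable(self, link, now):
        """Bandwidth to use in the next computation for this link."""
        # Drop holds whose timer has expired.
        self.holds = [h for h in self.holds if h[0] > now]
        held = sum(bw for _, l, bw in self.holds if l == link)
        return self.avail.get(link, 0.0) - held
```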

   Note that using sticky resources introduces a number of other
   problems that can make managing the TED difficult.  For example:

   - When the TED is updated as a result of new information from the
     IGP, how does the PCE know whether the reduction in available
     resources is due to the successful setup of the LSP for which it
     is holding sticky resources, or for some other network event (such
     as the setup of another LSP)?  This problem may be particularly
     evident if there are multiple PCEs that do not synchronize their
     sticky resources, or if not all LSPs utilize PCE computation.

   - When LSP setup fails, how are the sticky resources released?  Since
     the PCE doesn't know about the failure of the LSP setup, it needs
     some other mechanism to release them.

   - What happens if a path computation was made only to investigate the
     potential for an LSP, but not to actually set one up?

   - What if the path used by the LSP does not match that provided by
     the PCE (for example, because the control plane routes around some
     problem)?

   Some of these issues can be mitigated by using a Stateful PCE (see
   Section 14).

14.  What Is A Stateful PCE For?

   A Stateless PCE can perform path computations that take into account
   the existence of other LSPs if the paths of those LSPs are supplied
   on the computation request.  This function can be particularly useful
   when arranging protection paths so that a working and protection LSP
   do not share any links or nodes.  It can also be used when a group of
   LSPs are to be reoptimized at the same time in the process known as
   Global Concurrent Optimization (GCO) [RFC5557].

   However, this mechanism can place quite a burden on the protocol
   messages, especially when large numbers of LSP paths need to be
   reported.

   A Stateful PCE maintains a database of LSPs (the LSP-DB) that are
   active in the network, i.e., have been provisioned such that they use
   network resources although they might or might not be carrying
   traffic.  This database allows a PCC to refer to an LSP using only
   its identifier - all other details can be retrieved by the PCE from
   the LSP-DB.
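
   The following sketch suggests how a request might carry only an LSP
   identifier while the PCE recovers the path from the LSP-DB, here to
   find the links that a protection path should avoid.  The names
   LspRecord, LspDb, and links_to_avoid are hypothetical and do not
   come from any PCEP specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LspRecord:
    """One LSP-DB entry: the path and the bandwidth the LSP reserves."""
    lsp_id: str
    path: tuple          # ordered node names, e.g. ("A", "B", "C")
    bandwidth: float     # reserved bandwidth, arbitrary units


class LspDb:
    """Hypothetical in-memory LSP-DB keyed by LSP identifier."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.lsp_id] = record

    def links_of(self, lsp_id):
        """Return the set of undirected links used by the identified LSP."""
        path = self._records[lsp_id].path
        return {frozenset(hop) for hop in zip(path, path[1:])}


def links_to_avoid(db, working_lsp_id):
    """Links a protection path must not share with the working LSP."""
    return db.links_of(working_lsp_id)


db = LspDb()
db.add(LspRecord("lsp-1", ("A", "B", "C"), 10.0))
avoid = links_to_avoid(db, "lsp-1")
```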

   A Stateful PCE can use the LSP-DB for many other functions, such as
   balancing the distribution of LSPs in the network.  Furthermore, the
   PCE can correlate LSPs with network resource availability, placing
   new LSPs more cleverly.

   A Stateful PCE that is also an Active PCE (see Section 17) can
   respond to changes in network resource availability and predicted
   demands to reroute LSPs that it knows about.

15.  How Does a Stateful PCE Learn LSP State From The Network?

   A Stateful PCE needs to synchronize its LSP-DB with the state in the
   network.  Just as described in Section 13, the PCE cannot rely on
   knowledge about previous computations it has made, but must find out
   the actual LSPs in the network.

   A simple solution is for all ingress LSRs to report all LSPs to the
   PCE as they are set up, modified, or torn down.  Since PCEP already
   has the facility to fully describe LSP paths and resources in the
   protocol messages, this is not a difficult problem.

   The situation can be more complex, however, if there are ingress LSRs
   that do not support PCEP, or that are unaware of the requirement to
   report LSPs to the PCE.  This might happen if the LSRs are able to
   compute paths themselves, or if they receive LSP setup instructions
   from an NMS.

   An alternative approach is to note that any LSR on the path of an LSP
   can probably see the whole path (through the Record Route object in
   RSVP-TE signaling [RFC3209]) and knows the bandwidth reserved for the
   LSP.  Thus, any LSR can report the LSP to the PCE, noting that it
   will not hurt (beyond additional message processing and potential
   overload of the PCE or the network) for the LSP to be reported
   multiple times because it is clearly identified.  In fact this would
   also provide a cross-check mechanism.
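
   A minimal sketch of this idea follows, assuming that each report
   carries an LSP identifier, the recorded path, and the reserved
   bandwidth.  The class LspDb and its report method are hypothetical.
   Repeated reports that agree are harmless, and reports that disagree
   are flagged as a failed cross-check.

```python
class LspDb:
    """Hypothetical LSP-DB that tolerates repeated reports of one LSP."""

    def __init__(self):
        self.entries = {}      # lsp_id -> (path, bandwidth)
        self.conflicts = []    # (lsp_id, reporter) where reports disagreed

    def report(self, reporter, lsp_id, path, bandwidth):
        """Record a report from any LSR on the path.

        Returns "new", "duplicate", or "conflict".
        """
        seen = (tuple(path), bandwidth)
        known = self.entries.get(lsp_id)
        if known is None:
            self.entries[lsp_id] = seen
            return "new"
        if known == seen:
            return "duplicate"      # harmless repeat, confirms the entry
        self.conflicts.append((lsp_id, reporter))
        return "conflict"           # cross-check failed, needs attention


db = LspDb()
first = db.report("A", "lsp-7", ["A", "B", "C"], 5)
second = db.report("B", "lsp-7", ["A", "B", "C"], 5)
third = db.report("C", "lsp-7", ["A", "D", "C"], 5)
```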

   Nevertheless, it is possible that some LSPs will traverse only LSRs
   that are not aware of the PCE's need to learn LSP state and build an
   LSP-DB.  In these cases, the Stateful PCE must either have only
   limited knowledge of the LSPs in the network or must learn about LSPs
   through some other mechanism (such as reading the MPLS and GMPLS MIB
   modules [RFC3812] [RFC4802]).

   Ultimately, there may be no substitute for all LSRs being aware of
   Stateful PCEs and able to respond to requests to report all LSPs that
   they know about.  This will allow a Stateful PCE to build its LSP-DB
   from scratch (which it may need to do at start of day) and to verify
   its LSP-DB against the network (which may be important if the PCE has
   suffered some form of outage).

16.  How Do Redundant Stateful PCEs Synchronize State?

   It is important that two PCEs operating in a network have similar
   views of the available resources.  That is, they should have the same
   or substantially similar TEDs.  This is easy to achieve either by
   building the TEDs from the network in the same way, or by one PCE
   synchronizing its TED to the other PCE using a TED export protocol
   such as BGP-LS [I-D.ietf-idr-ls-distribution] or NETCONF [RFC6241].

   Synchronizing the LSP-DB can be a more complicated issue.  As
   described in Section 15, building the LSP-DB can be an involved
   process, so it would be best to not have multiple PCEs each trying
   to build an LSP-DB from the network.  However, it is still important
   that where multiple PCEs operate in the network (either as
   distributed PCEs, or with one acting as a backup for the other) that
   the LSP-DBs are kept synchronized.

   Thus there is likely to be a need for a protocol mechanism for one
   PCE to update its LSP-DB with that of another PCE.  This is no
   different from any other database synchronization problem and could
   use existing mechanisms or a new protocol.  Note, however, that in
   the case of distributed PCEs that are also Active PCEs (see Section
   17), each PCE will be creating entries in its own LSP-DB, so the
   synchronization of databases must be incremental and bidirectional,
   not just simply a database dump.
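
   One way to picture incremental, bidirectional synchronization is
   sketched below.  It assumes that each LSP-DB entry carries a version
   number and the identity of the PCE that wrote it, that the higher
   version wins, and that the PCE identifier breaks ties.  These rules,
   and the names SyncedLspDb, changes_since, and apply, are assumptions
   made for illustration and are not part of any defined protocol.

```python
class SyncedLspDb:
    """Hypothetical LSP-DB that exchanges deltas with a peer PCE."""

    def __init__(self, pce_id):
        self.pce_id = pce_id
        self.entries = {}   # lsp_id -> (version, origin_pce, path)
        self.log = []       # lsp_ids in the order they were stored

    def _store(self, lsp_id, entry):
        self.entries[lsp_id] = entry
        self.log.append(lsp_id)

    def local_update(self, lsp_id, path):
        """Create or modify an entry; a path of None marks a torn-down LSP."""
        version = self.entries.get(lsp_id, (0, "", None))[0] + 1
        self._store(lsp_id, (version, self.pce_id, path))

    def changes_since(self, position):
        """Return (new_position, delta) for entries stored after position."""
        recent = self.log[position:]
        changed = {lsp_id: self.entries[lsp_id] for lsp_id in recent}
        return len(self.log), changed

    def apply(self, delta):
        """Merge a peer's delta, keeping only entries newer than our own."""
        for lsp_id, entry in delta.items():
            current = self.entries.get(lsp_id)
            # Higher version wins; the origin PCE id breaks ties.
            if current is None or entry[:2] > current[:2]:
                self._store(lsp_id, entry)


a, b = SyncedLspDb("pce-a"), SyncedLspDb("pce-b")
a.local_update("lsp-1", ("A", "B"))
b.local_update("lsp-2", ("C", "D"))
pos_a, delta_a = a.changes_since(0)
pos_b, delta_b = b.changes_since(0)
b.apply(delta_a)
a.apply(delta_b)

# Later, only what changed after the last exchange is sent.
a.local_update("lsp-1", ("A", "C", "B"))
pos_a, delta_a = a.changes_since(pos_a)
b.apply(delta_a)
```

   In this sketch, entries learned from the peer are also logged, so
   they are echoed back in the next delta and then ignored because
   they are not newer.  A real mechanism would probably suppress the
   echo.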

   It may be helpful to clarify the word "redundant" in the context of
   this question.  One interpretation is that a redundant PCE exists
   solely as a backup such that it is only invoked to perform path
   computation in the event of a failure of the primary PCE.  This seems
   like a shocking waste of expensive resources, and it would make more
   sense for the redundant PCE to take its share of computation load all
   the time.  However, that scenario of two (or more) active PCEs
   creates exactly the state synchronization issues described above.
   Various options have been suggested where one PCE serves a set of
   PCCs as the primary computation server, and only addresses requests
   from other PCCs in the event of the failure of some other PCE, but
   this mode of operation still draws questions about the need for
   synchronized state even in non-failure scenarios if the LSPs that
   will be computed by the different PCEs may traverse the same network
   resources.

17.  What Is An Active PCE? What is a Passive PCE?

   A Passive PCE is one that only responds to path computation requests.
   It takes no autonomous actions.  A Passive PCE may be stateless or
   stateful.

   An Active PCE is one that issues provisioning "recommendations" to
   the network.  These recommendations may be new routes for existing
   LSPs, or routes for new LSPs.  An Active PCE may be stateless or
   stateful, but in order that it can reroute existing LSPs effectively,
   it is likely to hold state for at least those LSPs that it will
   reroute.

   Many people consider that the PCE, itself, cannot be Active.  That
   is, they hold that the PCE's function is purely to compute paths.  In
   that world-view, the "Active PCE" is actually the combination of a
   normal, passive PCE and an additional architectural component
   responsible for issuing commands or recommendations to the network.
   In some configurations, the Virtual Network Topology Manager (VNTM)
   discussed in Sections 20 and 21 provides this additional component.

18.  What is LSP Delegation?

   LSP delegation is the process where a PCC (usually an ingress LSR)
   passes responsibility for triggering reoptimization or re-routing of
   an LSP to the PCE.  In this case, the PCE would need to be both
   Stateful and Active.

   LSP delegation allows an LSP to be set up under the control of the
   ingress LSR potentially using the services of a PCE.  Once the LSP
   has been set up, the LSR (a PCC) tells the PCE about the LSP by
   providing details of the path and resources used.  It delegates
   responsibility for the LSP to the PCE so that the PCE can make
   adjustments to the LSP as dictated by changes to the TED and the
   policies in force at the PCE.  The PCE makes the adjustments by
   sending a new path to the LSR with the instruction/recommendation
   that the LSP be re-signalled.
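
   The exchange described above can be sketched as follows.  The
   classes Pce and Pcc, their method names, and the use of a link
   failure as the trigger are hypothetical.  The alternatives argument
   stands in for a real path computation against the TED, and LSP
   signaling itself is not modeled.

```python
class Pce:
    """Hypothetical Stateful, Active PCE that accepts delegated LSPs."""

    def __init__(self):
        self.delegated = {}   # lsp_id -> path last reported by the PCC

    def report(self, lsp_id, path):
        """PCC reports (and delegates) an LSP with its current path."""
        self.delegated[lsp_id] = list(path)

    def on_link_down(self, failed_link, alternatives):
        """Return (lsp_id, new_path) recommendations for affected LSPs.

        `alternatives` maps lsp_id -> replacement path; it stands in for
        a real path computation against the TED.
        """
        recommendations = []
        for lsp_id, path in self.delegated.items():
            hops = set(zip(path, path[1:]))   # directed hops only
            if failed_link in hops and lsp_id in alternatives:
                recommendations.append((lsp_id, alternatives[lsp_id]))
        return recommendations


class Pcc:
    """Hypothetical ingress LSR acting as a PCC."""

    def __init__(self, pce):
        self.pce = pce
        self.lsps = {}

    def set_up(self, lsp_id, path):
        """Signal the LSP (not modeled), then report and delegate it."""
        self.lsps[lsp_id] = list(path)
        self.pce.report(lsp_id, path)

    def apply_recommendation(self, lsp_id, new_path):
        """Re-signal on the recommended path and report the result."""
        self.set_up(lsp_id, new_path)


pce = Pce()
pcc = Pcc(pce)
pcc.set_up("lsp-1", ["A", "B", "C"])
recs = pce.on_link_down(("B", "C"), {"lsp-1": ["A", "D", "C"]})
for lsp_id, new_path in recs:
    pcc.apply_recommendation(lsp_id, new_path)
```

   Note that the PCE updates its own record only when the PCC reports
   the re-signalled LSP, which keeps the LSR in control of the LSP.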

19.  Is An Active PCE with LSP Delegation Just a Fancy NMS?

   In many ways the answer here is "yes".  But the PCE architecture
   forms part of a new way of looking at network operation and
   management.  In this new view, the network operation is more dynamic
   and under the control of software applications without direct
   intervention from operators.  This is not to say that the operator
   has no say in how their network runs, but it does mean that the
   operator sets policies (see Section 23) and that new components (such
   as an Active PCE) are responsible for acting on those policies to
   dynamically control the network.

   There is a subtle distinction between an NMS and an Active PCE with
   LSP delegation.  An NMS is in control of the LSPs in the network and
   can request that they are set up, modified, or torn down.  An Active
   PCE can only make suggestions about LSPs that have been delegated to
   the PCE by a PCC.

   For more details, see the discussion of an architecture for
   Application-Based Network Operation (ABNO) in
   [I-D.farrkingel-pce-abno-architecture].

20.  How Does a PCE Work With A Virtual Network Topology?

   A Virtual Network Topology (VNT) is described in [RFC4397] as a set
   of Hierarchical LSPs that is created (or could be created) in a
   particular network layer to provide network flexibility (data links)
   in other layers.  Thus the TE topology of a network can be
   constructed from TE links that are simply data links, from TE links
   that are supported by LSPs in another layer of the network, or from
   TE links that could be supported by LSPs ("potential LSPs") that
   would be set up on demand in another network layer.  This third type
   of TE link is known as a Virtual TE Link in [RFC5212].

   [RFC5212] also gives a more detailed explanation of a VNT, and it
   should be noted that the network topology in a packet network could
   be supported by LSPs in a number of different lower-layer networks.
   For example, the TE links in the packet network could be achieved by
   connections (LSPs) in underlying SONET/SDH and photonic networks.
   Furthermore, because of the hierarchical nature of MPLS, the TE links
   in a packet network may be achieved by setting up packet LSPs in the
   same packet network.

   A PCE obviously works with the TED that contains information about
   the TE links in the network.  Those links may be already established
   or may be virtual TE links.  In a simple TED, there is no distinction
   between the types of TE link.  However, there may be advantages to
   selecting TE links that are based on real data links over those based
   on dynamic LSPs in lower layers because the data links may be more
   stable.  Conversely, the TE links based on dynamic LSPs may be able
   to be repaired dynamically, giving better resilience.  Similarly, a
   PCE may prefer to select a TE link that is supported by a data link
   or existing LSP in preference to using a virtual TE link because the
   latter may need to be set up (taking time) and the setup could
   potentially fail.  Thus, a PCE might want to employ additional
   metrics or indicators to help it view the TED and select the right
   path for LSPs.
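
   As a sketch of what such an additional metric might look like, the
   example below adds a per-type penalty to the TE metric before
   running a standard Dijkstra shortest-path computation.  The penalty
   values, the link type labels ("data", "lsp", "virtual"), and the
   function name shortest_path are assumptions chosen for illustration
   only.

```python
import heapq

# Hypothetical penalties added to the TE metric for each kind of TE link.
KIND_PENALTY = {"data": 0, "lsp": 1, "virtual": 5}


def shortest_path(links, source, target, penalty=KIND_PENALTY):
    """Dijkstra over TE links given as (node_a, node_b, metric, kind)."""
    graph = {}
    for a, b, metric, kind in links:
        cost = metric + penalty[kind]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                step = (cost + link_cost, neighbor, path + [neighbor])
                heapq.heappush(queue, step)
    return None   # target not reachable


TED = [
    ("A", "B", 10, "data"),
    ("B", "C", 10, "data"),
    ("A", "C", 17, "virtual"),
]
no_bias = {"data": 0, "lsp": 0, "virtual": 0}
plain = shortest_path(TED, "A", "C", no_bias)
biased = shortest_path(TED, "A", "C")
```

   With no penalty the virtual TE link gives the cheapest path; with
   the penalty applied, the path over real data links is preferred.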

   If a PCE uses a virtual TE link, then some action will be needed to
   establish the LSP that supports that link.  Some models (such as that
   in [RFC5212]) trigger the setup of the lower layer LSPs on-demand
   during the signaling of the upper layer LSP (i.e., when the upper
   layer comes to use the virtual TE link, the upper layer signaling is
   paused and the lower layer LSP is established).  Another view,
   described in [RFC5623], is that when the PCE computes a path that
   will use a virtual TE link, it should trigger the setup of the lower
   layer LSP to properly create the TE link so that the path it returns
   will be sure to be viable.  This latter mode of operation can be
   extended to allow the PCE to spot the need for additional TE links
   and to trigger LSPs in lower layers in order to create those links.

   Of course, such "interference" in a lower layer network by a PCE
   responsible for a higher layer network depends heavily on policy.  In
   order to make a clean architectural separation and to facilitate
   proper policy control, [RFC5623] introduces the Virtual Network
   Topology Manager (VNTM) as a functional element that manages and
   controls the VNT.  [RFC5623] notes that the PCE and VNT Manager are
   distinct functional elements that may or may not be collocated.
   Indeed, it should be noted that there will be a PCE for the upper
   layer, and a PCE for each lower layer, and a VNTM responsible for
   coordinating between the PCEs and for triggering LSP setup in the
   lower layers.  Therefore, the combination of all of the PCEs and the
   VNTM is functionally similar to an Active, multi-layer PCE.

21.  How Does PCE Communicate With VNTM?

   The VNTM described in Section 20 and [RFC5623] has several interfaces
   (see also [I-D.farrkingel-pce-abno-architecture]).

   - The VNTM will need to learn about resource shortages and the need
     for additional TE links from the upper layer PCE in order that it
     can make policy-based decisions to determine whether and which LSPs
     to set up to create new TE links.  This interface is currently
     undefined.

   - The VNTM will need to coordinate with the PCEs in the lower layers,
     but this is simply a normal use of PCEP.

   - The VNTM will need to issue provisioning requests/commands to the
     lower layer networks to cause LSPs to be set up to act as TE links
     in the higher layer network.  A number of potential protocols exist
     for this function as described in [I-D.farrkingel-pce-abno-
     architecture], but it should be noted that it makes a lot of sense
     for this interface to be the same as that used by an Active PCE
     when providing paths to the network.

22.  How Does Service Scheduling and Calendaring Work?

   LSP scheduling or calendaring is a process where LSPs are planned
   ahead of time, and only set up when needed.  The challenge here is to
   ensure that the resources needed by an LSP that were available when
   the LSP's path was computed are still available when the LSP needs to
   be set up.  This needs to be achieved using a mechanism that allows
   those resources to be used in the meantime.

   Previous discussions of this topic have suggested that LSPs should be
   pre-signaled so that each LSR along the path could make a "temporal
   reservation" of resources.  But this approach can become very
   complicated.

   Conversely, a centralized database of resources and LSPs such as
   maintained by a Stateful PCE can be enhanced with a time-based
   booking system.  If the PCE is also Active, then when the time comes
   for the LSP to be set up (or later, when it is to be torn down) the
   PCE can control the network.

   It should be noted that in a busy network (and why would one bother
   with a scheduling service in a network that is not busy?) the
   computation algorithm can be quite complex.  It may also be necessary
   to reposition existing LSPs as new bookings arrive.  Furthermore, the
   booking database that contains both the scheduled LSPs and their
   impact on the network resources can become quite large.  A very
   important factor in the size of the active database (depending on
   implementation) may be the timeslices that are available in the
   calendaring process.
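
   A time-based booking system of the kind mentioned above might, in
   its simplest form, look like the following sketch for a single TE
   link.  The class LinkCalendar, its methods, and the use of integer
   timeslices with an exclusive end time are hypothetical choices made
   for illustration.

```python
class LinkCalendar:
    """Hypothetical booking calendar for one TE link."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bookings = []   # (start, end, bandwidth), end exclusive

    def peak_usage(self, start, end):
        """Highest booked bandwidth at any instant in [start, end)."""
        # Usage only rises at a booking start, so check those instants.
        instants = {start}
        instants |= {s for s, _, _ in self.bookings if start < s < end}
        return max(
            sum(bw for s, e, bw in self.bookings if s <= t < e)
            for t in instants
        )

    def book(self, start, end, bandwidth):
        """Book bandwidth for [start, end) if capacity is never exceeded."""
        if self.peak_usage(start, end) + bandwidth > self.capacity:
            return False
        self.bookings.append((start, end, bandwidth))
        return True


calendar = LinkCalendar(capacity=10)
first = calendar.book(0, 10, 6)     # accepted
clash = calendar.book(5, 15, 6)     # rejected: 12 units needed in [5, 10)
later = calendar.book(10, 20, 6)    # accepted: starts as the first ends
```

   A full scheduling PCE would need such a calendar for every TE link
   and would have to search for paths across them, which is where the
   computational complexity and database size noted above arise.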

23.  Where Does Policy Fit In?

   Policy is critical to the operation of a network.  In a PCE context
   it provides control and management of how a PCE selects network
   resources for use by different PCCs.

   [RFC5394] introduced the concept of PCE-based policy-enabled path
   computation.  It is based on the Policy Core Information Model (PCIM)
   [RFC3060] as extended by [RFC3460], and provides a framework for
   supporting path computation policy.

   Policy enters into all aspects of the use of a PCE starting from the
   very decision to use a PCE to delegate computation function from the
   LSRs.

   - Each PCC must select which computations will be delegated to a PCE.

   - Each PCC must select which PCEs it will use.

   - Each PCE must determine which PCCs are allowed to use its services
     and for what computations.

   - The PCE must determine how to collect the information in its TED,
     who to trust for that information, and how to refresh/update the
     information.

   - Each PCE must determine which objective functions and which
     algorithms to apply.

   - Inter-domain (and particularly H-PCE) computations will need to be
     sensitive to commercial and reliability information about domains
     and their interactions.

   - Stateful PCEs must determine what state to hold, when to refresh
     it, and which network elements to trust for the supply of the state
     information.

   - An Active PCE must have a policy relationship with its LSRs to
     determine which LSPs can be modified or triggered, and what LSP
     delegation is supported.

   - Multi-layer interactions (especially those using virtual or dynamic
     TE links) must provide policy control to stop server layer LSPs
     (which are fat and expensive by definition) from being set up on a
     whim to address micro-flows or speculative computations in higher
     layers.

   - A PCE may supply, along with a computed path, policy information
     that should be signaled during LSP setup for use by the LSRs along
     the path.

   It may be seen, therefore, that a PCE is substantially a policy
   engine that computes paths.  It should also be noted that the work of
   the PCE can be substantially controlled by configured policy in a way
   that will reduce the options available to the PCC, but also
   significantly reduce the need for the use of optional parameters in
   the PCEP messages.
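
   Two of the policy points listed above (which PCCs a PCE serves, and
   which objective functions it applies) can be sketched as a simple
   configured policy check.  The class PcePolicy, the admit method,
   and the objective function names are hypothetical.  The sketch also
   shows how a configured default removes the need for the PCC to
   supply an optional parameter.

```python
from dataclasses import dataclass, field


@dataclass
class PcePolicy:
    """Hypothetical configured policy for one PCE."""
    allowed_pccs: set = field(default_factory=set)
    allowed_objectives: set = field(default_factory=set)
    default_objective: str = "min-cost"

    def admit(self, pcc, objective=None):
        """Return the objective function to use, or None to reject."""
        if pcc not in self.allowed_pccs:
            return None
        if objective is None:
            # The request carried no optional parameter.
            return self.default_objective
        if objective in self.allowed_objectives:
            return objective
        return None


policy = PcePolicy({"lsr-1"}, {"min-cost", "min-load"})
by_default = policy.admit("lsr-1")
requested = policy.admit("lsr-1", "min-load")
not_offered = policy.admit("lsr-1", "max-bandwidth")
unknown_pcc = policy.admit("lsr-9")
```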

24.  What is a Path Computation Elephant?

   A Path Computation Elephant is an attribute of a long document on
   the details of the PCE architecture.  It serves two purposes: the
   first being to check whether the reader is still awake; the second
   being to remember things for the more relaxed reader because, as is
   well known in pachyderm circles, Elephants are stateful and never
   forget.

25.  Security Considerations

   This informational document does not define any new protocol elements
   or mechanisms.  As such, it does not introduce any new security
   issues.

   It is worth noting that PCEP operates over TCP.  An analysis of the
   security issues for routing protocols that use TCP (including PCEP)
   is provided in [I-D.ietf-karp-routing-tcp-analysis].

26.  IANA Considerations

   This document makes no requests for IANA Action.

27.  Acknowledgements

   Thanks for constructive comments go to Fatai Zhang, Oscar Gonzalez de
   Dios, and Xian Zhang.

28.  References

28.1.  Normative References

   [RFC4655]   Farrel, A., Vasseur, J.-P., and J. Ash, "A Path
               Computation Element (PCE)-Based Architecture", RFC
               4655, August 2006.

   [RFC5440]   Vasseur, JP., Ed., and JL. Le Roux, Ed., "Path
               Computation Element (PCE) Communication Protocol
               (PCEP)", RFC 5440, March 2009.

   [RFC5623]   Oki, E., Takeda, T., Le Roux, JL., and A. Farrel,
               "Framework for PCE-Based Inter-Layer MPLS and GMPLS
               Traffic Engineering", RFC 5623, September 2009.

   [RFC6805]   King, D. and A. Farrel, "The Application of the Path
               Computation Element Architecture to the Determination of
               a Sequence of Domains in MPLS and GMPLS", RFC 6805,
               November 2012.

28.2.  Informative References

   [I-D.farrkingel-pce-abno-architecture]
               King, D., and Farrel, A., "A PCE-based Architecture for
               Application-based Network Operations",
               draft-farrkingel-pce-abno-architecture, work in progress.

   [I-D.ietf-alto-server-discovery]
               Kiesel, S., Stiemerling, M., Schwan, N., Scharf, M., and
               H. Song, "ALTO Server Discovery",
               draft-ietf-alto-server-discovery-07 (work in progress),
               January 2013.

   [I-D.ietf-idr-ls-distribution]
               Gredler, H., Medved, J., Previdi, S., Farrel, A., and
               Ray, S., "North-Bound Distribution of Link-State and TE
               Information using BGP", draft-ietf-idr-ls-distribution,
               work in progress.

   [I-D.ietf-karp-routing-tcp-analysis]
               Jethanandani, M., Patel, K., and Zheng, L., "Analysis of
               BGP, LDP, PCEP and MSDP Issues According to KARP Design
               Guide", draft-ietf-karp-routing-tcp-analysis, work in
               progress.

   [RFC3060]   Moore, B., Ellesson, E., Strassner, J., and A.
               Westerinen, "Policy Core Information Model -- Version 1
               Specification", RFC 3060, February 2001.

   [RFC3209]   Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
               and Swallow, G., "RSVP-TE: Extensions to RSVP for LSP
               Tunnels", RFC 3209, December 2001.

   [RFC3460]   Moore, B., Ed., "Policy Core Information Model (PCIM)
               Extensions", RFC 3460, January 2003.

   [RFC3630]   Katz, D., Kompella, K., and D. Yeung, "Traffic
               Engineering (TE) Extensions to OSPF Version 2", RFC
               3630, September 2003.

   [RFC3812]   Srinivasan, C., Viswanathan, A., and Nadeau, T.,
               "Multiprotocol Label Switching (MPLS) Traffic Engineering
               (TE) Management Information Base (MIB)", RFC 3812, June
               2004.

   [RFC4203]   Kompella, K., Ed., and Y. Rekhter, Ed., "OSPF
               Extensions in Support of Generalized Multi-
               Protocol Label Switching (GMPLS)", RFC
               4203, October 2005.

   [RFC4397]   Bryskin, I., and Farrel, A., "A Lexicography for the
               Interpretation of Generalized Multiprotocol Label
               Switching (GMPLS) Terminology within the Context of the
               ITU-T's Automatically Switched Optical Network (ASON)
               Architecture", RFC 4397, February 2006.

   [RFC4657]   Ash, J. and J. Le Roux, "Path Computation Element
               (PCE) Communication Protocol Generic Requirements",
               RFC 4657, September 2006.

   [RFC4674]   Le Roux, J., Ed., "Requirements for Path Computation
               Element (PCE) Discovery", RFC 4674, October 2006.

   [RFC4726]   Farrel, A., Vasseur, J., and A. Ayyangar, "A Framework
               for Inter-Domain Multiprotocol Label Switching Traffic
               Engineering", RFC 4726, November 2006.

   [RFC4802]   Nadeau, T., and Farrel, A., "Generalized Multiprotocol
               Label Switching (GMPLS) Traffic Engineering Management
               Information Base", RFC 4802, February 2007.

   [RFC4848]   Daigle, L., "Domain-Based Application Service Location
               Using URIs and the Dynamic Delegation Discovery Service
               (DDDS)", RFC 4848, April 2007.

   [RFC4974]   Papadimitriou, D. and A. Farrel, "Generalized MPLS
               (GMPLS) RSVP-TE Signaling Extensions in Support of
               Calls", RFC 4974, August 2007.

   [RFC5152]   Vasseur, JP., Ed., Ayyangar, A., Ed., and R. Zhang, "A
               Per-Domain Path Computation Method for Establishing
               Inter-Domain Traffic Engineering (TE) Label Switched
               Paths (LSPs)", RFC 5152, February 2008.

   [RFC5088]   Le Roux, JL., Ed., Vasseur, JP., Ed., Ikejiri, Y., and R.
               Zhang, "OSPF Protocol Extensions for Path Computation
               Element (PCE) Discovery", RFC 5088, January 2008.

   [RFC5089]   Le Roux, JL., Ed., Vasseur, JP., Ed., Ikejiri, Y., and R.
               Zhang, "IS-IS Protocol Extensions for Path Computation
               Element (PCE) Discovery", RFC 5089, January 2008.

   [RFC5212]   Shiomoto, K., Papadimitriou, D., Le Roux, JL., Vigoureux,
               M., and Brungard, D., "Requirements for GMPLS-Based
               Multi-Region and Multi-Layer Networks (MRN/MLN)", RFC
               5212, July 2008.

   [RFC5305]   Li, T. and H. Smit, "IS-IS Extensions for Traffic
               Engineering", RFC 5305, October 2008.

   [RFC5307]   Kompella, K., Ed., and Y. Rekhter, Ed., "IS-IS
               Extensions in Support of Generalized Multi-Protocol
               Label Switching (GMPLS)", RFC 5307,
               October 2008.

   [RFC5394]   Bryskin, I., Papadimitriou, D., Berger, L., and J. Ash,
               "Policy-Enabled Path Computation Framework", RFC 5394,
               December 2008.

   [RFC5441]   Vasseur, J.P., Ed., "A Backward Recursive PCE-based
               Computation (BRPC) procedure to compute shortest inter-
               domain Traffic Engineering Label Switched Paths", RFC
               5441, April 2009.

   [RFC5557]   Lee, Y., Le Roux, JL., King, D., and E. Oki, "Path
               Computation Element Communication Protocol (PCEP)
               Requirements and Protocol Extensions in Support of Global
               Concurrent Optimization", RFC 5557, July 2009.

   [RFC6006]   Zhao, Q., King, D., Verhaeghe, F., Takeda, T., Ali,
               Z., and J. Meuric, "Extensions to the Path
               Computation Element Communication Protocol (PCEP)
               for Point-to-Multipoint Traffic Engineering Label
               Switched Paths", RFC 6006, September 2010.

   [RFC6241]   Enns, R., Bjorklund, M., Schoenwaelder, J., and Bierman,
               A., "Network Configuration Protocol (NETCONF)", RFC 6241,
               June 2011.

Authors' Addresses

   Adrian Farrel
   Juniper Networks
   EMail: adrian@olddog.co.uk

   Daniel King
   Old Dog Consulting
   EMail: daniel@olddog.co.uk