TEAS Working Group                                         Quintin Zhao
Internet-Draft                                                 Robin Li
Intended status: Informational                           Boris Khasanov
Expires: November 1, 2017                           Huawei Technologies
                                                                King Ke
                                                  Tencent Holdings Ltd.
                                                            Luyuan Fang
                                                              Microsoft
                                                              Chao Zhou
                                                          Cisco Systems
                                                            Boris Zhang
                                                   Telus Communications
                                                       Artem Rachitskiy
                                                Mobile TeleSystems JLLC
                                                           Anton Gulida
                                                         LLC "Lifetech"
                                                         April 28, 2017

 The Use Cases for Using PCE as the Central Controller (PCECC) of LSPs
                  draft-ietf-teas-pcecc-use-cases-01

Abstract

   In certain network deployment scenarios, service providers would
   like to keep all the existing MPLS functionalities in both MPLS and
   GMPLS networks while reducing existing complexity.  In this
   document, we propose to use the PCE as a central controller so that
   LSPs can be calculated/signaled/initiated/downloaded/managed through
   a centralized PCE server to each network device along the LSP path
   while leveraging the existing PCE technologies as much as possible.

   This draft describes the use cases for using the PCE as the central
   controller, where LSPs are calculated/set up/initiated/downloaded/
   maintained by extending the current PCE architectures and extending
   PCEP.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

Zhao, et al.
Expires November 1, 2017 [Page 1]

Internet-Draft            Use Cases for PCECC                April 2017

   This Internet-Draft will expire on November 1, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Use Cases of PCECC for Label Resource Reservations
   4. Using PCECC for SR without the IGP Extension
     4.1. Use Cases of PCECC for SR Best Effort(BE) Path
     4.2. Use Cases of PCECC for SR Traffic Engineering (TE) Path
   5. Use Cases of PCECC for TE LSP
     5.1 Use case of PCECC for load balancing
     5.2 PCECC and Inter-AS TE
   6. Use Cases of PCECC for Multicast LSPs
     6.1 Using PCECC for P2MP/MP2MP LSPs' Setup
     6.2 Use Cases of PCECC for the Resiliency of P2MP/MP2MP LSPs
       6.2.1 PCECC for the End-to-End Protection of the P2MP/MP2MP
             LSPs
       6.2.2 PCECC for the Local Protection of the P2MP/MP2MP LSPs
     6.3 Using reliable P2MP TE based multicast delivery for
         distributed computations (MapReduce-Hadoop)
   7. Use Cases of PCECC for LSP in the Network Migration
   8. Use Cases of PCECC for L3VPN and PWE3
   10. Using PCECC for Traffic Classification Informations
   11. The Considerations for PCECC Procedure and PCEP extensions
   12. IANA Considerations
   13. Security Considerations
   14. Acknowledgments
   15. References
     15.1 Normative References
     15.2 Informative References
   Authors' Addresses

1. Introduction

   The draft "An Architecture for Use of PCE and PCEP in a Network with
   Central Control" [draft-ietf-teas-pce-central-control-01] describes
   an SDN architecture in which the Path Computation Element (PCE)
   determines paths for a variety of different cases, with PCEP as a
   general southbound protocol used to communicate with all network
   elements (NEs) along the path.
   [I-D.zhao-pce-pcep-extension-for-pce-controller] introduces
   procedures and extensions for PCEP to support this architecture.
   This draft describes the use cases for this PCECC architecture.

2. Terminology

   The following terminology is used in this document.

   IGP:  Interior Gateway Protocol.  Either of the two routing
      protocols, Open Shortest Path First (OSPF) or Intermediate System
      to Intermediate System (IS-IS).

   PCC:  Path Computation Client.  Any client application requesting a
      path computation to be performed by a Path Computation Element.

   PCE:  Path Computation Element.  An entity (component, application,
      or network node) that is capable of computing a network path or
      route based on a network graph and applying computational
      constraints.

   TE:  Traffic Engineering.

3.
Use Cases of PCECC for Label Resource Reservations

   The PCECC architecture MAY require the use of global labels to
   determine and provision the optimal path.  This means that the
   global label range MUST be negotiated between the PCECC and the NEs
   and then allocated.  This use case is based on the network
   configuration illustrated in the following figure:

   +------------------------------+    +------------------------------+
   |         PCE DOMAIN 1         |    |         PCE DOMAIN 2         |
   |          +--------+          |    |          +--------+          |
   |          |        |          |    |          |        |          |
   |          | PCECC1 | ---------PCEP----------  | PCECC2 |          |
   |          |        |          |    |          |        |          |
   |          |        |          |    |          |        |          |
   |          +--------+          |    |          +--------+          |
   |         ^          ^         |    |         ^          ^         |
   |        /            \  PCEP  |    |  PCEP  /            \        |
   |       V              V       |    |       V              V       |
   | +--------+        +--------+ |    | +--------+        +--------+ |
   | |NODE 11 |        | NODE 1n| |    | |NODE 21 |        | NODE 2n| |
   | |        | ...... |        | |    | |        | ...... |        | |
   | | PCECC  |        | PCECC  | |    | | PCECC  |        |PCECC   | |
   | |Enabled |        | Enabled| |    | |Enabled |        |Enabled | |
   | +--------+        +--------+ |    | +--------+        +--------+ |
   |                              |    |                              |
   +------------------------------+    +------------------------------+

   Example 1: Shared Global Label Range Reservation

   o  PCECC client nodes report their MPLS label capability to the
      central controller PCECC using the PCECC-CAPABILITY TLV
      [draft-zhao-pce-pcep-extension-for-pce-controller p.33].

   o  The central controller PCECC collects the MPLS label capability
      of all nodes.  The PCECC can then calculate the shared MPLS
      global label range for all the PCECC client nodes.

   o  If the shared global label range needs to be negotiated across
      multiple domains, the central controllers of these domains need
      to communicate with each other to negotiate a common global label
      range (PCLRResv message,
      [draft-zhao-pce-pcep-extension-for-pce-controller p.35]).

   o  The central controller PCECC notifies all PCECC client nodes of
      the shared global label range (PCLRResv message,
      [draft-zhao-pce-pcep-extension-for-pce-controller p.35]).

4.
Using PCECC for SR without the IGP Extension

   In a centralized network, the performance achieved by a distributed
   system cannot easily be matched if the entire forwarding path is
   computed, downloaded and maintained by the centralized controller.
   The performance can be improved by supporting part of the forwarding
   path in the PCECC network through the segment routing mechanism,
   except that the node segment IDs and adjacency segment IDs for the
   whole network are allocated dynamically and propagated through the
   centralized controller instead of using the IGP extension.

   When the PCECC is used for the distribution of the node segment ID
   and adjacency segment ID, the node segment ID is allocated from the
   global label pool.  For the allocation of the adjacency segment ID,
   there are two choices: the first is to allocate it from the local
   label pool, the second is to allocate it from the global label pool.
   The advantage of the second choice is that the depth of the label
   stack for the forwarding path encoding will be reduced, since the
   adjacency segment ID can signal the forwarding path without adding
   the node segment ID in front of it.  In this version of the draft,
   we use the first choice for now.  We may update the draft to reflect
   the use of the second choice.

   As in the SR solutions, when the PCECC is used as the central
   controller, FRR support on any topology can be pre-computed and set
   up without any additional signaling (other than the regular IGP/BGP
   protocols), including support for shared risk constraints, node and
   link protection, and microloop avoidance.

   The following example illustrates the use case where the node
   segment ID and adjacency segment ID are allocated from the global
   labels allocated for the SR path.
                     192.0.2.1/32
                     +----------+
                     | R1(1001) |
                     +----------+
                          |
                     +----------+
                     | R2(1002) |  192.0.2.2/32
                     +----------+
                     *   |   *  *
                    *    |   *   *
                   *link1|   *    *
   192.0.2.4/32   *      |   *link2 *  192.0.2.5/32
      +-----------+  9001|   *     +-----------+
      | R4(1004)  |      |   *     | R5(1005)  |
      +-----------+      |   *     +-----------+
                 *       |   *9003  *    +
                  *      |   *     *     +
                   *     |   *    *      +
                    +-----------+  +-----------+
   192.0.2.3/32     | R3(1003)  |  |R6(1006)   |192.0.2.6/32
                    +-----------+  +-----------+
                          |
                    +-----------+
                    | R8(1008)  |  192.0.2.8/32
                    +-----------+

4.1. Use Cases of PCECC for SR Best Effort(BE) Path

   In this mode of the solution, the PCECC just needs to allocate the
   node segment ID and adjacency ID without calculating the explicit
   path for the SR path.  The ingress of the forwarding path just needs
   to encapsulate the destination node segment ID on top of the packet.
   All the intermediate nodes will forward the packet based on the
   final destination node segment ID.  This is similar to LDP LSP
   forwarding, except that label swapping uses the same global label
   for both the in-segment and the out-segment at each hop.

   The p2p SR BE path examples are explained below.

   Note that the node segment ID for each node is allocated from the
   shared global label range that has already been negotiated.

   Example 1:

   R1 may send a packet to R8 simply by pushing an SR header with
   segment list {1008}.  The path can be R1-R2-R3-R8 or R1-R2-R5-R8,
   depending on the route calculation on node R2.

   Example 2: local link/node protection:

   For a packet destined for R3, R2 may pre-install a backup forwarding
   entry to protect against failure of node R4.  The pre-installed
   backup path can go through either node R5 or link1 or link2 between
   R2 and R3.  The backup path calculation is decided locally by R2,
   and any existing IP FRR algorithm can be used here.

4.2.
Use Cases of PCECC for SR Traffic Engineering (TE) Path

   When a traffic engineering path is needed, the PCECC needs to
   allocate the node segment IDs and adjacency IDs, and at the same
   time the PCECC calculates the explicit path for the SR path and
   passes on this explicit path, represented as a sequence of node
   segment IDs and adjacency IDs.  The ingress of the forwarding path
   needs to encapsulate the stack of node segment IDs and adjacency IDs
   on top of the packet.  For the case where a strict traffic
   engineering path is needed, all the intermediate nodes and links
   will be specified through the stack of labels so that the packet is
   forwarded exactly as intended.

   Although this is similar to TE LSP forwarding, where the forwarding
   path is engineered, QoS is only guaranteed through the enforcement
   of bandwidth admission control.  In the RSVP-TE LSP case, by
   contrast, QoS is guaranteed through the link bandwidth reservation
   at each hop of the forwarding path.

   The p2p SR traffic engineering path examples are explained below.

   Note that the node segment ID for each node is allocated from the
   shared global label range that has already been negotiated, and the
   adjacency segment IDs for each link are allocated from the local
   label pool of each node.

   Example 1:

   R1 may send a packet P1 to R8 simply by pushing an SR header with
   segment list {1008}.  The path should be: R1-R2-R3-R8.

   Example 2:

   R1 may send a packet P2 to R8 by pushing an SR header with segment
   list {1002, 9001, 1008}.  The path should be: R1-R2-(1)link-R3-R8.

   Example 3:

   R1 may send a packet P3 to R8 while avoiding the links between R2
   and R3 by pushing an SR header with segment list {1004, 1008}.  The
   path should be: R1-R2-R4-R3-R8.

   The p2p local protection examples for the SR TE path are explained
   below:

   Example 4: local link protection:

   o  R1 may send a packet P4 to R8 by pushing an SR header with
      segment list {1002, 9001, 1008}.
      The path should be: R1-R2-(1)link-R3-R8.

   o  When node R2 receives the packet from R1 with the header
      R2-(1)link-R3-R8 and detects a failure of link1, it sends the
      packet out with the header R3-R8 through link2.

   Example 5: local node protection:

   o  R1 may send a packet P5 to R8 by pushing an SR header with
      segment list {1004, 1008}.  The path should be: R1-R2-R4-R3-R8.

   o  When node R2 receives the packet from R1 with the header
      {1004, 1008} and detects a failure of node R4, it sends the
      packet out with the header {1005, 1008} to node R5 instead of
      node R4.

5. Use Cases of PCECC for TE LSP

   In the previous sections, we have discussed the cases where the SR
   path is set up through the PCECC.  Although those cases offer
   simplicity and scalability, there are existing traffic engineering
   functionalities, such as the bandwidth guarantee along the full
   forwarding path and the multicast forwarding path, which an SR based
   solution cannot provide.  There are also cases where the depth of
   the label stack may be an issue for existing deployments and certain
   vendors.

   To address these issues, the PCECC architecture should also support
   the TE LSP and multicast LSP functionalities.  To achieve this, the
   existing PCEP can be used to communicate between the PCE server and
   the PCE's client PCC for exchanging the path request and reply
   information regarding the TE LSP.  In this case, the TE LSP
   information is not only the path information itself but also
   includes the full forwarding information.  Instead of letting the
   ingress of the LSP initiate the LSP setup through the RSVP-TE
   signaling protocol, with minor extensions we can use PCEP to
   download the complete TE LSP forwarding entries to each node in the
   network.
                     192.0.2.1/32
                     +----------+
                     | R1(1001) |
                     +----------+
                        |      |
                    6001|link1 |
                        |  6002|link2
                     +----------+
                     | R2(1002) |  192.0.2.2/32
                     +----------+
               link3 *   |   *  * link4
               7002 *    |   *   *7001
                   *link1|   *    *
   192.0.2.4/32   *      |   *link2 *  192.0.2.5/32
      +-----------+  5001|   *     +-----------+
      | R4(1004)  |      |   *     | R5(1005)  |
      +-----------+      |   *     +-----------+
                 *       |   *5003  *     +
              9001*      |   *     *link1 +
                   *     |   *    *9002   +
                    +-----------+  +-----------+
   192.0.2.3/32     | R3(1003)  |  |R6(1006)   |192.0.2.6/32
                    +-----------+  +-----------+
                       |      |
                   3001|link1 |
                       |  3002|link2
                    +-----------+
                    | R8(1008)  |  192.0.2.8/32
                    +-----------+

   TE LSP Setup Example

   o  Node R1 sends a path request message for the setup of a TE LSP
      from R1 to R8 with the given constraints, including the required
      bandwidth.

   o  The PCECC calculates the optimal path according to the given
      constraints (i.e., bandwidth) and updates the TEDB with the new
      parameters; it then negotiates label spaces with R1, R2, R3, R4
      and R8.

   o  The PCECC programs each node along the path from R1 to R8 with
      the primary path: {R1, link1, 6001}, {R2, link3, 7002},
      {R4, link0, 9001}, {R3, link1, 3001}, {R8}.

   o  For end-to-end protection, the PCECC programs each node along the
      path from R1 to R8 with the secondary path: {R1, link2, 6002},
      {R2, link4, 7001}, {R5, link1, 9002}, {R3, link2, 3002}, {R8}.

   o  It is also possible to have a secondary backup path for local
      node protection set up by the PCECC.  For example, if the primary
      path is the same as the one set up so far, then to protect node
      R4 locally the PCECC can program the secondary path like this:
      {R1, link1, 6001}, {R2, link1, 5001}, {R3, link1, 3001}, {R8}.
      By doing this, node R4 is locally protected.

5.1 PCECC Load Balancing (LB) Use Case

   Very often many service providers use TE tunnels for solving issues
   with non-deterministic paths in their networks.
   One example of such an application is the use of TE tunnels in
   mobile backhaul (MBH).  Let's consider the following typical
   topology.

                               TE1 -------------->
+---------+    +--------+    +--------+    +--------+    +------+  +---+
| Access  |----| Access |----| AGG 1  |----| AGG N-1|----|Core 1|--|SR1|
| SubNode1|    | Node 1 |    +--------+    +--------+    +------+  +---+
+---------+    +--------+       |   |           | ^         |
     | Access      | Access     | AGG Ring 1    | |         |
     | SubRing 1   | Ring 1     |   |           | |         |
+---------+    +--------+    +--------+         | |         |
| Access  |    | Access |    | AGG 2  |         | |         |
| SubNode2|    | Node 2 |    +--------+         | |         |
+---------+    +--------+       |   |           | |         |
     |             |            |   |           | |         |
     |             |            |   +----TE2----|-+         |
+---------+    +--------+    +--------+    +--------+    +------+  +---+
| Access  |    | Access |----| AGG 3  |----| AGG N  |----|Core N|--|SRn|
| SubNodeN|----| Node N |    +--------+    +--------+    +------+  +---+
+---------+    +--------+

   This MBH architecture uses L2 access rings and subrings.  L3 starts
   at aggregation.  For the sake of simplicity, here we have only one
   access subring, one access ring and one aggregation ring
   (AGG1...AGGN), connected by Nx10GE interfaces.  The aggregation
   domain runs its own IGP.  There are two egress routers (AGG N-1,
   AGG N) that are connected to the Core domain via L2 interfaces.  The
   Core also has connections to service routers.  RSVP-TE tunnels are
   used for MPLS transport inside the ring.  There could be at least 2
   tunnels (one way) from each AGG router to the egress AGG routers.
   There are also many L2 access rings connected to the AGG routers.

   Services are deployed by means of either L2VPNs (VPLS) or L3VPNs.
   Those services use MPLS TE as transport towards the egress AGG
   routers.  TE tunnels could also be used as transport towards service
   routers in the case of a seamless MPLS based architecture in the
   future.
   There is a need to solve the following tasks:

   o  Perform automatic LB amongst TE tunnels according to the current
      traffic load.

   o  TE bandwidth (BW) management: provide guaranteed BW for specific
      services (HSI, IPTV, etc.) and provide time-based BW reservation
      (BoD).

   o  Simplify the deployment of TE tunnels (move away from manual
      provisioning).

   o  Provide flexibility for Service Router placement (anywhere in the
      network, by creating transport LSPs to them).

   Since the other tasks are considered in the other PCECC use cases
   above, hereafter we will focus only on the load balancing (LB) task.
   The LB task could be solved by means of the PCECC in the following
   way:

   o  After an application, network service or operator asks the SDN
      controller (PCECC) for LSP based LB between AGG X and AGG N/
      AGG N-1 (the egress AGG routers which have connections to the
      core) via the Northbound Interface (NBI, such as a REST API), the
      PCECC SHOULD ask for the constraints for that particular
      calculation (i.e., LSP type: traditional CR-LSP or SR-TE LSP,
      bandwidth, inclusion or exclusion of specific links or nodes,
      number of paths, shortest path or minimum cost tree, need for
      disjoint LSP paths, etc.).

   o  The PCECC MUST calculate N P2P LSPs according to the given
      constraints.  The calculation is based on the results of the
      Objective Function (OF), which includes the same source and
      destination router IDs, the same or different bandwidth (BW),
      different links (in the case of disjoint paths) and the other
      constraints from Step 1.

   o  Depending on the given LSP type (CR-LSP or SR-TE), the PCECC
      SHOULD create different labels (i.e., different label spaces; it
      MAY also require the label space negotiation procedure between
      the PCECC and the PCCs
      [draft-zhao-pce-pcep-extension-for-pce-controller]) for the
      calculated LSPs from the egress nodes AGG N-1 and AGG N towards
      the ingress AGG X node.
   o  The PCECC SHOULD send a PCInitiate PCEP message
      [I-D.crabbe-pce-pce-initiated-lsp] towards the ingress AGG X
      router (PCC) for each of the N LSPs and receive a PCRpt PCEP
      message [I-D.ietf-pce-stateful-pce] back from it.

   o  If the LSP type is CR-LSP, the PCECC MUST send a PCLabelUpd
      [I-D.zhao-pce-pcep-extension-for-pce-controller] PCEP message to
      each node along the path with the label information for each of
      the N LSPs.  If the LSP type is SR-TE, the PCECC also MUST send a
      PCLabelUpd PCEP message to each node along the path with the
      label information (Node-ID and Adjacency-ID segment (label) list)
      specific to that node.  The PCECC SHOULD then send a PCUpd PCEP
      message to the ingress AGG X router with information about the
      new LSP, and AGG X (PCC) SHOULD send a PCEP PCRpt back with LSP
      status: Up.

   o  Each router along the LSP now has the corresponding label
      forwarding state for each of the N LSPs.

   o  AGG X, as ingress router, now has N LSPs towards AGG N and
      AGG N-1 which are available for installation in the router's RIB
      and for LB of traffic between them.  Traffic distribution between
      those LSPs depends on the particular hash function implemented on
      that router.

   o  Since the PCECC MUST know both the LSDB and the TEDB (TE state),
      it can manage and prevent possible oversubscription and limit the
      number of available LB states.

5.2 PCECC and Inter-AS TE

   There are three signalling options for establishing an Inter-AS TE
   LSP: contiguous TE LSP [RFC5151], stitched inter-AS TE LSP
   [RFC5150], and nested TE LSP [RFC4206].

   The requirements for PCE-based Inter-AS setup [RFC5376] describe the
   approach and PCEP functionality that are needed for establishing
   Inter-AS TE LSPs.

   [RFC5376] also gives the Inter- and Intra-AS PCE Reference Model,
   which is provided below in shortened form for the sake of
   simplicity.
               Inter-AS       Inter-AS
         PCC <-->PCE1<--------->PCE2
          ::      ::             ::
          ::      ::             ::
          R1----ASBR1====ASBR3---R3---ASBR5
          |  AS1  |       |    PCC     |
          |       |       |    AS2     |
          R2----ASBR2====ASBR4---R4---ASBR6
          ::                     ::
          ::                     ::
       Intra-AS               Intra-AS
          PCE                    PCE

    Shortened form of Inter- and Intra-AS PCE Reference Model [RFC5376]

   Hereafter we will discuss a simplified Inter-AS case in which both
   AS1 and AS2 belong to the same service provider administration.  In
   that case the Inter- and Intra-AS PCEs could be combined into one
   single PCE if the performance of such a combined PCE is sufficient
   for handling all Path Computation Requests.  Moreover, in that
   particular case we could potentially use a single PCE for both ASes
   if its scalability and performance are sufficient; we would just
   need interfaces (PCEP and BGP-LS) to both domains.  The SDN
   controller's redundancy mechanisms are out of scope in our case.
   Thus routers in AS1 and AS2 (PCCs) will send Path Computation
   Requests towards the same PCE.

            +----BGP-LS------+ +------BGP-LS-----+
            |                | |                 |
     +-PCEP-|----++-+-------PCECC-----PCEP--++-+-|-------+
   +-:------|----::-:-+                  +--::-:-|-------:---+
   | :      |    :: : |                  |  :: : |       :   |
   | :     RR1   :: : |                  |  :: : RR2     :   |
   | v           v: : |       LSP1       |  :: v         v   |
   | R1---------ASBR1=======================ASBR3--------R3  |
   | |            v : |                  |  :v            |  |
   | +----------ASBR2=======================ASBR4---------+  |
   | |  Region 1    : |                  |  :  Region 1   |  |
   |----------------:-|                  |--:-------------|--|
   | |              v |       LSP2       |  v             |  |
   | +----------ASBR5=======================ASBR6---------+  |
   |     Region 2     |                  |     Region 2      |
   +------------------+ <--------------> +-------------------+
      MPLS Domain 1          Inter-AS         MPLS Domain 2
   <=======AS1=======>                    <========AS2=======>

            Particular case of Inter-AS PCE Reference Model

   In one particular case of the PCECC Inter-AS TE scenario, the
   service provider controls both domains (AS1 and AS2), each of which
   has its own IGP and MPLS transport.  The need is to set up Inter-AS
   LSPs for transporting different services on top of them (Voice,
   L3VPN, etc.).
   Inter-AS links with different capacities exist in several regions.
   The task is not only to provision those Inter-AS LSPs with the given
   constraints but also to calculate the path and pre-set up the backup
   Inter-AS LSPs that will be used if the main LSP fails.

   In the figure above, LSP1 from R1 to R3 SHOULD go via ASBR1 and
   ASBR3, and it is the main Inter-AS LSP.  The R1-R3 LSP2, which
   SHOULD go via ASBR5 and ASBR6, is the backup one.  Depending on the
   Inter-AS TE type, the backup LSP could be used either by the
   head-end R1 or by ASBR1.

   After the addition of PCECC functionality to the PCE (SDN
   controller), the PCECC based Inter-AS TE model SHOULD follow both
   the PCECC use case for TE LSP (Section 5 above) and the requirements
   of [RFC5376], with the following details:

   o  Since the PCECC MUST know the topology of both domains AS1 and
      AS2, the PCECC MUST establish BGP-LS peering with routers (or
      RRs) in both domains.

   o  The PCECC MUST have SBI (PCEP) connectivity towards all routers
      in both domains (see also Section 4 in [RFC5376]).

   o  After the operator's application or service orchestrator creates
      a request for the topology of a specific service, the PCECC
      SHOULD receive that request via the NBI (the NBI type is
      implementation dependent; it MAY be NETCONF/YANG, REST, etc.).
      The PCECC SHOULD then calculate the Objective Function (OF) for
      the optimal path with the given constraints (i.e., LSP type,
      bandwidth, etc.), including those from [RFC5376]: priority, AS
      sequence, preferred ASBR, disjoint paths, protection.  At this
      step we would have two paths: R1-ASBR1-ASBR3-R3 and
      R1-ASBR5-ASBR6-R3.

   o  Depending on the given LSP type (CR-LSP or SR-TE), the PCECC
      SHOULD create different labels (i.e., different label spaces; it
      MAY also require the label space negotiation procedure between
      the PCECC and the PCCs) for the calculated LSPs from the egress
      node in one AS towards the ingress in the other AS.

   o  The PCECC SHOULD send a PCInitiate PCEP message
      [I-D.crabbe-pce-pce-initiated-lsp] towards the ingress router R1
      (PCC) in AS1 and receive a PCRpt PCEP message
      [I-D.ietf-pce-stateful-pce] back from it.
   o  If the LSP type is CR-LSP, the PCECC MUST send a PCLabelUpd
      [I-D.zhao-pce-pcep-extension-for-pce-controller] PCEP message to
      each node along the path (ASBR1-ASBR3-R3, ASBR5-ASBR6-R3) in both
      ASes with the label information for that LSP.  If the LSP type is
      SR-TE, the PCECC also MUST send a PCLabelUpd PCEP message to each
      node along the path in both ASes with the label information
      (Node-ID and Adjacency-ID segment (label) list) specific to that
      node.

   o  The PCECC SHOULD then send a PCUpd PCEP message to the ingress
      router R1 in AS1 with information about the new LSP, and the R1
      router SHOULD send a PCEP PCRpt back with LSP1 and LSP2
      status: Up.

   o  After that step R1 SHOULD have the main and backup TE LSPs (LSP1
      and LSP2) towards R3 up.  It is up to the implementation how to
      install these TE LSPs in R1's RIB and how to switch over to the
      backup LSP2 if LSP1 fails.

6. Use Cases of PCECC for Multicast LSPs

   The current multicast LSPs are set up using either the RSVP-TE P2MP
   or the mLDP protocol.  The setup of these LSPs not only needs a lot
   of manual configuration but is also complex when protection is
   considered.  By using the PCECC solution, the multicast LSP can be
   computed and set up through a centralized controller which has the
   full picture of the topology and of the bandwidth usage of each
   link.  This not only reduces the complex configuration compared with
   distributed RSVP-TE P2MP or mLDP signaling, but also allows the
   disjoint primary and secondary paths to be computed efficiently.

6.1. Using PCECC for P2MP/MP2MP LSPs' Setup

   With global labels and local labels existing at the same time in the
   PCECC network, the PCECC will compute, set up and maintain the P2MP
   and MP2MP LSPs using the local label range of each network node.
                     +----------+
                     |    R1    | Root node of the multicast LSP
                     +----------+
                          |6000
                     +----------+ Transit Node
                     |    R2    |
                     +----------+
                     *   |   *  *
                9001*    |   *   *9002
                   *     |   *    *
      +-----------+      |   *     +-----------+
      |    R4     |      |   *     |    R5     | Transit Nodes
      +-----------+      |   *     +-----------+
                 *       |   *      *    +
              9003*      |   *     *     +9004
                   *     |   *    *      +
                    +-----------+  +-----------+
                    |    R3     |  |    R5     | Leaf Node
                    +-----------+  +-----------+
                      9005|
                    +-----------+
                    |    R8     | Leaf Node
                    +-----------+

   The P2MP examples are explained here:

   Step1: R1 may send a packet P1 to R2 simply by pushing a label of
   6000 onto the packet.

   Step2: After R2 receives the packet with label 6000, it forwards it
   to R4 by pushing a header of 9001 and to R5 by pushing a header of
   9002.

   Step3: After R4 receives the packet with label 9001, it forwards it
   to R3 by pushing a header of 9003.  After R5 receives the packet
   with label 9002, it forwards it to R5 by pushing a header of 9004.

   Step4: After R3 receives the packet with label 9003, it forwards it
   to R8 by pushing a header of 9005.

6.2. Use Cases of PCECC for the Resiliency of P2MP/MP2MP LSPs

6.2.1 PCECC for the End-to-End Protection of the P2MP/MP2MP LSPs

   In this section we describe the end-to-end managed path protection
   service and the local protection with operation management in the
   PCECC network for the P2MP/MP2MP LSP, which includes both the
   RSVP-TE P2MP based LSP and the mLDP based LSP.

   An end-to-end protection principle (for nodes and links) can be
   applied when computing backup P2MP or MP2MP LSPs.  During
   computation of the primary multicast tree, the PCECC server may also
   compute a secondary tree.  A PCE may compute the primary and backup
   P2MP or MP2MP LSPs together or sequentially.
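   As a rough illustration of the sequential option, the following
   Python sketch first computes a primary shortest-path tree and then
   recomputes the tree with the primary transit nodes excluded, which
   yields a node-disjoint backup.  The topology, the link costs and the
   function names are assumptions made for this example only; they are
   not taken from the PCECC procedures or from PCEP.

```python
import heapq


def shortest_path_tree(graph, root, leaves, excluded=frozenset()):
    """Return {leaf: [root, ..., leaf]} using Dijkstra.

    graph maps a node to {neighbor: cost}; nodes in `excluded` are
    never used as next hops.
    """
    dist = {root: 0}
    prev = {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if v in excluded:
                continue
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    tree = {}
    for leaf in leaves:
        if leaf not in dist:
            raise ValueError(f"no path from {root} to {leaf}")
        path = [leaf]
        while path[-1] != root:
            path.append(prev[path[-1]])
        tree[leaf] = path[::-1]
    return tree


def primary_and_backup(graph, root, leaves):
    """Sequential computation: primary tree first, then a backup tree
    that avoids every transit node of the primary tree."""
    primary = shortest_path_tree(graph, root, leaves)
    transit = {n for path in primary.values() for n in path[1:-1]}
    transit -= set(leaves)  # leaves must stay reachable
    backup = shortest_path_tree(graph, root, leaves, excluded=transit)
    return primary, backup


# Hypothetical topology: one root, two transit nodes, two leaves.
GRAPH = {
    "R1": {"R2": 10, "R3": 20},
    "R2": {"R4": 10, "R5": 10},
    "R3": {"R4": 20, "R5": 20},
}

primary, backup = primary_and_backup(GRAPH, "R1", ["R4", "R5"])
print("primary:", primary)
print("backup: ", backup)
```

   Computing both trees together (for example with a joint
   disjoint-path algorithm) can find a solution in topologies where
   this greedy sequential approach fails; the sketch only shows the
   simpler of the two options.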
                     +----+  +----+  Root node of LSP
                     | R1 |--| R11|
                     +----+  +----+
                      /           +
                   10/             +20
                    /               +
              +----------+    +-----------+  Transit Node
              |    R2    |    |    R3     |
              +----------+    +-----------+
                 |      \       +        +
                 |       \     +         +
               10|      10\   +20      20+
                 |         \ +           +
                 |          \            +
                 |         + \           +
             +-----------+    +-----------+  Leaf Nodes
             |    R4     |    |    R5     |  (Downstream LSR)
             +-----------+    +-----------+

   In the example above, when the PCECC sets up the primary multicast
   tree from the root node R1 to the leaves, which is R1->R2->{R4, R5},
   it can set up the backup tree, which is R11->R3->{R4, R5}, at the
   same time.  Both the primary forwarding tree and the secondary
   forwarding tree will be downloaded to each router along the primary
   path and the secondary path.  The traffic will normally be forwarded
   through the R1->R2->{R4, R5} path, and when there is a node failure
   in the primary tree, the root node R1 will switch the flow to the
   backup tree, which is R11->R3->{R4, R5}.  By using the PCECC, the
   path computation and forwarding path downloading can all be done
   without the complex signaling used in P2MP RSVP-TE or mLDP.

6.2.2 PCECC for the Local Protection of the P2MP/MP2MP LSPs

   In this section we describe the local protection service in the
   PCECC network for the P2MP/MP2MP LSP.

   While the PCECC sets up the primary multicast tree, it can also
   build the backup LSP among the PLR, the protected node, and the MPs
   (the downstream nodes of the protected node).  In cases where the
   number of downstream nodes is very large, this mechanism can avoid
   unnecessary packet duplication on the PLR and so protect the network
   from the risk of traffic congestion.

                     +------------+
                     |     R1     |  Root Node
                     +------------+
                           .
                           .
                           .
                     +------------+  Point of Local Repair/
                     |    R10     |  Switchover Point
                     +------------+  (Upstream LSR)
                      /           +
                   10/             +20
                    /               +
              +----------+    +-----------+  Protected Node
              |   R20    |    |    R30    |
              +----------+    +-----------+
                 |      \       +        +
                 |       \     +         +
               10|      10\   +20      20+
                 |         \ +           +
                 |          \            +
                 |         + \           +
             +-----------+    +-----------+  Merge Point
             |    R40    |    |    R50    |  (Downstream LSR)
             +-----------+    +-----------+
                   .                .
                   .                .

   In the example above, when the PCECC sets up the primary multicast
   path around the PLR node R10 to protect node R20, which is
   R10->R20->{R40, R50}, it can set up the backup path
   R10->R30->{R40, R50} at the same time.  Both the primary forwarding
   path and the secondary forwarding path will be downloaded to each
   router along the primary path and the secondary path.  The traffic
   will normally be forwarded through the R10->R20->{R40, R50} path,
   and when node R20 fails, the PLR node R10 will switch the flow to
   the backup path, which is R10->R30->{R40, R50}.  By using the PCECC,
   the path computation and forwarding path downloading can all be done
   without the complex signaling used in P2MP RSVP-TE or mLDP.

6.3 Using reliable P2MP TE based multicast delivery for distributed
    computations (MapReduce-Hadoop)

   The MapReduce model of distributed computation in computing clusters
   is widely deployed.  In the Hadoop 1.0 architecture, MapReduce
   operations on big data are performed by means of a Master-Slave
   architecture in the Hadoop Distributed File System (HDFS), where the
   NameNode knows the resources of the cluster and where the actual
   data (chunks) for a particular task are located (which DataNode).
   Each chunk of data (64MB or more) should have 3 saved copies in
   different DataNodes based on their proximity.

   The proximity level is currently allocated semi-manually and is
   based on Rack IDs (the assumption is that closer data are better
   because of access speed/smaller latency).

   The JobTracker node is responsible for computation tasks and
   scheduling across DataNodes, and also has Rack-awareness.

   Currently, transport protocols between the NameNode/JobTracker and
   the DataNodes are based on IP unicast. This approach is simple,
   but it has numerous drawbacks related to its flat design.

   It is clear that Hadoop clusters should go beyond a single DC and
   move towards distributed clusters. In that case, performance and
   latency issues need to be handled. Latency depends on the speed of
   light in the fiber links and on the latency introduced by the
   intermediate devices. The latter is closely correlated with
   network device architecture and performance. The current
   performance of NPU-based routers should be enough for creating
   distributed Hadoop clusters with predictable latency. The
   performance of SW-based routers (mainly deployed as VNFs) together
   with additional acceleration features such as DPDK is promising
   but requires additional research and testing.

   The main question is how to create a simple but effective
   architecture for a distributed Hadoop cluster.

   A number of studies [Multicast Tree Map-Reduce...] show how the
   use of a multicast tree could improve the speed of resource or
   cluster member discovery inside the cluster, as well as increase
   redundancy in communications between cluster nodes.

   Is traditional IP-based multicast enough for that? We doubt it,
   because it requires an additional control plane (IGMP, PIM) and a
   lot of signaling, which is not suitable for high-performance
   computations that are very sensitive to latency.

   P2MP TE tunnels look much more suitable as a potential solution
   for creating multicast-based communications between Master and
   Slave nodes inside the cluster. Obviously, these P2MP tunnels
   should be created and torn down dynamically, with no manual
   intervention. This is where the PCECC comes into play. Its main
   task is to create an optimal topology for each particular
   MapReduce computation request and to create P2MP tunnels with the
   needed parameters, such as bandwidth and delay.

   This solution requires MPLS label-based forwarding inside the
   cluster. The use of label-based forwarding inside a DC was
   proposed by Yandex [MPLS in DC...]. Technically this is already
   possible, because MPLS on switches is already supported by some
   vendors, and MPLS also exists on Linux and OVS.

   The following framework can accomplish this task:

                        +--------+
                        |  APP   |
                        +--------+
                            | NBI (REST API,...)
                            |
                 PCEP  +----------+ REST API
     +---------+   +---|  PCECC   |----------+
     | Client  |---|---|          |          |
     +---------+   |   +----------+          |
             |     |     |  | |              |
             +-----|---+ |PCEP|              |
          +--------+   | |  | |              |
          |            | |  | |              |
          | REST API   | |  | |              |
          |            | |  | |              |
   +-------------+     | |  | |         +----------+
   | Job Tracker |     | |  | |         | NameNode |
   |             |     | |  | |         |          |
   +-------------+     | |  | |         +----------+
      +------------------+  | +-----------+
      |                  |  |             |
      |---+-----P2MP TE--+-----|-----------| |
    +-----------+  +-----------+      +-----------+
    | DataNode1 |  | DataNode2 |      | DataNodeN |
    |TaskTracker|  |TaskTracker| .... |TaskTracker|
    +-----------+  +-----------+      +-----------+

   Communication between the Master nodes (JobTracker and NameNode)
   and the PCECC via the REST API MAY be done either directly or via
   a cluster manager such as Mesos.

   Phase 1: Distributed cluster resource discovery

   During this phase, the Master nodes SHOULD identify and find
   available Slave nodes according to the computing request from the
   application (APP). The NameNode SHOULD query the PCECC about
   available DataNodes. The NameNode MAY provide additional
   constraints to the PCECC, such as topological proximity and
   redundancy level.

   The PCECC SHOULD analyze the topology of the distributed cluster
   and perform a constraint-based path calculation [RFC7334] from the
   client towards the most suitable DataNodes. The PCECC SHOULD reply
   to the NameNode with the list of the most suitable DataNodes and
   their resource capabilities. A topology discovery mechanism for
   the PCECC will be added to this framework later.
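
   The Phase 1 selection step can be illustrated with a minimal
   sketch. It assumes a toy topology given as a dictionary of links
   carrying a delay and an available bandwidth. The function name
   select_datanodes, the link attributes, and all numeric values are
   hypothetical and are not defined by this document or by PCEP. The
   sketch prunes links that lack the requested bandwidth, computes
   the lowest accumulated delay from the client, and returns up to
   `redundancy` DataNodes within the delay bound, nearest first.

```python
import heapq


def select_datanodes(links, client, datanodes, min_bw, max_delay, redundancy):
    """Return up to `redundancy` (delay, node) pairs, nearest first.

    links: dict mapping (u, v) -> (delay, available_bw), treated as
    bidirectional. Names and units are illustrative assumptions only.
    """
    # Constraint pruning: keep only links with enough available bandwidth.
    adj = {}
    for (u, v), (delay, bw) in links.items():
        if bw >= min_bw:
            adj.setdefault(u, []).append((v, delay))
            adj.setdefault(v, []).append((u, delay))

    # Dijkstra on accumulated delay, starting from the client.
    dist = {client: 0}
    heap = [(0, client)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, delay in adj.get(node, []):
            nd = d + delay
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))

    # Keep reachable DataNodes within the delay bound, closest first.
    candidates = sorted(
        (dist[n], n) for n in datanodes if n in dist and dist[n] <= max_delay
    )
    return candidates[:redundancy]


# Hypothetical topology: (delay, available bandwidth) per link.
links = {
    ("client", "r1"): (1, 10),
    ("r1", "dn1"): (2, 10),
    ("r1", "dn2"): (5, 10),
    ("r1", "dn3"): (1, 1),        # too little bandwidth, pruned
    ("client", "dn4"): (20, 10),  # exceeds the delay bound
}
result = select_datanodes(
    links, "client", ["dn1", "dn2", "dn3", "dn4"],
    min_bw=5, max_delay=10, redundancy=3,
)
print(result)
```

   In this toy run only dn1 and dn2 satisfy both constraints, so
   fewer DataNodes than the requested redundancy level are returned.
   How a real PCECC would react to such a shortfall is left open
   here.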

   Phase 2: The PCECC SHOULD create a P2MP LSP from the client
   towards those DataNodes by means of PCLabelUpd
   [I-D.zhao-pce-pcep-extension-for-pce-controller] PCEP messages,
   following the previously calculated path.

   Phase 3: The NameNode SHOULD send this information to the client.
   The PCECC informs the client about the optimal P2MP path towards
   the DataNodes via a PCEP PCUpd message.

   Phase 4: The client sends data blocks to those DataNodes for
   writing via the created P2MP tunnel.

   When this task is finished, the P2MP tunnel MAY be torn down.

7. Use Cases of PCECC for LSP in the Network Migration

   One of the main advantages of the PCECC solution is that it is
   naturally backward compatible, since the PCE server itself can
   function as a proxy node of the MPLS network for all the new nodes
   that no longer support the existing MPLS signaling protocols.

   As illustrated in the following example, the current network
   migrates gradually to a fully PCECC-controlled network by
   replacing the legacy nodes. During the migration, the legacy nodes
   still need to signal using the existing MPLS protocols such as LDP
   and RSVP-TE, and the new nodes set up their portion of the
   forwarding path through the PCECC directly. With the PCECC
   functioning as the proxy of these new nodes, MPLS signaling can
   propagate through the network as normal.

   The example described in this section is based on the network
   configuration illustrated in the following figure:

  +------------------------------------------------------------------+
  |                            PCE DOMAIN                            |
  |    +-----------------------------------------------------+       |
  |    |                        PCECC                        |       |
  |    +-----------------------------------------------------+       |
  |     ^            ^                         ^            ^        |
  |     |    PCEP    |                         |    PCEP    |        |
  |     V            V                         V            V        |
  | +--------+   +--------+   +--------+   +--------+   +--------+   |
  | | NODE 1 |   | NODE 2 |   | NODE 3 |   | NODE 4 |   | NODE 5 |   |
  | |        |...|        |...|        |...|        |...|        |   |
  | | Legacy |if1| Legacy |if2|Legacy  |if3| PCECC  |if4| PCECC  |   |
  | |  Node  |   |  Node  |   |Enabled |   |Enabled |   | Enabled|   |
  | +--------+   +--------+   +--------+   +--------+   +--------+   |
  |                                                                  |
  +------------------------------------------------------------------+

       Example: PCECC Initiated LSP Setup In the Network Migration

   In this example, there are five nodes on the TE LSP from the head
   end (Node1) to the tail end (Node5). Node4 and Node5 are centrally
   controlled, and the other nodes are legacy nodes.

   o  Node1 sends a path request message for the setup of an LSP
      destined for Node5.

   o  The PCECC sends Node1 a reply message for LSP setup with the
      path: (Node1, if1), (Node2, if2), (Node3, if3), (Node4, if4),
      Node5.

   o  Node1, Node2, and Node3 set up the LSP to Node5 using local
      labels as usual.

   o  The PCECC then programs the outsegment of Node3, the
      insegment/outsegment of Node4, and the insegment of Node5.

8. Use Cases of PCECC for L3VPN and PWE3

   Existing services that use MPLS LSP tunnels based on MPLS
   signaling mechanisms, such as L3VPN, PWE3, and IPv6, can be
   simplified by using the PCECC to negotiate the label assignments
   for L3VPN, PWE3, and IPv6.

   In the case of L3VPN, VPN labels can be negotiated and distributed
   among the PE routers through PCEP by the PCECC instead of using
   BGP.
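
   As an illustration of the L3VPN case, the following minimal sketch
   models a central controller that allocates one VPN label per
   (PE, VPN) pair and reports, for a given PE, which label to push
   towards each remote PE attached to the same VPN. The class name,
   the starting label value, and the returned dictionary layout are
   assumptions made for this example only; the actual procedures and
   PCEP encodings are not modeled here.

```python
class VpnLabelController:
    """Toy central allocator for per-PE VPN labels (illustrative only)."""

    def __init__(self, first_label=1000):
        # Hypothetical start of the label range; real ranges are
        # deployment specific.
        self._next_label = first_label
        self._labels = {}   # (pe, vpn) -> label
        self._members = {}  # vpn -> list of PEs in attachment order

    def attach(self, pe, vpn):
        """Register `pe` in `vpn` and return the label allocated to it."""
        key = (pe, vpn)
        if key not in self._labels:
            self._labels[key] = self._next_label
            self._next_label += 1
            self._members.setdefault(vpn, []).append(pe)
        return self._labels[key]

    def entries_for(self, pe, vpn):
        """Labels that `pe` should push to reach each remote PE of `vpn`."""
        return {
            remote: self._labels[(remote, vpn)]
            for remote in self._members.get(vpn, [])
            if remote != pe
        }


ctrl = VpnLabelController()
ctrl.attach("PE1", "vpn-a")
ctrl.attach("PE2", "vpn-a")
pe1_view = ctrl.entries_for("PE1", "vpn-a")
pe2_view = ctrl.entries_for("PE2", "vpn-a")
print(pe1_view, pe2_view)
```

   Keeping the allocation in one place is what removes the need for
   the PEs to exchange labels among themselves in this sketch; each
   PE only receives the entries that concern it.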

   The example described in this section is based on the network
   configuration illustrated in the following figure:

              +------------------------------------------+
              |                PCE DOMAIN                |
              |  +-----------------------------------+   |
              |  |               PCECC               |   |
              |  +-----------------------------------+   |
              |          ^         ^              ^      |
              |PWE3/L3VPN|PCEP PCEP|LSP PWE3/L3VPN|PCEP  |
              |          V         V              V      |
   +--------+ |   +--------+   +--------+   +--------+   | +--------+
   |   CE   | |   |  PE1   |   | NODE x |   |  PE2   |   | |   CE   |
   |        |.....|        |...|        |...|        |.....|        |
   | Legacy | |if1| PCECC  |if2| PCECC  |if3| PCECC  |if4| | Legacy |
   |  Node  | |   | Enabled|   |Enabled |   |Enabled |   | |  Node  |
   +--------+ |   +--------+   +--------+   +--------+   | +--------+
              |                                          |
              +------------------------------------------+

                 Example: Using PCECC for L3VPN and PWE3

   In the case of PWE3, instead of using LDP signaling, the label and
   port pairs assigned to each pseudowire can be negotiated among the
   PE routers through the PCECC, and the corresponding forwarding
   entries are distributed to each PE router through the extended
   PCEP.

10. Using PCECC for Traffic Classification Information

   When a TE-LSP is set up, the head end needs to know:

   o  How to use it

   o  What traffic to send on the LSP

   o  Whether it is a virtual link

   o  Whether to advertise it in the IGP

   o  What bits of this information to signal to the tail end

   PCEP allows an Active PCE to set up or modify LSPs, but it
   provides no way to tell the head end how to use the LSP. The
   reason is historical: it used to be the LER that made the request
   of the PCE, so the LER knew why it wanted the LSP.

   With the PCECC architecture, PCEP can be extended to carry this
   information, such as how to use the LSP, how to advertise the LSP,
   and other extra signaling information.

11. The Considerations for PCECC Procedure and PCEP extensions

   The PCECC's procedures and PCEP extensions are defined in
   [I-D.zhao-pce-pcep-extension-for-pce-controller].

12. IANA Considerations

   This document does not require any action from IANA.

13. Security Considerations

   TBD.

14. Acknowledgments

   We would like to thank Adrian Farrel, Aijun Wang, Dhruv Dhody,
   Robert Tao, Changjiang Yan, Tieying Huang, Sergio Belotti, Dieter
   Beller, Andrey Elperin and Evgeniy Brodskiy for their useful
   comments and suggestions.

15. References

15.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, April 1997.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path
              Computation Element (PCE) Communication Protocol
              (PCEP)", RFC 5440, DOI 10.17487/RFC5440, April 2009.

15.2 Informative References

   [I-D.teas-pce-central-control]
              Farrel, A., Zhao, Q., Li, R., and C. Zhou, "An
              Architecture for Use of PCE and PCEP in a Network with
              Central Control", draft-ietf-teas-pce-central-control-01
              (work in progress), December 2016.

   [RFC5441]  Vasseur, JP., Ed., Zhang, R., Bitar, N., and JL. Le
              Roux, "A Backward-Recursive PCE-Based Computation (BRPC)
              Procedure to Compute Shortest Constrained Inter-Domain
              Traffic Engineering Label Switched Paths", RFC 5441,
              DOI 10.17487/RFC5441, April 2009.

   [RFC5541]  Le Roux, JL., Vasseur, JP., and Y. Lee, "Encoding of
              Objective Functions in the Path Computation Element
              Communication Protocol (PCEP)", RFC 5541,
              DOI 10.17487/RFC5541, June 2009.

   [RFC5376]  Bitar, N., Zhang, R., and K. Kumaki, "Inter-AS
              Requirements for the Path Computation Element
              Communication Protocol (PCECP)", RFC 5376,
              DOI 10.17487/RFC5376, November 2008.

   [I-D.filsfils-spring-segment-routing]
              Filsfils, C., Previdi, S., Bashandy, A., Decraene, B.,
              Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R.,
              Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe,
              "Segment Routing Architecture",
              draft-filsfils-spring-segment-routing-04 (work in
              progress), July 2014.

   [I-D.ietf-pce-stateful-pce]
              Crabbe, E., Minei, I., Medved, J., and R. Varga, "PCEP
              Extensions for Stateful PCE",
              draft-ietf-pce-stateful-pce-14 (work in progress),
              May 2016.

   [I-D.crabbe-pce-pce-initiated-lsp]
              Crabbe, E., Minei, I., Sivabalan, S., and R. Varga,
              "PCEP Extensions for PCE-initiated LSP Setup in a
              Stateful PCE Model",
              draft-crabbe-pce-pce-initiated-lsp-05 (work in
              progress), October 2015.

   [I-D.ali-pce-remote-initiated-gmpls-lsp]
              Ali, Z., Sivabalan, S., Filsfils, C., Varga, R., Lopez,
              V., Dios, O., and X. Zhang, "Path Computation Element
              Communication Protocol (PCEP) Extensions for
              remote-initiated GMPLS LSP Setup",
              draft-ali-pce-remote-initiated-gmpls-lsp-03 (work in
              progress), February 2014.

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
              Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
              Extensions for Segment Routing",
              draft-ietf-isis-segment-routing-extensions-06 (work in
              progress), December 2015.

   [I-D.psenak-ospf-segment-routing-extensions]
              Psenak, P., Previdi, S., Filsfils, C., Gredler, H.,
              Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing",
              draft-psenak-ospf-segment-routing-extensions-05 (work in
              progress), June 2014.

   [I-D.sivabalan-pce-segment-routing]
              Sivabalan, S., Medved, J., Filsfils, C., Crabbe, E.,
              Raszuk, R., Lopez, V., and J. Tantsura, "PCEP Extensions
              for Segment Routing",
              draft-sivabalan-pce-segment-routing-03 (work in
              progress), July 2014.

   [I-D.li-mpls-global-label-usecases]
              Li, Z., Zhao, Q., Yang, T., Raszuk, R., and L. Fang,
              "Usecases of MPLS Global Label",
              draft-li-mpls-global-label-usecases-03 (work in
              progress), October 2015.

   [I-D.li-mpls-global-label-framework]
              Li, Z., Zhao, Q., Chen, X., Yang, T., and R. Raszuk, "A
              Framework of MPLS Global Label",
              draft-li-mpls-global-label-framework-02 (work in
              progress), July 2014.

   [I-D.zhao-pce-pcep-extension-for-pce-controller]
              Zhao, Q., Li, Z., Dhody, D., and C. Zhou, "PCEP
              Procedures and Protocol Extensions for Using PCE as a
              Central Controller (PCECC) of LSPs",
              draft-zhao-pce-pcep-extension-for-pce-controller-03
              (work in progress), April 2016.

   [I-D.ietf-spring-resiliency-use-cases]
              Francois, P., Filsfils, C., Decraene, B., and R. Shakir,
              "Use-cases for Resiliency in SPRING",
              draft-ietf-spring-resiliency-use-cases-02 (work in
              progress), December 2015.

   [MPLS in DC...]
              Afanasiev, D. and D. Ginsburg, "MPLS in DC and inter-DC
              networks: the unified forwarding mechanism for network
              programmability at scale".

   [Multicast Tree Map-Reduce...]
              Lee, K., Boykin, P. O., and R. J. Figueiredo, "Multicast
              Tree Map-Reduce: Self-organizing Resource Discovery and
              Monitoring using Structured P2P Systems".

Authors' Addresses

   Quintin Zhao
   Huawei Technologies
   125 Nagog Technology Park
   Acton, MA  01719
   US

   EMail: quintin.zhao@huawei.com


   Robin Li
   Huawei Technologies
   Huawei Bld., No.156 Beiqing Rd.
   Beijing  100095
   China

   EMail: lizhenbin@huawei.com


   Boris Khasanov
   Huawei Technologies
   Moskovskiy Prospekt 97A
   St.Petersburg  196084
   Russia

   EMail: khasanov.boris@huawei.com


   King Ke
   Tencent Holdings Ltd.
   Shenzhen
   China

   EMail: kinghe@tencent.com


   Luyuan Fang
   Microsoft

   EMail: lufang@microsoft.com


   Chao Zhou
   Cisco Systems

   EMail: chao.zhou@cisco.com


   Boris Zhang
   Telus Communications

   EMail: Boris.zhang@telus.com


   Artem Rachitskiy
   Mobile TeleSystems JLLC
   Nezavisimosti ave., 95
   Minsk  220043
   Belarus

   EMail: arachitskiy@mts.by


   Anton Gulida
   LLC "Lifetech"
   Krasnoarmeyskaya str., 24
   Minsk  220030
   Belarus

   EMail: anton.gulida@life.com.by