BESS                                                Weiguo Hao
                                                    Lucy Yong
                                                    S. Hares
Internet Draft                                      Huawei
                                                    Osama Zia
                                                    Microsoft
Intended status: Standards Track                    May 20, 2015
Expires: November 2015

      Inter-AS Option C between NVO3 and BGP/MPLS IP VPN network
             draft-hao-bess-inter-nvo3-vpn-optionc-01.txt

Abstract

   This draft describes an Inter-AS Option-C solution between an NVO3
   network and a BGP/MPLS IP VPN network. BGP label routing information
   is extended to create a multi-hop forwarding path between a local
   NVE and a remote PE. To ensure that VPNv4 routes are exchanged
   correctly between the local NVE and the remote PE, the VN ID space
   is partitioned: only the lowest 1 million (2^20) VN IDs can be used
   for interconnection with an external MPLS VPN network using the
   Option-C solution, and the remaining VN IDs (about 15 million) can
   only be used within the data center.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Hao & et,al             Expires November 20, 2015               [Page 1]

Internet-Draft              Inter-As Option-C                   May 2015

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Reference model
   4. Traditional Option-C [RFC4364] Recap
   5. Inter-AS Option-C Solution
      5.1. Multi-hop EBGP Connection
      5.2. VPN routes exchange
      5.3. Data forwarding process
         5.3.1. Data flow from TS1 to CE1
         5.3.2. Data flow from CE1 to TS1
   6. BGP Label Routing Extension
   7. NVE-NVA architecture
      7.1. Multi-Hop EBGP connection
         DC to WAN direction
         WAN to DC direction
      7.2. VPN route exchange
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments

1. Introduction

   In the cloud computing era, multi-tenancy has become a core
   requirement for data centers. Since NVO3 satisfies the key
   multi-tenancy requirements, this technology is being deployed in an
   increasing number of cloud data center networks. NVO3 focuses on the
   construction of overlay networks that operate over an IP (L3)
   underlay transport network. It can provide layer 2 bridging and
   layer 3 IP services for each tenant.
   VXLAN and NVGRE are two typical NVO3 technologies. An NVO3 overlay
   network can be controlled either through a centralized NVE-NVA
   architecture or through a distributed BGP VPN protocol. NVO3 scales
   well, from relatively small networks to networks with several
   million tenant systems (TSs) and hundreds of thousands of virtual
   networks within a single administrative domain. In an NVO3 network,
   a 24-bit VN ID identifies each virtual network, so in theory up to
   2^24 (about 16 million) virtual networks can be supported in a data
   center.

   In a data center network, each tenant may include one or more layer
   2 virtual networks, and normally each tenant corresponds to one
   routing domain. Each layer 2 virtual network normally corresponds to
   one or more subnets.

   To provide cloud services to clients outside the data center, data
   center networks need to be connected to WAN networks, where BGP/MPLS
   IP VPN is already widely deployed. The internal data center network
   and the external MPLS/IP VPN network are normally in different
   Autonomous Systems (ASes). This requires inter-AS connections to be
   set up at the Autonomous System Border Routers (ASBRs) between the
   NVO3 network and the external MPLS/IP VPN network. When multiple
   NVO3 data centers are interconnected, the traffic between data
   centers is normally carried over a BGP/MPLS IP VPN network, which
   also requires an applicable inter-AS solution between the NVO3
   network and the external MPLS/IP VPN network.

   Similar to the inter-AS connection methods defined in [RFC4364],
   there are three ways of handling this case: Option-A, Option-B and
   Option-C, in order of increasing scalability.

   Option-A is a back-to-back VRF solution. With Option-A, one EBGP
   session per VPN is created between the peering ASBRs, and in the
   data plane VLANs are used to separate tenant traffic. It has the
   lowest scalability of the three solutions.

   Option-B scales better than Option-A. However, with Option-B the
   ASBRs need to maintain and distribute all VPN prefixes, and in the
   data plane they need to perform MPLS VPN label switching. Because
   the MPLS VPN label switching table space on the ASBRs is limited,
   Option-B still has scalability limitations for large VPN networks.

   Option-C is the most scalable option: by separating the exchange of
   VPNv4 prefixes from the exchange of PE prefixes, it frees the ASBRs
   from maintaining and distributing customer VPN prefixes. The ASBRs
   only exchange the service provider (SP) internal addresses, i.e.,
   the /32 routes of the PEs.

   This draft proposes an Inter-AS Option-C solution between an NVO3
   network and an external BGP/MPLS IP VPN network. Unlike the
   traditional Option-C solution defined in [RFC4364], this solution
   interconnects heterogeneous networks, so the control plane and data
   plane procedures in the NVO3 network need to be newly specified.

2. Conventions used in this document

   Network Virtualization Edge (NVE) - An NVE is the network entity
   that sits at the edge of an underlay network and implements network
   virtualization functions.

   Tenant System (TS) - A physical or virtual system that can play the
   role of a host, or a forwarding element such as a router, switch,
   firewall, etc. It belongs to a single tenant and connects to one or
   more VNs of that tenant.

   VN - Virtual Network. A VN is a logical abstraction of a physical
   network that provides L2 network services to a set of Tenant
   Systems.

   RD - Route Distinguisher. RDs are used to keep otherwise identical
   routes in different VRFs unique. The RD is an 8-octet field
   prefixed to the customer's IPv4 address; the resulting 12-octet
   field is a unique "VPN-IPv4" address.

   RT - Route Target. RTs are used to control the import and export of
   routes between different VRFs.

3. Reference model

   +--------------------------------------------------------+
   |  +----+                                        AS1     |
   |  | TS1|--+                                             |
   |  +----+  |   +----+    +----+                          |
   |          +---|NVE1|----|TOR1|-----------+              |
   |  +----+  |   +----+    +----+           |              |
   |  | TS2|--+                              |              |
   |  +----+                                 |              |
   |                                         |   +------+   |
   |                                         +---|ASBR-d|---|--+
   |                                         |   +------+   |  |
   |  +----+                                 |              |  |
   |  | TS3|--+                              |              |  |
   |  +----+  |   +----+    +----+           |              |  |
   |          +---|NVE2|----|TOR2|-----------+              |  |
   |  +----+  |   +----+    +----+                          |  |
   |  | TS4|--+                                             |  |
   |  +----+                                                |  |
   +--------------------------------------------------------+  |
                                                               |
   +--------------------------------------------------------+  |
   |  +----+                                        AS2     |  |
   |  | CE1|--+                                             |  |
   |  +----+  |   +----+                         +------+   |  |
   |          +---| PE1|-------------------------|ASBR-w|---|--+
   |  +----+  |   +----+                         +------+   |
   |  | CE2|--+                                             |
   |  +----+                                                |
   +--------------------------------------------------------+

                     Figure 1: Reference model

   Figure 1 shows an arbitrary multi-AS VPN interconnection scenario
   between an NVO3 network and a BGP/MPLS IP VPN network. NVE1, NVE2,
   and ASBR-d form the NVO3 overlay network in the internal DC. TS1 and
   TS2 connect to NVE1; TS3 and TS4 connect to NVE2. PE1 and ASBR-w
   form the external MPLS/IP VPN network, and CE1 and CE2 connect to
   PE1. The NVO3 network is in AS 1 and the MPLS/IP VPN network is in
   AS 2.

   There are two tenants in the NVO3 network. TSs in tenant 1 can
   freely communicate with CEs in VPN-Red, and TSs in tenant 2 can
   freely communicate with CEs in VPN-Green. TS1 and TS3 belong to
   tenant 1; TS2 and TS4 belong to tenant 2. CE1 belongs to VPN-Red and
   CE2 belongs to VPN-Green. VN IDs 10 and 20 identify tenant 1 and
   tenant 2, respectively. PE1 assigns MPLS VPN labels 1000 and 2000 to
   the routes from CE1 and CE2, respectively.

4. Traditional Option-C [RFC4364] Recap

   With traditional Option-C as defined in [RFC4364], PE routers in
   different ASes first establish multi-hop EBGP connections to each
   other and then exchange VPN-IPv4 routes over those connections.
   EBGP is used to distribute labeled IPv4 /32 routes in order to
   create a label switched path from the ingress PE router to the
   egress PE router. In this procedure, VPN-IPv4 routes are neither
   maintained nor distributed by the ASBRs; an ASBR only needs to
   maintain labeled IPv4 /32 routes to the PE routers within its AS.
   If the /32 routes for the PE routers are NOT made known to the P
   routers (other than the ASBRs), the ingress PE needs to push a
   three-label stack onto each packet:

   - The bottom label is assigned by the egress PE and corresponds to
     the packet's destination address in a particular VRF.

   - The middle label is assigned by the ASBR and corresponds to the
     /32 route to the egress PE.

   - The top label is assigned by the ingress PE's IGP next hop and
     corresponds to the /32 route to the ASBR.

5. Inter-AS Option-C Solution

   Each NVE operates as the default layer 3 gateway for its locally
   connected TSs. VRFs are created on each NVE to isolate the IP
   forwarding of different tenants, and each tenant is identified by at
   least one VN ID.

   Similar to traditional Option-C as defined in [RFC4364], multi-hop
   EBGP connections are first established between NVEs and PEs, and
   then VPN-IPv4 routes are exchanged over those connections. EBGP is
   used to distribute labeled IPv4 /32 routes to create a forwarding
   path from an NVE to a PE. Unlike the BGP label switched path (LSP)
   of traditional Option-C, this forwarding path has two segments: the
   segment from the NVE to ASBR-d in the NVO3 network is an NVO3
   tunnel, and the segment from ASBR-d to the PE in the WAN is a
   traditional BGP LSP. The two segments are stitched together at
   ASBR-d.

   The behavior of ASBR-w and the PEs in the MPLS VPN network is no
   different from the behavior of the ASBRs and PEs in a traditional
   [RFC4364] Option-C MPLS VPN network.

5.1. Multi-hop EBGP Connection

   In the WAN to DC direction, when ASBR-d receives labeled IPv4 /32
   routes from ASBR-w, it sets the BGP next hop to itself, allocates a
   new IP address per label to serve as the NVO3 tunnel destination
   IP, and then advertises the labeled route to all local NVEs using
   the BGP extension defined in Section 6. The newly allocated address,
   called the NVO3 tunnel IP, identifies a remote PE. The NVO3 tunnel
   IP address pool should be configured on ASBR-d beforehand. The
   mapping between each newly allocated NVO3 tunnel IP and the
   corresponding MPLS label forms the outgoing stitching table, which
   is used to stitch the NVO3 tunnel to the BGP LSP for traffic from
   the internal DC toward the external network.

   In the DC to WAN direction, ASBR-d allocates an MPLS label for each
   NVE and announces a labeled IPv4 /32 route for each NVE to ASBR-w.
   The mapping between each allocated MPLS label and the NVE IP forms
   the incoming stitching table, which is used to stitch the BGP LSP to
   the NVO3 tunnel for traffic from the external network toward the
   internal DC.

5.2. VPN routes exchange

   VPN-IPv4 routes are then exchanged between the NVE and the remote PE
   as specified in [RFC4364]. A Route Distinguisher (RD) and Route
   Targets (RTs) are configured for each VRF on each NVE and PE.

   Each NVE advertises all of its local VPN routes to the remote PEs,
   using the tenant's VN ID as the MPLS VPN label. The remote PEs treat
   the NVE as a regular PE: they match the RTs and install the VPN
   routes in the corresponding local VRFs. For traffic from a remote CE
   to a local TS, the ingress PE uses the VN ID as the bottom label of
   the MPLS encapsulation.

   Because the VN ID field is 24 bits long while an MPLS label is only
   20 bits long, a VN ID used for interworking between NVEs and PEs
   must not exceed 20 bits, i.e., its value must be less than 2^20
   (about 1 million). The VN ID space in the NVO3 network is therefore
   partitioned: only the lowest 2^20 VN IDs can be used for
   interconnection with the external MPLS VPN network, and the
   remaining VN IDs (about 15 million) can only be used within the DC.

   Each MPLS VPN PE also advertises all of its local VPN routes to the
   peer NVEs, which match the RTs and install the VPN routes in their
   local VRFs. For traffic from a local TS to a remote CE, because the
   ingress NVE does not support MPLS encapsulation, it encodes the MPLS
   VPN label advertised by the remote PE as the VN ID in the NVO3
   encapsulation.

5.3. Data forwarding process

   This section describes, step by step, how data is forwarded between
   TS1 and CE1 in Figure 1.

5.3.1. Data flow from TS1 to CE1

   1. TS1 sends a packet to NVE1 with CE1's IP address as the
      destination.

   2. NVE1 identifies the local VRF from the packet's input interface,
      looks up the routing table of that VRF (tenant 1), performs NVO3
      encapsulation, and sends the encapsulated packet to ASBR-d. The
      MPLS VPN label associated with the packet's destination address
      (1000 for CE1's routes in Figure 1) is encoded in the VN ID
      field. The NVO3 tunnel destination IP is the address allocated by
      ASBR-d for the /32 route of the PE to which the remote CE is
      attached.

   3. ASBR-d removes the NVO3 encapsulation and performs MPLS
      encapsulation, pushing two labels: the BGP LSP label as the top
      label and the MPLS VPN label as the bottom label. The BGP LSP
      label is obtained by looking up the NVO3 tunnel destination IP in
      the outgoing stitching table; the MPLS VPN label is copied from
      the VN ID.

   4. ASBR-w swaps the BGP label, pushes an IGP label, and sends the
      packet to PE1. The MPLS VPN label remains unchanged.

   5. PE1 pops all MPLS labels, identifies the local VRF from the
      bottom (MPLS VPN) label, looks up the IP forwarding table of that
      VRF, and sends the packet to CE1.

5.3.2. Data flow from CE1 to TS1

   1. CE1 sends a packet to PE1 with TS1's IP address as the
      destination.

   2. PE1 identifies the local VRF from the packet's input interface,
      looks up the routing table of that VRF, and pushes a three-label
      stack onto the packet. The bottom label is the VN ID of the
      tenant to which TS1 belongs, i.e., 10.
      The middle label is assigned by ASBR-w and is associated with the
      /32 route for the egress NVE1. The top label is assigned by PE1's
      IGP next hop and corresponds to the /32 route to ASBR-w.

   3. ASBR-w pops the top (IGP) label, swaps the middle (BGP) label,
      and sends the packet to ASBR-d.

   4. ASBR-d removes the MPLS encapsulation, performs NVO3
      encapsulation, and sends the packet to the egress NVE1. The
      egress NVE's IP address is obtained by looking up the BGP label
      (the label ASBR-d allocated for NVE1) in the incoming stitching
      table, and the VN ID is copied from the bottom MPLS label, i.e.,
      the MPLS VPN label.

   5. NVE1 removes the NVO3 encapsulation, identifies the local VRF
      from the VN ID, looks up its routing table, and sends the packet
      to TS1.

6. BGP Label Routing Extension

   In [RFC3107], BGP is used to carry label mapping information for a
   particular route. In this draft, the multi-hop EBGP connection has
   to cross both the NVO3 network and the MPLS/IP VPN network, so
   inside the NVO3 network the BGP label mapping information is
   extended to convey an NVO3 Tunnel IP address for a particular route.
   This mapping information is carried as part of the Network Layer
   Reachability Information (NLRI) in the Multiprotocol Extensions
   attributes. The AFI indicates, as usual, the address family of the
   associated route, and a new SAFI (TBD) is proposed to indicate that
   the NLRI contains an NVO3 tunnel IP address.

   There are various NVO3 encapsulations, such as VXLAN (Virtual
   eXtensible Local Area Network) [RFC7348], NVGRE (Network
   Virtualization using Generic Routing Encapsulation) [NVGRE], GENEVE
   (Generic Network Virtualization Encapsulation) [GENEVE], GUE
   (Generic UDP Encapsulation) [GUE], and GPE (Generic Protocol
   Extension for VXLAN) [GPE]. All of these encapsulations are IP based
   and carry a 24-bit VN ID that identifies the virtual network.
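
   The NLRI encoding that the following paragraphs define (a Length
   field, a Tunnel Type, a 4-octet IPv4 NVO3 Tunnel IP, and a
   variable-length Prefix) can be sketched in Python as below. This is
   a hypothetical illustration under assumptions the draft leaves open:
   the Tunnel Type is taken to be one octet, the Length counts only the
   Tunnel IP bits plus the prefix bits (as the text states literally),
   and only IPv4 is handled. The function names and the example
   addresses (from the documentation ranges 192.0.2.0/24 and
   198.51.100.0/24) are illustrative.

```python
import ipaddress
from enum import IntEnum

# Hypothetical encoder/decoder for the NLRI of Section 6 (IPv4 only).
# Assumptions not fixed by the draft: Tunnel Type is one octet, and
# Length counts only the NVO3 Tunnel IP plus the prefix bits.


class TunnelType(IntEnum):
    VXLAN = 0
    NVGRE = 1
    GENEVE = 2
    GUE = 3
    GPE = 4


# Tunnel IP carried in the Withdrawn Routes field (0xFFFFFFFF).
WITHDRAW_TUNNEL_IP = ipaddress.IPv4Address(0xFFFFFFFF)


def encode_nlri(tunnel_type: TunnelType, tunnel_ip: str, prefix: str) -> bytes:
    net = ipaddress.IPv4Network(prefix)
    tip = ipaddress.IPv4Address(tunnel_ip)
    length_bits = 32 + net.prefixlen          # Tunnel IP bits + prefix bits
    prefix_octets = (net.prefixlen + 7) // 8  # round up to an octet boundary
    return (bytes([length_bits, tunnel_type])
            + tip.packed
            + net.network_address.packed[:prefix_octets])


def encode_withdrawn_nlri(tunnel_type: TunnelType, prefix: str) -> bytes:
    """NLRI for the Withdrawn Routes field: Tunnel IP set to 0xFFFFFFFF."""
    return encode_nlri(tunnel_type, str(WITHDRAW_TUNNEL_IP), prefix)


def decode_nlri(data: bytes):
    """Decode one NLRI into (TunnelType, tunnel IP, IPv4Network)."""
    if len(data) < 6:
        raise ValueError("NLRI too short")
    prefix_len = data[0] - 32
    if not 0 <= prefix_len <= 32:
        raise ValueError(f"bad Length field: {data[0]}")
    prefix_octets = (prefix_len + 7) // 8
    if len(data) < 6 + prefix_octets:
        raise ValueError("truncated prefix")
    tunnel_type = TunnelType(data[1])
    tunnel_ip = ipaddress.IPv4Address(data[2:6])
    raw = data[6:6 + prefix_octets] + bytes(4 - prefix_octets)
    # strict=False: trailing bits after the prefix are ignored, per the draft.
    net = ipaddress.IPv4Network((raw, prefix_len), strict=False)
    return tunnel_type, tunnel_ip, net


# Example (hypothetical): ASBR-d maps PE1's loopback 192.0.2.1/32 to
# NVO3 tunnel IP 198.51.100.10 and advertises it for VXLAN.
nlri = encode_nlri(TunnelType.VXLAN, "198.51.100.10", "192.0.2.1/32")
print(nlri.hex())  # 4000c633640ac0000201
```

   The decoder handles a single NLRI; an MP_REACH_NLRI attribute may
   carry several back to back, which a real implementation would walk
   using the Length field of each entry.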

   The NLRI is encoded as shown below:

      +---------------------------+
      |   Length (1 octet)        |
      +---------------------------+
      |   Tunnel Type             |
      +---------------------------+
      |   NVO3 Tunnel IP          |
      +---------------------------+
      |   Prefix (variable)       |
      +---------------------------+

   The use and meaning of these fields are as follows:

   a) Length: The Length field indicates the length, in bits, of the
      address prefix plus the NVO3 Tunnel IP.

   b) Tunnel Type: The Tunnel Type identifies the tunneling technology
      being signaled. This document defines the following types:

         - VXLAN:  Tunnel Type = 0
         - NVGRE:  Tunnel Type = 1
         - GENEVE: Tunnel Type = 2
         - GUE:    Tunnel Type = 3
         - GPE:    Tunnel Type = 4

   c) NVO3 Tunnel IP: In the IPv4 case, the NVO3 Tunnel IP is encoded
      as 4 octets.

   d) Prefix: The Prefix field contains the address prefix, followed by
      enough trailing bits to make the end of the field fall on an
      octet boundary. The value of the trailing bits is irrelevant.

   ASBR-d, located in the NVO3 network, must assign an NVO3 Tunnel IP
   for each MPLS label [RFC3107] that it receives from peer ASBR-w in
   BGP label mapping information for a particular route (and that is
   associated with that route's address prefix).

   A BGP speaker can withdraw a previously advertised route (together
   with the binding between the route and an NVO3 Tunnel IP) either by
   (a) advertising a new route (with its NVO3 Tunnel IP) with the same
   NLRI as the previously advertised route, or (b) listing the NLRI of
   the previously advertised route in the Withdrawn Routes field of an
   Update message. The NVO3 Tunnel IP carried as part of the NLRI in
   the Withdrawn Routes field should be set to 0xFFFFFFFF. (Of course,
   terminating the BGP session also withdraws all previously advertised
   routes.)

7. NVE-NVA architecture

   In this architecture, the NVE control plane and forwarding
   functionality are decoupled. The NVEs in the NVO3 network do not
   need to support BGP; they have only data plane functionality and are
   controlled by a centralized NVA using OpenFlow [OpenFlow1.3], OVSDB
   [RFC7047], I2RS, etc. The NVA runs the BGP Label Routing Extension
   with ASBR-d on behalf of all NVEs to establish the multi-hop EBGP
   connections, and it also runs the BGP VPN protocol with the peer PEs
   on behalf of all NVEs. ASBR-d allocates a new IP address per label as
   the NVO3 tunnel destination IP and advertises the labeled route to
   the NVA using the BGP Label Routing Extension.

   The NVA maintains all tenant information and originates BGP routes
   with the appropriate RD and RT. The tenant information includes the
   VN ID that identifies each tenant and the corresponding RD and RT,
   which can be statically configured by operators or dynamically
   allocated. It also includes the MAC/IP addresses of all TSs and the
   NVEs to which they are attached. The NVA uses [RFC4364] to exchange
   VPN-IPv4 routes with remote PEs, with the VN ID used as the MPLS VPN
   label.

7.1. Multi-Hop EBGP connection

   DC to WAN direction:

   1. ASBR-d allocates a BGP MPLS label per NVE.

   2. ASBR-d advertises the BGP labeled routes to peer ASBR-w.

   3. ASBR-d generates the incoming stitching table.

   WAN to DC direction:

   1. ASBR-d receives BGP labeled routes from peer ASBR-w.

   2. ASBR-d allocates an NVO3 Tunnel IP for each label received from
      ASBR-w and announces the BGP labeled route to the NVA.

   3. ASBR-d generates the outgoing stitching table.

7.2. VPN route exchange

   The NVA advertises all internal data center tenant routing
   information to the remote PEs using [RFC4364]. This information
   includes the RD, RT, IP prefix, and MPLS VPN label, with the tenant's
   VN ID used as the MPLS VPN label. Each remote MPLS VPN PE also
   advertises its local VPN routes to the NVA.
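
   The stitching tables that ASBR-d builds (Sections 5.1 and 7.1), and
   the lookups it performs on them in the data plane (step 3 of Section
   5.3.1 and step 4 of Section 5.3.2), can be modeled with the
   simplified Python sketch below. The class and method names, the
   label allocator starting at 100, the label 3001 received from
   ASBR-w, and all addresses are hypothetical; BGP signaling and the
   actual NVO3/MPLS encapsulation are not modeled.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field
from itertools import count

# Hypothetical, simplified model of ASBR-d's two stitching tables
# (Sections 5.1 and 7.1) and its data-plane lookups (Section 5.3).


@dataclass
class AsbrD:
    tunnel_ip_pool: list[str]  # NVO3 tunnel IPs configured beforehand
    # Assumed label allocator; real allocation is implementation specific.
    labels: Iterator[int] = field(default_factory=lambda: count(100))
    incoming: dict[int, str] = field(default_factory=dict)  # BGP label -> NVE IP
    outgoing: dict[str, int] = field(default_factory=dict)  # tunnel IP -> ASBR-w label

    def add_nve(self, nve_ip: str) -> int:
        """DC to WAN: allocate a label per NVE (advertised to ASBR-w)."""
        label = next(self.labels)
        self.incoming[label] = nve_ip
        return label

    def add_remote_pe(self, asbr_w_label: int) -> str:
        """WAN to DC: allocate a tunnel IP per label received from ASBR-w."""
        tunnel_ip = self.tunnel_ip_pool.pop(0)  # IndexError if pool is exhausted
        self.outgoing[tunnel_ip] = asbr_w_label
        return tunnel_ip  # advertised to the NVEs (or the NVA) with the new SAFI

    def nvo3_to_mpls(self, tunnel_dst_ip: str, vn_id: int) -> list[int]:
        """Section 5.3.1 step 3: return the label stack, top label first."""
        return [self.outgoing[tunnel_dst_ip], vn_id]

    def mpls_to_nvo3(self, labels: list[int]) -> tuple[str, int]:
        """Section 5.3.2 step 4: labels arrive as [BGP label, VPN label]."""
        bgp_label, vpn_label = labels
        return self.incoming[bgp_label], vpn_label


asbr_d = AsbrD(tunnel_ip_pool=["198.51.100.10", "198.51.100.11"])
nve1_label = asbr_d.add_nve("10.0.0.1")     # NVE1 (hypothetical IP)
pe1_tunnel_ip = asbr_d.add_remote_pe(3001)  # PE1's /32 route via ASBR-w
print(asbr_d.nvo3_to_mpls(pe1_tunnel_ip, 1000))  # TS1 -> CE1: [3001, 1000]
print(asbr_d.mpls_to_nvo3([nve1_label, 10]))     # CE1 -> TS1: ('10.0.0.1', 10)
```

   Keying the outgoing table by NVO3 tunnel IP reflects the draft's
   choice to identify each remote PE by a per-label tunnel IP, so that
   NVEs need no MPLS support; the incoming table is keyed by the
   per-NVE label that ASBR-w swaps in.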

   The NVA obtains the NVO3 Tunnel IP allocated by ASBR-d for the
   corresponding PE, matches the RT attribute, and installs the VPN
   routes in the local VRF. The NVA then downloads the corresponding
   VPN forwarding table to each NVE.

                          VPN route exchange
         +----------------------------------------------------+
         |                                                    |
      +-----+   IBGP   +--------+   EBGP   +--------+      +-----+
      | NVA |----------| ASBR-d |----------| ASBR-w |------|  PE |
      +-----+          +--------+          +--------+      +-----+
         .
         .  Southbound interface (OpenFlow, OVSDB, I2RS, etc.)
         .
         ...............
         .             .
      +------+      +------+
      | NVE1 |      | NVE2 |
      +------+      +------+

                  Figure 2: NVE-NVA Architecture

8. Security Considerations

   With Option-C, the internal IP addresses of one network (the
   loopback addresses of PEs and NVEs) are advertised to, and visible
   in, another network, which is a security risk. Most operators want
   to prevent any external visibility of, and access to, the IP
   addresses of their internal devices. Option-C is therefore
   recommended only for deployments within a single SP or enterprise
   that operates both the MPLS and the NVO3 networks.

9. IANA Considerations

   A new SAFI (TBD) is proposed to indicate that the NLRI contains an
   NVO3 tunnel IP.

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
             BGP-4", RFC 3107, May 2001.

10.2. Informative References

   [NVA]     Black, D., et al., "An Architecture for Overlay Networks
             (NVO3)", draft-ietf-nvo3-arch-01, February 14, 2014.

   [RFC7047] Pfaff, B. and B. Davie, "The Open vSwitch Database
             Management Protocol", RFC 7047, December 2013.

   [OpenFlow1.3]
             "OpenFlow Switch Specification Version 1.3.0 (Wire Protocol
             0x04)", June 25, 2012.
             (https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.0.pdf)

   [GENEVE]  Gross, J., et al., "Geneve: Generic Network Virtualization
             Encapsulation", draft-ietf-nvo3-geneve-00, May 8, 2015.

   [GUE]     Herbert, T., et al., "Generic UDP Encapsulation",
             draft-ietf-nvo3-gue-00, April 16, 2015.

   [GPE]     Quinn, P., et al., "Generic Protocol Extension for VXLAN",
             draft-ietf-nvo3-vxlan-gpe-00, May 1, 2015.

   [RFC7348] Mahalingam, M., et al., "Virtual eXtensible Local Area
             Network (VXLAN): A Framework for Overlaying Virtualized
             Layer 2 Networks over Layer 3 Networks", RFC 7348, August
             2014.

   [NVGRE]   Garg, P., et al., "NVGRE: Network Virtualization using
             Generic Routing Encapsulation",
             draft-sridharan-virtualization-nvgre-08, April 13, 2015.

11. Acknowledgments

   The authors would like to thank Shunwan Zhuang and Haibo Wang for
   their valuable input.

Authors' Addresses

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China
   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Lucy Yong
   Huawei Technologies
   Phone: +1-918-808-1918
   Email: lucy.yong@huawei.com

   Susan Hares
   Huawei Technologies
   Phone: +1-734-604-0323
   Email: shares@ndzh.com

   Osama Zia
   Microsoft
   Email: osamaz@microsoft.com