Network Working Group                                   Maria Napierala
Internet Draft                                                     AT&T
Intended status: Informational                              Luyuan Fang
Expires: April 15, 2013                                   Cisco Systems
                                                       October 15, 2012


        Requirements for Extending BGP/MPLS VPNs to End-Systems
            draft-fang-l3vpn-end-system-requirements-00.txt

Abstract

Service Providers commonly use BGP/MPLS VPNs [RFC 4364] as the control plane for wide-area virtual networks. This technology has proven to scale to a large number of VPNs and attachment points, and it is well suited to providing VPN service to end-systems. Virtualized environments impose additional requirements on MPLS/BGP VPN technology when it is applied to end-system networking; this document defines those requirements.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Application of MPLS/BGP VPNs to End-Systems
   3. Connectivity Requirements
   4. Multi-Tenancy Requirements
   5. Decoupling of Virtualized Networking from Physical Infrastructure
   6. Decoupling of Layer 3 Virtualization from Layer 2 Topology
   7. Encapsulation of Virtual Payloads
   8. Optimal Forwarding of Traffic
   9. Inter-operability with Existing MPLS/BGP VPNs
   10. IP Mobility
   11. BGP Requirements in a Virtualized Environment
      11.1. BGP Convergence and Routing Consistency
         11.1.1. BGP IP Mobility Requirements
      11.2. Optimizing Route Distribution
   12. Security Considerations
   13. IANA Considerations
   14. Normative References
   15. Informative References
   16. Authors' Addresses
   17. Acknowledgements

Requirements Language

Although this document is not a protocol specification, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

1. Introduction

Networks are increasingly being consolidated and outsourced in an effort both to shorten service deployment time and to reduce operational costs. This coincides with an increasing demand from applications for compute, storage, and network resources. In order to scale compute, storage, and network service functions, physical resources are being abstracted from their logical representation. This is referred to as server, storage, and network virtualization. Virtualization can be implemented in various layers of computer systems or networks. The virtualized loads are executed over a common physical infrastructure.
Compute nodes running guest operating systems are often executed as Virtual Machines (VMs).

This document defines requirements for a network virtualization solution that provides IP connectivity to virtual resources on end-systems. The requirements address virtual resources, defined as Virtual Machines, applications, and appliances, that require only IP connectivity. Non-IP communication is addressed by other solutions and is not in the scope of this document.

1.1. Terminology

   AS           Autonomous System
   End-System   A device where the Guest OS and the Host OS/Hypervisor reside
   IaaS         Infrastructure as a Service
   RT           Route Target
   ToR          Top-of-Rack switch
   VM           Virtual Machine
   Hypervisor   Virtual Machine Manager
   SDN          Software Defined Network
   VPN          Virtual Private Network

2. Application of MPLS/BGP VPNs to End-Systems

MPLS/BGP VPN technology [RFC 4364] has proven able to scale to a large number of VPNs (tens of thousands) and customer routes (millions) while providing aggregated management capability. In traditional WAN deployments of BGP IP VPNs, a Customer Edge (CE) is a physical device connected to a Provider Edge (PE). In addition, the forwarding function and the control function of a PE device co-exist within a single physical router. MPLS/BGP VPN technology should be able to evolve and adapt to new virtualized environments by extending VPN service to end-systems.

When an end-system attaches to an MPLS/BGP VPN, the CE becomes a Virtual Machine or an application residing on the end-system itself. As in traditional MPLS/BGP VPN deployments, it is undesirable for the end-system VPN forwarding knowledge to extend into the transport network infrastructure. Hence, with regard to forwarding, the end-system should optimally act as both the CE and the PE simultaneously.
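As a rough illustration of what acting as both CE and PE means for forwarding state, the following minimal Python sketch models a hypothetical PE forwarding function inside a hypervisor/Host OS. Local VM interfaces (the CEs) are bound to per-VPN VRFs, and a longest-prefix-match lookup in the VRF yields the physical address of the remote end-system plus a VPN label, in the style of [RFC 4364]. The class names, interface names, addresses, and label values are assumptions made for illustration only; the point is that the transport network would need to route only the physical next-hop address, never the virtual prefixes.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VrfRoute:
    prefix: IPv4Network    # virtual (tenant) prefix, never seen by the transport network
    next_hop: IPv4Address  # physical address of the remote end-system acting as PE
    label: int             # VPN label advertised by that remote PE


@dataclass
class EndSystemForwarder:
    """Hypothetical PE forwarding function running in a hypervisor/Host OS."""
    vrfs: Dict[str, List[VrfRoute]] = field(default_factory=dict)
    attachments: Dict[str, str] = field(default_factory=dict)  # local VM interface (CE) -> VRF name

    def lookup(self, vm_interface: str, dst: str) -> Optional[Tuple[IPv4Address, int]]:
        """Longest-prefix match in the VRF bound to the sending VM's interface."""
        routes = self.vrfs[self.attachments[vm_interface]]
        addr = IPv4Address(dst)
        matches = [r for r in routes if addr in r.prefix]
        if not matches:
            return None  # no route in this VPN: drop
        best = max(matches, key=lambda r: r.prefix.prefixlen)
        return best.next_hop, best.label


# Two tenants reuse the same virtual prefix; the VRFs keep them apart.
fwd = EndSystemForwarder()
fwd.vrfs["tenant-a"] = [
    VrfRoute(IPv4Network("10.1.0.0/16"), IPv4Address("192.0.2.20"), 300),
    VrfRoute(IPv4Network("10.1.2.0/24"), IPv4Address("192.0.2.30"), 301),
]
fwd.vrfs["tenant-b"] = [
    VrfRoute(IPv4Network("10.1.0.0/16"), IPv4Address("198.51.100.7"), 500),
]
fwd.attachments["vm1-eth0"] = "tenant-a"
fwd.attachments["vm2-eth0"] = "tenant-b"

print(fwd.lookup("vm1-eth0", "10.1.2.5"))  # (IPv4Address('192.0.2.30'), 301)
print(fwd.lookup("vm2-eth0", "10.1.2.5"))  # (IPv4Address('198.51.100.7'), 500)
```

The same destination address resolves to different remote end-systems depending on which VRF the sending VM is attached to, which is the isolation property a per-VPN forwarding table provides.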
Moreover, it is current practice to implement the PE forwarding and control functions in different processors of the same device and to use internal (proprietary) communication between those processors. Typically, the PE control functionality is implemented in one (or very few) components of a device, while the PE forwarding functionality is implemented in multiple components of the same device (a.k.a. "line cards"). In an end-system environment, a single end-system effectively corresponds to a line card in a traditional PE router. For scalable and cost-effective deployment of end-system MPLS/BGP VPNs, the PE forwarding function should be decoupled from the PE control function so that the former can be implemented on multiple standalone devices. This separation allows the end-system PE forwarding function to be implemented on many end-system devices, for example, in the operating systems of application servers or network appliances. The PE control plane function can itself be virtualized and run as an application on an end-system.

3. Connectivity Requirements

A network virtualization solution should be able to provide IPv4 and IPv6 unicast connectivity between hosts in the same and in different subnets, without any assumptions regarding the underlying media layer. Multicast transmission, i.e., allowing IP applications to send packets to a group of IPv4 or IPv6 addresses, should also be supported. The multicast service should also support delivery of traffic to all endpoints of a given VPN even if those endpoints have not sent any control messages indicating the need to receive that traffic. In other words, the multicast service should be capable of delivering IP broadcast traffic in a virtual topology. A solution for supporting VPN multicast and VPN broadcast must not require that the underlying transport network support an IP multicast transmission service.

In some deployments, Virtual Machines or applications are configured to belong to an IP subnet.
A network virtualization solution should support grouping of virtual resources into IP subnets regardless of whether the underlying implementation uses a multi-access network.

4. Multi-Tenancy Requirements

One of the main goals of network virtualization is to provide traffic and routing isolation between different virtual components that share a common physical infrastructure. A collection of virtual resources might provide external or internal services. For example, such a collection may serve an external "customer" or an internal "tenant" to whom a Service Provider provides service(s). Using the terminology of IP VPNs, we refer to a collection of virtual resources dedicated to a process or application as a VPN. Any network virtualization solution has to ensure network isolation (in both the data plane and the control plane) among tenants or applications sharing the same data center physical resources.

Typically, VPNs that belong to different external tenants do not communicate with each other directly, but they should be allowed to access shared services or shared network resources. It is also common for tenants to require multiple distinct VPNs. In that scenario, traffic might need to cross VPN boundaries, subject to access controls and/or routing policies. A tenant should be able to create multiple VPNs. A network virtualization solution should allow a VM or application endpoint to directly access multiple VPNs without a need to traverse a gateway.

It is often the case that SP infrastructure services are provided to multiple tenants, for example voice-over-IP gateway services or video-conferencing services for branch offices. A network virtualization solution should support both isolated VPNs and overlapping VPNs (often referred to as "extranets"), as well as both any-to-any and hub-and-spoke topologies.

5. Decoupling of Virtualized Networking from Physical Infrastructure

One of the main goals in designing a large-scale transport network is to minimize the cost and complexity of its "fabric". This is often done by delegating the processing of virtual resource communication to the network edge. Networks use various VPN technologies to isolate disjoint groups of virtual resources. Some use VLANs as a VPN technology; others use layer 3 based solutions, often with proprietary control planes. Service Providers are interested in interoperability and in openly documented protocols rather than in proprietary solutions.

The transport network infrastructure should not maintain any information that pertains to the virtual resources in end-systems. Decoupling virtualized networking from the physical infrastructure has the following advantages: 1) it provides better scalability; 2) it simplifies design and operation; and 3) it reduces network cost. It has been proven (in the Internet and in large BGP IP VPN deployments) that moving complexity to the network edge while keeping the network core simple has very good scaling properties.

There should be a total separation between the virtualized segments (i.e., interfaces associated with virtual resources) and the physical network (i.e., physical interfaces associated with network infrastructure). This separation should include the separation of the virtual network IP address space from the physical network IP address space. The physical infrastructure addresses should be routable in the underlying transport network, while the virtual network addresses should be routable only in the virtual network. Not only should the virtual network data plane be fully decoupled from the physical network, but its control plane should be decoupled as well.

6. Decoupling of Layer 3 Virtualization from Layer 2 Topology

The layer 3 approach to network virtualization dictates that virtualized communication should be routed, not bridged.
The layer 3 virtualization solution should be decoupled from the layer 2 topology. Thus, there should be no dependency on VLANs and layer 2 broadcast.

In solutions that depend on layer 2 broadcast domains, host-to-host communication is established based on flooding and data plane MAC learning. Layer 2 MAC information has to be maintained on every switch where a given VLAN is present. Even if some solutions are able to minimize data plane MAC learning and/or unicast flooding, they still rely on MAC learning at the network edge and on maintaining the MAC addresses on every (edge) switch where the layer 2 VPN is present.

The MAC addresses known to the guest OS in an end-system are not relevant to IP services and introduce unnecessary overhead. Hence, the MAC addresses associated with virtual resources should not be used in the virtual layer 3 networks. Rather, only what is significant to IP communication, namely the IP addresses of the virtual machines and application endpoints, should be maintained by the virtual networks.

7. Encapsulation of Virtual Payloads

In a layer 3 end-system virtual network, IP packets should reach the first-hop router in one IP hop, regardless of whether the first-hop router is the end-system itself (i.e., a hypervisor/Host OS) or a device external to the end-system. The first-hop router should always perform an IP lookup on every packet it receives from a virtual machine or an application. The first-hop router should encapsulate the packets and route them towards the destination end-system.

In order to scale the transport networks, the virtual network payloads must be encapsulated with headers that are routable (or switchable) in the physical network infrastructure. The IP addresses of the virtual resources are not to be advertised within the physical infrastructure address space.

The encapsulation (and decapsulation) function should be implemented on a device as close to the virtualized resources as possible.
Since the hypervisors in the end-systems are the devices at the network edge, they are the optimal location for the encap/decap functionality. A device implementing the encap/decap functionality acts as the first-hop router in the virtual topology.

The network virtualization solution should also support deployments where it is not possible or not desirable to implement the virtual payload encapsulation in the hypervisor/Host OS. In such deployments, the encap/decap functionality may be implemented in an external device. The external device implementing the encap/decap functionality should be as close as possible to the end-system itself. The same network virtualization solution should support deployments with both internal (in a hypervisor) and external (outside of a hypervisor) encap/decap devices. Whenever the virtual forwarding functionality is implemented in an external device, the virtual service itself must be delivered to the end-system such that the switching elements connecting the end-system to the encap/decap device are not aware of the virtual topology.

MPLS/BGP VPN technology based on [RFC 4364] allows different encapsulation methods to be used for connecting PE routers, namely Label Switched Paths (LSPs), IP tunneling, and GRE tunneling. If LSPs are used in the transport network, they could be signaled with LDP, in which case host (/32) routes to all PE routers must be propagated throughout the network, or with RSVP-TE, in which case a full mesh of RSVP-TE tunnels is required. If the transport network is only IP-capable, then MPLS-in-IP or MPLS-in-GRE [RFC 4023] encapsulation could be used. Other transport layers, such as IEEE 802.1ah, might also need to be supported.

8. Optimal Forwarding of Traffic

Network virtualization solutions that optimize for maximum utilization of compute and storage resources require that those resources may be located anywhere in the network.
The physical and logical spreading of appliances and workloads implies a very significant increase in infrastructure bandwidth consumption. Hence, it is important that virtualized networking solutions forward traffic efficiently and ensure that packets traverse the transport network only once. It must also be possible to send traffic directly from one end-system to another without traversing a midpoint router.

9. Inter-operability with Existing MPLS/BGP VPNs

Service Providers want to tie their server-based offerings to their MPLS/BGP VPN services. MPLS/BGP VPNs provide secure and latency-optimized WAN connectivity to the virtualized resources in the SP's data center. MPLS/BGP VPN customers may require simultaneous access to resources in both the SP's data centers and their own data centers. Service provider-based VPN access can provide additional value compared with public Internet access, such as security, QoS, OAM, multicast service, VoIP service, video conferencing, and wireless connectivity. Service Providers want to "spin up" L3VPN access to data center VPNs as dynamically as they spin up compute and other virtualized resources.

The network virtualization solution should be fully inter-operable with MPLS/BGP VPNs, including Inter-AS MPLS/BGP VPN Options A, B, and C [RFC 4364]. MPLS/BGP VPN technology is widely supported on routers and other appliances. BGP/MPLS VPN-capable network devices should be able to participate directly in a virtual network that spans end-systems. The network devices should be able to participate in isolated collections of end-systems, i.e., in isolated VPNs, as well as in overlapping VPNs (called "extranets" in BGP/MPLS VPN terminology).

When connecting an end-system VPN with other services/networks, it should not be necessary to advertise specific host routes; advertising aggregated routing information should be sufficient.
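Advertising aggregated rather than host routing information can be sketched in Python as follows, assuming a gateway configured with aggregate prefixes for a VPN: host (/32) routes covered by a configured aggregate are suppressed and replaced by a single advertisement of the aggregate with the gateway as next hop, while uncovered routes are exported unchanged. The `Advertisement` type, the addresses, and the label values are hypothetical, and real BGP attributes (Route Distinguishers, Route Targets, and so on) are omitted.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import List


@dataclass(frozen=True)
class Advertisement:
    prefix: IPv4Network
    next_hop: IPv4Address  # BGP next hop
    label: int             # 20-bit MPLS label


def aggregate_for_export(host_routes: List[Advertisement],
                         aggregates: List[IPv4Network],
                         gateway: IPv4Address,
                         aggregate_label: int) -> List[Advertisement]:
    """Replace host routes covered by a configured aggregate with the aggregate itself."""
    if not 16 <= aggregate_label < 2 ** 20:
        raise ValueError("MPLS labels are 20 bits and values 0-15 are reserved")
    exported: List[Advertisement] = []
    covered = set()
    for agg in aggregates:
        members = [r for r in host_routes if r.prefix.subnet_of(agg)]
        if members:  # advertise an aggregate only while it has live members
            exported.append(Advertisement(agg, gateway, aggregate_label))
            covered.update(members)
    # Routes outside every aggregate are exported as-is.
    exported.extend(r for r in host_routes if r not in covered)
    return exported


hosts = [
    Advertisement(IPv4Network("10.1.0.5/32"), IPv4Address("192.0.2.20"), 300),
    Advertisement(IPv4Network("10.1.7.9/32"), IPv4Address("192.0.2.30"), 301),
    Advertisement(IPv4Network("10.9.0.1/32"), IPv4Address("192.0.2.40"), 302),
]
for adv in aggregate_for_export(hosts, [IPv4Network("10.1.0.0/16")],
                                IPv4Address("192.0.2.1"), 24001):
    print(adv.prefix, adv.next_hop, adv.label)
# 10.1.0.0/16 192.0.2.1 24001
# 10.9.0.1/32 192.0.2.40 302
```

Because remote PEs only learn the aggregate, the gateway that originates it must still resolve each arriving packet against the more specific routes in its own VRF.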
A BGP/MPLS VPN-capable router or appliance can be used to aggregate a VPN's IP routing information and advertise the aggregated prefixes. The aggregated prefixes should be advertised with the router's (or appliance's) IP address as the BGP next hop and with a locally assigned 20-bit aggregate label. The aggregate label should trigger a destination IP lookup in its corresponding VRF for all packets entering the virtual network.

The inter-connection of end-system VPNs with traditional VPNs requires an integrated control plane and unified orchestration of network and end-system resources.

10. IP Mobility

Another reason for network virtualization is the need to support IP mobility. IP mobility means that the IP addresses used for communication within or between applications may reside anywhere across the network. Using a virtual topology, i.e., abstracting the externally visible network address from the underlying infrastructure address, is an effective way to solve the IP mobility problem.

IP mobility covers both a device physically moving (e.g., a roaming wireless device) and a workload being transferred from one physical server/appliance to another. IP mobility requires preserving the device's active network connections (e.g., TCP and higher-level sessions). With respect to a Virtual Machine, such mobility is also referred to as "live" migration. IP mobility is highly desirable for many reasons, such as efficient and flexible resource sharing, data center migration, disaster recovery, server redundancy, or service bursting.

To accommodate live mobility of a virtual machine (or a device), it is desirable to assign to it a permanent IP address that remains with the VM/device after it moves. When dealing with IP-only applications, it is not only sufficient but optimal to forward the traffic based on layer 3 rather than layer 2 information.
The MAC addresses of devices or applications are irrelevant to IP services and introduce unnecessary overhead and complications when devices or VMs move (e.g., when a VM moves between physical servers, the MAC learning tables in the switches must be updated; also, the VM's MAC address might need to change in its new location). In an IP-based network virtualization solution, a device or workload move should be handled by an IP route advertisement.

IP mobility has to be transparent to applications and to any external entity interacting with the applications. This implies that the network connectivity restoration time is critical. Transport sessions can typically survive several seconds of disruption; however, applications may have sub-second latency requirements for their correct operation. To minimize the disruption to established communication during workload or device mobility, the control plane of a network virtualization solution should be able to differentiate between the activation of a workload in a new location and the advertisement of its route to the network. This will enable remote endpoints to update their routing tables prior to the workload's migration and will allow traffic to be tunneled via the workload's old location.

11. BGP Requirements in a Virtualized Environment

11.1. BGP Convergence and Routing Consistency

BGP was designed to carry a very large amount of routing information, but it is not a very fast converging protocol. In addition, routing protocols, including BGP, have traditionally favored convergence (i.e., responsiveness to route changes due to failure or policy change) over routing consistency. Routing consistency means that a router forwards a packet strictly along the path adopted by the upstream routers.
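To make the definition of routing consistency concrete, the following toy Python model (not a BGP implementation) follows each router's next hop for a single destination and reports whether a packet is delivered, loops, or is blackholed. The router names and the scenario are hypothetical: a VM moves from behind router R1 to behind router R2, and the intermediate snapshots show the inconsistent states that arise if the routers apply the change at different times.

```python
from typing import Dict, Optional

# Forwarding snapshot for one destination: router -> next hop.
# None means the router has no route and would drop the packet.
Fib = Dict[str, Optional[str]]


def trace(fib: Fib, start: str, dest: str) -> str:
    """Follow next hops from `start` and classify the packet's fate."""
    node, visited = start, set()
    while node != dest:
        if node in visited:
            return "loop"
        visited.add(node)
        nxt = fib.get(node)
        if nxt is None:
            return "blackhole"
        node = nxt
    return "delivered"


before = {"R1": "vm", "R2": "R1"}     # VM attached behind R1
r1_first = {"R1": "R2", "R2": "R1"}   # R1 updated, R2 still stale
withdrawn = {"R1": None, "R2": "R1"}  # R1 withdrew, R2 not yet updated
after = {"R1": "R2", "R2": "vm"}      # VM attached behind R2

for name, fib in [("before", before), ("r1_first", r1_first),
                  ("withdrawn", withdrawn), ("after", after)]:
    print(name, trace(fib, "R2", "vm"))
# before delivered
# r1_first loop
# withdrawn blackhole
# after delivered
```

Only the first and last snapshots are consistent in the sense defined above; the two intermediate ones are the transient loop and blackhole states that disrupt the traffic of a moving workload.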
When responsiveness is favored, a router applies a received update immediately to its forwarding table before propagating the update to other routers, including those that potentially depend upon the outcome of the update. Route change responsiveness comes at the cost of routing blackholes and loops.

Routing consistency in virtualized environments is important because multiple workloads can be moved simultaneously between different physical servers, for example, due to maintenance activities. If packets sent by the applications that are being moved are dropped (because they do not follow a live path), the active network connections will be dropped. To minimize the disruption to established communications during VM migration or device mobility, live path continuity is required.

11.1.1. BGP IP Mobility Requirements

In IP mobility, the network connectivity restoration time is critical. In fact, Service Provider networks already use routing and forwarding plane techniques that support fast failure restoration by pre-installing a backup path to a given destination. These techniques allow traffic to be forwarded almost continuously using an indirect forwarding path or a tunnel to a given destination, and hence are referred to as "local repair". The traffic path is restored locally at the destination's old location while the network converges to a backup path. Eventually, the network converges to an optimal path and bypasses the local repair. BGP assists local repair techniques by advertising multiple paths to a given destination, not only the best path.

11.2. Optimizing Route Distribution

When virtual networks are triggered based on IP communication, the Route Target Constraint extension [RFC 4684] of BGP should be used to optimize route distribution for sparse virtual network events.
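With Route Target Constraint [RFC 4684], each BGP speaker advertises the Route Targets it is interested in (Route Target membership information), and its peers then send it only the VPN routes that carry at least one of those Route Targets. The sending-side filter can be sketched in Python as follows. This is a simplification that ignores the membership NLRI encoding, the origin AS, default membership, and route reflector behavior; the forwarder names, prefixes, and Route Target values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    route_targets: FrozenSet[str]


def routes_to_send(routes: List[VpnRoute],
                   membership: Dict[str, Set[str]]) -> Dict[str, List[VpnRoute]]:
    """Per peer, keep only routes sharing at least one RT with the peer's advertised membership."""
    return {
        peer: [r for r in routes if r.route_targets & wanted]
        for peer, wanted in membership.items()
    }


routes = [
    VpnRoute("10.1.0.5/32", frozenset({"65000:1"})),
    VpnRoute("10.2.0.7/32", frozenset({"65000:2"})),
    VpnRoute("10.3.0.0/24", frozenset({"65000:1", "65000:2"})),  # shared service (extranet)
]
membership = {
    "forwarder-1": {"65000:1"},  # hosts VMs of VPN 1 only
    "forwarder-2": {"65000:2"},  # hosts VMs of VPN 2 only
    "forwarder-3": set(),        # no local participants yet
}
for peer, selected in routes_to_send(routes, membership).items():
    print(peer, [r.prefix for r in selected])
# forwarder-1 ['10.1.0.5/32', '10.3.0.0/24']
# forwarder-2 ['10.2.0.7/32', '10.3.0.0/24']
# forwarder-3 []
```

A forwarder with no local participants receives no VPN routes until it advertises membership, which keeps each end-system's routing load proportional to the VPNs actually present on it.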
Route Target Constraint ensures that only those VPN forwarders that have local participants in a particular data plane event receive its routing information. This also decreases the total load on the upstream BGP speakers.

12. Security Considerations

This document presents the requirements for end-system MPLS/BGP VPNs. The security considerations for specific solutions will be documented in the documents that specify those solutions.

13. IANA Considerations

This document contains no new IANA considerations.

14. Normative References

[RFC 4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC 4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, March 2005.

[RFC 4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K., and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/Multiprotocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006.

15. Informative References

[RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

16. Authors' Addresses

Maria Napierala
AT&T
200 Laurel Avenue
Middletown, NJ 07748
Email: mnapierala@att.com

Luyuan Fang
Cisco Systems
111 Wood Avenue South
Iselin, NJ 08830, USA
Email: lufang@cisco.com

17. Acknowledgements

The authors would like to thank Pedro Marques for his comments and input.