Network Working Group                                             
   Internet Draft                                                    
   Intended status: Informational                    Maria Napierala 
   Expires: April 15, 2013                                      AT&T 
                                                         Luyuan Fang 
                                                       Cisco Systems 
                                               
                                                    October 15, 2012 
 
                                      
          Requirements for Extending BGP/MPLS VPNs to End-Systems  
              draft-fang-l3vpn-end-system-requirements-00.txt 
    
Abstract 
 
   Service Providers commonly use BGP/MPLS VPNs [RFC 4364] as the 
   control plane for wide-area virtual networks. This technology has 
   proven to scale to a large number of VPNs and attachment points, 
   and it is well suited to provide VPN service to end-systems. 
   Virtualized environments impose additional requirements on MPLS/BGP
   VPN technology when it is applied to end-system networking; this
   document defines those requirements.
 
 
Status of this Memo 
 
   This Internet-Draft is submitted to IETF in full conformance with 
   the provisions of BCP 78 and BCP 79.  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts.   
    
   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time. It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/1id-abstracts.html 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
    
    
Copyright and License Notice 
    
   Copyright (c) 2012 IETF Trust and the persons identified as the 
   document authors.  All rights reserved. 
Napierala, Fang           Expires April 2013                 [Page 1]
    
Internet Draft                                          October 2012 
    
    
   This document is subject to BCP 78 and the IETF Trust's Legal 
   Provisions Relating to IETF Documents 
   (http://trustee.ietf.org/license-info) in effect on the date of 
   publication of this document.  Please review these documents 
   carefully, as they describe your rights and restrictions with 
   respect to this document.  Code Components extracted from this 
   document must include Simplified BSD License text as described in 
   Section 4.e of the Trust Legal Provisions and are provided without 
   warranty as described in the Simplified BSD License. 
    
    
Table of Contents 
         
   1.   Introduction
   1.1.  Terminology
   2.   Application of MPLS/BGP VPNs to End-Systems
   3.   Connectivity Requirements
   4.   Multi-Tenancy Requirements
   5.   Decoupling of Virtualized Networking from Physical
        Infrastructure
   6.   Decoupling of Layer 3 Virtualization from Layer 2 Topology
   7.   Encapsulation of Virtual Payloads
   8.   Optimal Forwarding of Traffic
   9.   Inter-operability with Existing MPLS/BGP VPNs
   10.  IP Mobility
   11.  BGP Requirements in a Virtualized Environment
   11.1. BGP Convergence and Routing Consistency
   11.1.1. BGP IP Mobility Requirements
   11.2. Optimizing Route Distribution
   12.  Security Considerations
   13.  IANA Considerations
   14.  Normative References
   15.  Informative References
   16.  Authors' Addresses
   17.  Acknowledgements
    
    
Requirements Language 
 
   Although this document is not a protocol specification, the key 
   words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in RFC 2119 [RFC 
   2119]. 
 
    

1. Introduction 
 
   Networks are increasingly being consolidated and outsourced in an
   effort both to shorten service deployment time and to reduce
   operational costs. This coincides with an increasing demand from
   applications for compute, storage, and network resources.
    
   In order to scale compute, storage, and network service functions,
   the logical representation of resources is being abstracted from
   the underlying physical resources. This is referred to as server,
   storage, and network virtualization. Virtualization can be
   implemented in various layers of computer systems or networks. The
   virtualized loads are executed over a common physical
   infrastructure. Compute nodes running guest operating systems are
   often executed as Virtual Machines (VMs).
    
   This document defines requirements for a network virtualization
   solution that provides IP connectivity to virtual resources on
   end-systems. The requirements address virtual resources, defined
   here as Virtual Machines, applications, and appliances, that
   require only IP connectivity. Non-IP communication is addressed by
   other solutions and is out of scope for this document.
 

  1.1.  Terminology  
    
   AS           Autonomous System
   CE           Customer Edge
   End-System   A device where the Guest OS and the Host OS/Hypervisor
                reside
   Hypervisor   Virtual Machine Manager
   IaaS         Infrastructure as a Service
   PE           Provider Edge
   RT           Route Target
   SDN          Software Defined Network
   ToR          Top-of-Rack switch
   VM           Virtual Machine
   VPN          Virtual Private Network
   VRF          VPN Routing and Forwarding table
 
    

2. Application of MPLS/BGP VPNs to End-Systems  
 
   MPLS/BGP VPN technology [RFC 4364] has proven able to scale to a
   large number of VPNs (tens of thousands) and customer routes
   (millions) while providing aggregated management capability. In
   traditional WAN deployments of BGP/MPLS IP VPNs, a Customer Edge
   (CE) device is a physical device connected to a Provider Edge (PE)
   router. In addition, the forwarding and control functions of a PE
   co-exist within a single physical router.
 
   MPLS/BGP VPN technology should be able to evolve and adapt to new
   virtualized environments by extending VPN service to end-systems.
    
   When an end-system attaches to an MPLS/BGP VPN, the CE becomes a
   Virtual Machine or an application residing on the end-system
   itself. As in traditional MPLS/BGP VPN deployments, it is
   undesirable for knowledge of end-system VPN forwarding to extend
   into the transport network infrastructure. Hence, from a forwarding
   standpoint, the end-system should ideally act as both the CE and
   the PE. Moreover, it is current practice to implement PE forwarding
   and control functions in different processors of the same device
   and to use internal (proprietary) communication between those
   processors. Typically, the PE control functionality is implemented
   in one (or very few) components of a device, and the PE forwarding
   functionality is implemented in multiple components of the same
   device (a.k.a. "line cards"). In an end-system environment, a
   single end-system effectively corresponds to a line card in a
   traditional PE router. For scalable and cost-effective deployment
   of end-system MPLS/BGP VPNs, the PE forwarding function should be
   decoupled from the PE control function so that the former can be
   implemented on multiple standalone devices. This separation of
   functionality allows the end-system PE forwarding function to be
   implemented on multiple end-system devices, for example, in the
   operating systems of application servers or network appliances.
   The PE control plane function can itself be virtualized and run as
   an application on an end-system.
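
   As an illustration only, the separation described above can be
   sketched as one control function that pushes VPN routes to many
   per-end-system forwarders, each of which installs routes only for
   the VRFs it hosts locally. The class and method names (ControlPlane,
   Forwarder, advertise) are hypothetical and do not correspond to any
   protocol or product interface; in a deployment the distribution
   would be carried by BGP rather than by direct method calls.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass
class Forwarder:
    """Hypothetical per-end-system PE forwarding function (a "line card")."""
    name: str
    local_vrfs: set[str]
    # VRF name -> {prefix: name of the end-system that hosts the prefix}
    fib: dict[str, dict[ipaddress.IPv4Network, str]] = field(default_factory=dict)

    def install(self, vrf: str, prefix: ipaddress.IPv4Network, next_hop: str) -> None:
        self.fib.setdefault(vrf, {})[prefix] = next_hop


class ControlPlane:
    """Hypothetical PE control function shared by many forwarders."""

    def __init__(self) -> None:
        self.forwarders: list[Forwarder] = []

    def attach(self, fwd: Forwarder) -> None:
        self.forwarders.append(fwd)

    def advertise(self, vrf: str, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        for fwd in self.forwarders:
            # Push the route only to forwarders that host the VRF locally,
            # and never back to the forwarder that originated it.
            if vrf in fwd.local_vrfs and fwd.name != next_hop:
                fwd.install(vrf, net, next_hop)
```

   For example, with forwarders es1 (VRF red), es2 (VRFs red and
   blue), and es3 (VRF blue), advertising 10.1.0.5/32 in red with es1
   as next-hop installs the route on es2 only.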
 

3. Connectivity Requirements 
 
   A network virtualization solution should be able to provide IPv4
   and IPv6 unicast connectivity between hosts in the same subnet and
   in different subnets, without any assumptions regarding the
   underlying media layer.

   Furthermore, multicast transmission, i.e., allowing IP applications
   to send packets to a group of IPv4 or IPv6 receivers, should be
   supported. The multicast service should also support delivery of
   traffic to all endpoints of a given VPN even if those endpoints
   have not sent any control messages indicating the need to receive
   that traffic. In other words, the multicast service should be
   capable of delivering IP broadcast traffic in a virtual topology. A
   solution for supporting VPN multicast and VPN broadcast must not
   require that the underlying transport network support an IP
   multicast transmission service.
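
   One way to satisfy the last requirement, shown here purely as an
   illustration, is ingress (head-end) replication: the sending
   end-system makes one unicast copy of each VPN multicast or
   broadcast packet for every other end-system that hosts the VPN, so
   the transport network only ever carries unicast. The membership map
   and the function name are assumptions; a real solution would learn
   VPN membership through its control plane.

```python
from __future__ import annotations


def ingress_replicate(payload: bytes, vpn: str, source: str,
                      membership: dict[str, set[str]]) -> list[tuple[str, bytes]]:
    """Return one (target end-system, packet) copy per remote VPN member.

    `membership` is a hypothetical map of VPN name -> end-systems hosting it.
    Each copy would then be encapsulated toward its target as ordinary
    unicast, so the underlay needs no IP multicast service.
    """
    targets = sorted(membership.get(vpn, set()) - {source})
    return [(target, payload) for target in targets]
```

   The trade-off is that the sender's uplink carries one copy per
   remote member, so this approach is best suited to VPNs that span a
   moderate number of end-systems.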
    
   In some deployments, Virtual Machines or applications are 
   configured to belong to an IP subnet.  A network virtualization 
   solution should support grouping of virtual resources into IP 
   subnets regardless of whether the underlying implementation uses a 
   multi-access network or not.  
    
4. Multi-Tenancy Requirements 
 
   One of the main goals of network virtualization is to provide
   traffic and routing isolation between different virtual components
   that share a common physical infrastructure. A collection of
   virtual resources might provide external or internal services. For
   example, such a collection may serve an external "customer" or an
   internal "tenant" to whom a Service Provider provides service(s).
   We refer to a collection of virtual resources dedicated to a
   process or application as a VPN, using the terminology of IP VPNs.

   Any network virtualization solution must ensure network isolation
   (in both the data plane and the control plane) among tenants or
   applications sharing the same data center physical resources.
   Typically, VPNs that belong to different external tenants do not
   communicate with each other directly, but they should be allowed to
   access shared services or shared network resources. It is also
   common for tenants to require multiple distinct VPNs. In that
   scenario, traffic might need to cross VPN boundaries, subject to
   access controls and/or routing policies.

   A tenant should be able to create multiple VPNs. A network
   virtualization solution should allow a VM or application endpoint
   to directly access multiple VPNs without needing to traverse a
   gateway. It is often the case that SP infrastructure services are
   provided to multiple tenants, for example, voice-over-IP gateway
   services or video-conferencing services for branch offices. A
   network virtualization solution should support both isolated VPNs
   and overlapping VPNs (often referred to as "extranets"), as well as
   both any-to-any and hub-and-spoke topologies.
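
   In BGP/MPLS VPNs [RFC 4364], isolation, extranets, and
   hub-and-spoke topologies are all expressed with Route Targets: a
   VRF imports a route when the route carries at least one Route
   Target in the VRF's import set. The minimal sketch below models only
   that rule. The class names and the RT values (65000:1 for routes
   exported by the hub, 65000:2 for routes exported by spokes) are
   hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    route_targets: frozenset[str]


@dataclass
class Vrf:
    name: str
    import_rts: frozenset[str]
    export_rts: frozenset[str]

    def accepts(self, route: VpnRoute) -> bool:
        # Import the route if it carries any Route Target in the import set.
        return not self.import_rts.isdisjoint(route.route_targets)

    def originate(self, prefix: str) -> VpnRoute:
        # Locally originated routes carry the VRF's export Route Targets.
        return VpnRoute(prefix, self.export_rts)
```

   Because the spokes import only the hub's Route Target, a route
   originated by one spoke is accepted by the hub but not by another
   spoke, so spoke-to-spoke traffic passes through the hub. An
   extranet can be expressed the same way, by adding a shared-services
   Route Target to the import sets of several tenants' VRFs.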
    

5. Decoupling of Virtualized Networking from Physical 
   Infrastructure 
 
   One of the main goals in designing a large-scale transport network
   is to minimize the cost and complexity of its "fabric". This is
   often achieved by delegating the processing of virtual-resource
   communication to the network edge. Networks use various VPN
   technologies to isolate disjoint groups of virtual resources. Some
   use VLANs as a VPN technology; others use layer 3 based solutions,
   often with proprietary control planes. Service Providers are
   interested in interoperability and in openly documented protocols
   rather than in proprietary solutions.
    
   The transport network infrastructure should not maintain any
   information that pertains to the virtual resources in end-systems.
   Decoupling virtualized networking from the physical infrastructure
   has the following advantages: 1) better scalability; 2) simpler
   design and operation; and 3) lower network cost. Experience (in the
   Internet and in large BGP/MPLS IP VPN deployments) has shown that
   moving complexity to the network edge while keeping the network
   core simple has very good scaling properties.
    
   There should be a total separation between the virtualized segments 
   (i.e., interfaces associated with virtual resources) and the 
   physical network (i.e., physical interfaces associated with network 
   infrastructure). This separation should include the separation of 
   the virtual network IP address space from the physical network IP 
   address space. The physical infrastructure addresses should be 
   routable in the underlying transport network, while the virtual 
   network addresses should be routable only in the virtual network. 
   Not only should the virtual network data plane be fully decoupled 
   from the physical network, but its control plane should be 
   decoupled as well. 
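
   The following sketch illustrates this separation under simplified,
   hypothetical assumptions: tenant prefixes live only in per-VRF
   tables and may overlap between VRFs, while the only addresses that
   must be routable in the physical network are the end-system
   addresses those prefixes resolve to. The table contents use
   documentation address ranges and are invented for the example.

```python
from __future__ import annotations

import ipaddress

# Physical (underlay) network: only end-system addresses are routable here.
UNDERLAY = {ipaddress.ip_address("192.0.2.11"), ipaddress.ip_address("192.0.2.12")}

# Virtual (overlay) tables, one per VRF; tenant prefixes may overlap.
VRFS = {
    "red": {ipaddress.ip_network("10.0.0.0/24"): "192.0.2.11"},
    "blue": {ipaddress.ip_network("10.0.0.0/24"): "192.0.2.12"},
}


def resolve(vrf: str, dst: str) -> str | None:
    """Map a tenant destination in `vrf` to an underlay end-system address."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in VRFS[vrf] if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    next_hop = VRFS[vrf][best]
    if ipaddress.ip_address(next_hop) not in UNDERLAY:
        raise ValueError(f"{next_hop} is not routable in the underlay")
    return next_hop
```

   The same tenant address, 10.0.0.7, resolves to different end-systems
   in the red and blue VRFs, and neither tenant prefix ever appears in
   the underlay table.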
 

6. Decoupling of Layer 3 Virtualization from Layer 2 Topology 
 
   The layer 3 approach to network virtualization dictates that the 
   virtualized communication should be routed, not bridged. The layer 
   3 virtualization solution should be decoupled from the layer 2 
   topology. Thus, there should be no dependency on VLANs and layer 2 
   broadcast.  
    
   In solutions that depend on layer 2 broadcast domains, host-to-host 
   communication is established based on flooding and data plane MAC 
   learning. Layer 2 MAC information has to be maintained on every 
   switch where a given VLAN is present. Even if some solutions are 
   able to minimize data plane MAC learning and/or unicast flooding, 
   they still rely on MAC learning at the network edge and on 
   maintaining the MAC addresses on every (edge) switch where the 
   layer 2 VPN is present. 
    
   The MAC addresses known to the guest OS in an end-system are not
   relevant to IP services and introduce unnecessary overhead. Hence,
   the MAC addresses associated with virtual resources should not be
   used in the virtual layer 3 networks. Rather, only what is
   significant to IP communication, namely the IP addresses of the
   virtual machines and application endpoints, should be maintained by
   the virtual networks.
 

7. Encapsulation of Virtual Payloads 
 
   In a layer 3 end-system virtual network, IP packets should reach
   the first-hop router in one IP hop, regardless of whether the
   first-hop router is the end-system itself (i.e., a hypervisor/Host
   OS) or a device external to the end-system. The first-hop router
   should always perform an IP lookup on every packet it receives from
   a virtual machine or an application. The first-hop router should
   encapsulate the packets and route them towards the destination
   end-system.
    
   To scale the transport network, the virtual network payloads must
   be encapsulated with headers that are routable (or switchable) in
   the physical network infrastructure. The IP addresses of the
   virtual resources are not to be advertised within the physical
   infrastructure address space.
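
   The first-hop behavior described in this section (an IP lookup on
   every packet, followed by encapsulation toward the destination
   end-system) can be sketched as follows. This is an illustration
   under assumed names: the VRF table maps each prefix to a pair of
   (underlay address of the remote end-system, VPN label), and the
   Encapsulated record stands in for the real outer headers.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Encapsulated:
    outer_dst: str   # underlay address of the destination end-system
    vpn_label: int   # label advertised with the VPN route
    inner: bytes     # original tenant IP packet, carried unchanged


def first_hop_forward(
    vrf_table: dict[ipaddress.IPv4Network, tuple[str, int]],
    dst_ip: str,
    packet: bytes,
) -> Encapsulated | None:
    """IP lookup in the VRF, then encapsulate toward the remote end-system."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in vrf_table if addr in net]
    if not matches:
        return None  # no route in this VPN: drop the packet
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    outer_dst, label = vrf_table[best]
    return Encapsulated(outer_dst, label, packet)
```

   Only the outer destination needs to be routable in the physical
   network; the tenant destination address is consulted solely in the
   VRF lookup.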
    
   The encapsulation (and decapsulation) function should be
   implemented on a device as close to the virtualized resources as
   possible. Since the hypervisors in the end-systems are the devices
   at the very edge of the network, they are the optimal location for
   the encap/decap functionality. A device implementing the
   encap/decap functionality acts as the first-hop router in the
   virtual topology.
    
   The network virtualization solution should also support deployments
   where it is not possible or not desirable to implement the virtual
   payload encapsulation in the hypervisor/Host OS. In such
   deployments, the encap/decap functionality may be implemented in an
   external device, which should be as close as possible to the
   end-system itself. The same network virtualization solution should
   support deployments with both internal (in a hypervisor) and
   external (outside of a hypervisor) encap/decap devices.
    
   Whenever the virtual forwarding functionality is implemented in an 
   external device, the virtual service itself must be delivered to an 
   end-system such that switching elements connecting the end-system 
   to the encap/decap device are not aware of the virtual topology. 
    
   MPLS/BGP VPN technology based on [RFC 4364] allows different
   encapsulation methods to be used for connecting PE routers, namely
   Label Switched Paths (LSPs), IP tunneling, and GRE tunneling. If
   LSPs are used in the transport network, they could be signaled
   with LDP, in which case host (/32) routes to all PE routers must be
   propagated throughout the network, or with RSVP-TE, in which case a
   full mesh of RSVP-TE tunnels is required. If the transport network
   is only IP-capable, then MPLS-in-IP or MPLS-in-GRE [RFC 4023]
   encapsulation could be used. Other transport layers, such as IEEE
   802.1ah, might also need to be supported.
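
   To make the MPLS-in-GRE option concrete, the sketch below builds
   the GRE header and a single MPLS label stack entry that would
   follow the outer IP header (IP protocol 47, GRE). It assumes the
   label stack entry layout of RFC 3032 (20-bit label, 3-bit traffic
   class, 1-bit bottom-of-stack flag, 8-bit TTL), a basic GRE header
   with no optional fields, and the MPLS unicast Ethertype 0x8847 as
   the GRE protocol type, as used by [RFC 4023]. The function names
   are hypothetical, and building the outer IP header is left to the
   sender.

```python
import struct

MPLS_UNICAST_ETHERTYPE = 0x8847  # GRE Protocol Type for MPLS unicast


def mpls_label_entry(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Encode one 32-bit MPLS label stack entry in network byte order."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("traffic class or TTL out of range")
    # Bits 31-12: label, 11-9: traffic class, 8: bottom of stack, 7-0: TTL.
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def mpls_in_gre(label: int, ip_packet: bytes) -> bytes:
    """Return GRE header + one MPLS label + tenant IP packet.

    The 4-byte GRE header has no checksum/key/sequence bits set (version 0).
    """
    gre_header = struct.pack("!HH", 0x0000, MPLS_UNICAST_ETHERTYPE)
    return gre_header + mpls_label_entry(label) + ip_packet
```

   For example, label 16 with the default TTL of 64 encodes as the
   bytes 00 01 01 40. For MPLS-in-IP, [RFC 4023] instead places the
   label stack directly after an IP header whose protocol field is
   137, with no GRE header.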
 

8. Optimal Forwarding of Traffic 
    

   Network virtualization solutions that optimize for maximum
   utilization of compute and storage resources require that those
   resources can be located anywhere in the network. The physical and
   logical spreading of appliances and workloads implies a very
   significant increase in infrastructure bandwidth consumption.
   Hence, it is important that virtualized networking solutions
   forward traffic efficiently and ensure that packets traverse the
   transport network only once.

   It must also be possible to send traffic directly from one
   end-system to another without traversing a midpoint router.
 

9. Inter-operability with Existing MPLS/BGP VPNs 
 
   Service Providers want to tie their server-based offerings to their
   MPLS/BGP VPN services. MPLS/BGP VPNs provide secure and
   latency-optimized WAN connectivity to the virtualized resources in
   the SP's data centers. MPLS/BGP VPN customers may require
   simultaneous access to resources in both the SP's data centers and
   their own. Service-provider-based VPN access can provide additional
   value compared with public Internet access, such as security, QoS,
   OAM, multicast service, VoIP service, video conferencing, and
   wireless connectivity. Service Providers want to "spin up" L3VPN
   access to data center VPNs as dynamically as they spin up compute
   and other virtualized resources.
 
   The network virtualization solution should be fully inter-operable 
   with MPLS/BGP VPNs, including Inter-AS MPLS/BGP VPN Options A, B, 
   or C [RFC 4364]. MPLS/BGP VPN technology is widely supported on 
   routers and other appliances. BGP/MPLS VPN-capable network devices 
   should be able to participate directly in a virtual network that 
   spans end-systems. The network devices should be able to 
   participate in isolated collections of end-systems, i.e., in 
   isolated VPNs, as well as in overlapping VPNs (called "extranets" 
   in BGP/MPLS VPN terminology). 
    
   When connecting an end-system VPN with other services/networks, it
   should not be necessary to advertise specific host routes; rather,
   aggregated routing information should be advertised. A BGP/MPLS
   VPN-capable router or appliance can be used to aggregate a VPN's IP
   routing information and advertise the aggregated prefixes. The
   aggregated prefixes should be advertised with the router/appliance
   IP address as the BGP next-hop and with a locally assigned 20-bit
   aggregate label. The aggregate label should trigger a destination
   IP lookup in its corresponding VRF on all packets entering the
   virtual network.
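
   The aggregation behavior described above can be sketched as
   follows, as an illustration only. The gateway learns host routes
   from the end-system VPN, advertises a single covering prefix with
   its own address as BGP next-hop and one locally assigned 20-bit
   label per VRF, and treats that label as an instruction to perform a
   destination IP lookup in the VRF. The class name, the starting
   label value, and the one-label-per-VRF policy are assumptions.

```python
from __future__ import annotations

import ipaddress

MAX_LABEL = 2**20 - 1  # MPLS labels are 20 bits wide


class AggregatingGateway:
    """Hypothetical BGP/MPLS VPN-capable appliance at the edge of an
    end-system VPN: it advertises aggregates and owns one aggregate
    label per VRF that triggers an IP lookup in that VRF."""

    def __init__(self, next_hop: str, first_label: int = 1000) -> None:
        self.next_hop = next_hop            # advertised as the BGP next-hop
        self._next_label = first_label      # hypothetical label allocator
        self.vrf_label: dict[str, int] = {}
        self.label_vrf: dict[int, str] = {}
        # VRF -> {host route: underlay address of the hosting end-system}
        self.hosts: dict[str, dict[ipaddress.IPv4Network, str]] = {}

    def learn_host(self, vrf: str, prefix: str, end_system: str) -> None:
        self.hosts.setdefault(vrf, {})[ipaddress.ip_network(prefix)] = end_system

    def advertise_aggregate(self, vrf: str, aggregate: str) -> tuple[str, str, int] | None:
        """Return (prefix, BGP next-hop, label) if the aggregate covers any host."""
        agg = ipaddress.ip_network(aggregate)
        if not any(h.subnet_of(agg) for h in self.hosts.get(vrf, {})):
            return None  # nothing to summarize
        if vrf not in self.vrf_label:
            if self._next_label > MAX_LABEL:
                raise RuntimeError("label space exhausted")
            self.vrf_label[vrf] = self._next_label
            self.label_vrf[self._next_label] = vrf
            self._next_label += 1
        return (str(agg), self.next_hop, self.vrf_label[vrf])

    def receive(self, label: int, dst_ip: str) -> str | None:
        """Aggregate label -> destination IP lookup in the corresponding VRF."""
        vrf = self.label_vrf[label]
        addr = ipaddress.ip_address(dst_ip)
        matches = [h for h in self.hosts[vrf] if addr in h]
        if not matches:
            return None
        return self.hosts[vrf][max(matches, key=lambda h: h.prefixlen)]
```

   Remote PEs then hold a single route per aggregate instead of one per
   virtual machine, and the gateway resolves the host route only when a
   packet arrives.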
 

   The inter-connection of end-system VPNs with traditional VPNs 
   requires an integrated control plane and unified orchestration of 
   network and end-system resources. 
    

10.  IP Mobility 
 
   Another reason for network virtualization is the need to support
   IP mobility. IP mobility means that the IP addresses used for
   communication within or between applications can be located
   anywhere across the network. Using a virtual topology, i.e.,
   abstracting the externally visible network address from the
   underlying infrastructure address, is an effective way to solve the
   IP mobility problem.
 
   IP mobility covers a device physically moving (e.g., a roaming
   wireless device) or a workload being transferred from one physical
   server/appliance to another. IP mobility requires preserving the
   device's active network connections (e.g., TCP and higher-level
   sessions). With respect to a Virtual Machine, such mobility is also
   referred to as "live" migration. IP mobility is highly desirable
   for many reasons, such as efficient and flexible resource sharing,
   data center migration, disaster recovery, server redundancy, or
   service bursting.
 
   To accommodate live mobility of a virtual machine (or a device), it
   is desirable to assign it a permanent IP address that remains with
   the VM/device after it moves. When dealing with IP-only
   applications, it is not only sufficient but optimal to forward
   traffic based on layer 3 rather than layer 2 information. The MAC
   addresses of devices or applications should be irrelevant to IP
   services; they introduce unnecessary overhead and complications
   when devices or VMs move (i.e., when a VM moves between physical
   servers, the MAC learning tables in the switches must be updated;
   also, the VM's MAC address might need to change in its new
   location). In an IP-based network virtualization solution, a device
   or workload move should be handled by an IP route advertisement.
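
   The following sketch illustrates a workload move handled purely by
   IP route advertisements, as seen by a remote forwarder. It is a
   simplification under stated assumptions: each host route is keyed
   by prefix alone (ignoring Route Distinguishers and BGP best-path
   selection), and a withdrawal is honored only if it comes from the
   next-hop currently in use, so that a late withdrawal from the old
   location cannot remove the route to the new one. The class and
   method names are hypothetical.

```python
from __future__ import annotations

import ipaddress


class RemoteForwarder:
    """Hypothetical view of one remote end-system's forwarding state."""

    def __init__(self) -> None:
        self.routes: dict[ipaddress.IPv4Network, str] = {}

    def apply_update(self, prefix: str, next_hop: str) -> None:
        # A new advertisement for the same host route replaces the old
        # next-hop; the VM keeps its IP address and no MAC state is touched.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def apply_withdraw(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        # Ignore a stale withdrawal from a location the route no longer uses.
        if self.routes.get(net) == next_hop:
            del self.routes[net]

    def next_hop_for(self, dst: str) -> str | None:
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]
```

   In the example sequence, 10.1.0.5 moves from the end-system at
   192.0.2.11 to the one at 192.0.2.12: the new location advertises the
   /32, the old location's withdrawal arrives afterwards and is
   ignored, and traffic keeps flowing to the new location.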
 
   IP mobility has to be transparent to applications and to any
   external entity interacting with the applications. This implies
   that the network connectivity restoration time is critical.
   Transport sessions can typically survive several seconds of
   disruption; however, applications may have sub-second latency
   requirements for their correct operation.
 
   To minimize the disruption to established communication during
   workload or device mobility, the control plane of a network
   virtualization solution should be able to distinguish the
   activation of a workload in a new location from the advertisement
   of its route to the network. This enables remote endpoints to
   update their routing tables before the workload migrates, and also
   allows traffic to be tunneled via the workload's old location.
 

11.     BGP Requirements in a Virtualized Environment 

  11.1. BGP Convergence and Routing Consistency 
 
   BGP was designed to carry a very large amount of routing
   information, but it is not a very fast converging protocol. In
   addition, routing protocols, including BGP, have traditionally
   favored convergence (i.e., responsiveness to route changes due to
   failure or policy change) over routing consistency. Routing
   consistency means that a router forwards a packet strictly along
   the path adopted by the upstream routers. When responsiveness is
   favored, a router applies a received update immediately to its
   forwarding table before propagating the update to other routers,
   including those that potentially depend upon the outcome of the
   update. This responsiveness to route changes comes at the cost of
   transient routing blackholes and loops.
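
   The two failure modes named above can be illustrated by following
   next-hops through a snapshot of per-router forwarding state for a
   single destination D. In the hypothetical snapshots used below,
   router B has already applied an update pointing back to A that A
   has not yet received (a loop), or B has removed its route before an
   alternative was installed (a blackhole). The function and router
   names are invented for the example.

```python
from __future__ import annotations


def trace(fibs: dict[str, str | None], start: str, dest: str) -> str:
    """Follow each router's next-hop toward `dest` in one forwarding snapshot.

    Returns "delivered", "loop", or "blackhole".
    """
    node = start
    seen: set[str] = set()
    while node != dest:
        if node in seen:
            return "loop"
        seen.add(node)
        next_hop = fibs.get(node)
        if next_hop is None:
            return "blackhole"
        node = next_hop
    return "delivered"
```

   A consistent snapshot such as {A: B, B: D} delivers the packet,
   while the inconsistent snapshots {A: B, B: A} and {A: B, B: none}
   produce a loop and a blackhole respectively.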
 
   Routing consistency is important in virtualized environments
   because multiple workloads can be moved simultaneously between
   different physical servers, for example, due to maintenance
   activities. If packets sent by the applications that are being
   moved are dropped (because they do not follow a live path), the
   active network connections will be dropped. To minimize disruption
   to established communications during VM migration or device
   mobility, continuity of the live path is required.
 

11.1.1. BGP IP Mobility Requirements 
 
   In IP mobility, the network connectivity restoration time is
   critical. Service Provider networks already use routing and
   forwarding plane techniques that support fast failure restoration
   by pre-installing a backup path to a given destination. These
   techniques allow traffic to be forwarded almost continuously using
   an indirect forwarding path or a tunnel to a given destination,
   and hence are referred to as "local repair". The traffic path is
   restored locally at the destination's old location while the
   network converges to a backup path. Eventually, the network
   converges to an optimal path and bypasses the local repair. BGP
   assists local repair techniques by advertising multiple paths, not
   only the best path, to a given destination.
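
   A pre-installed backup path can be sketched as a forwarding entry
   that holds both a primary and a backup next-hop, where the choice
   between them is made in the forwarding plane as soon as the primary
   is known to be unusable, without waiting for the control plane to
   converge. The sketch assumes that the backup next-hop was learned
   from an additional BGP path; FibEntry and select_next_hop are
   hypothetical names. At the workload's old location, the primary can
   be local delivery and the backup a tunnel to the new location.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FibEntry:
    primary: str
    backup: str | None = None  # pre-installed from an additional BGP path


def select_next_hop(entry: FibEntry, usable: set[str]) -> str | None:
    """Pick a next-hop in the forwarding plane, without waiting for the
    control plane to converge; `usable` holds next-hops currently usable."""
    if entry.primary in usable:
        return entry.primary
    if entry.backup is not None and entry.backup in usable:
        return entry.backup
    return None
```

   Once the workload has left, "local" is no longer usable at the old
   location and traffic is immediately redirected to the backup tunnel;
   an entry without a backup simply has no next-hop until the network
   converges.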

  11.2. Optimizing Route Distribution 
 
   When virtual networks are instantiated on end-systems in response
   to IP communication, the Route Target Constraint extension
   [RFC 4684] of BGP should be used to optimize route distribution for
   sparsely populated virtual networks. This technique ensures that
   only those VPN forwarders that have local participants in a
   particular virtual network receive its routing information. This
   also decreases the total load on the upstream BGP speakers.
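
   The filtering performed with Route Target Constraint can be
   sketched as follows: each VPN forwarder advertises membership for
   the Route Targets of the VPNs that currently have local
   participants, and a route reflector sends a VPN route only to peers
   whose membership overlaps the route's Route Targets. This is a
   simplified model, not the [RFC 4684] encoding (which also carries an
   origin AS in the membership NLRI); the class and method names and
   the RT values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RouteReflector:
    """Hypothetical route reflector applying RT-membership filtering."""
    # peer -> Route Targets for which the peer has advertised membership
    interest: dict[str, set[str]] = field(default_factory=dict)

    def rt_membership(self, peer: str, rt: str) -> None:
        # Sent when the peer gains its first local participant in the VPN.
        self.interest.setdefault(peer, set()).add(rt)

    def rt_withdraw(self, peer: str, rt: str) -> None:
        # Sent when the peer's last local participant in the VPN goes away.
        self.interest.get(peer, set()).discard(rt)

    def recipients(self, route_targets: set[str], origin: str) -> list[str]:
        """Peers that should receive a VPN route carrying `route_targets`."""
        return sorted(
            peer for peer, rts in self.interest.items()
            if peer != origin and rts & route_targets
        )
```

   With sparse VPNs, most end-systems never advertise membership for a
   given Route Target, so most VPN routes are simply not sent to them.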
 

12.     Security Considerations 
 
   This document presents the requirements for end-system MPLS/BGP
   VPNs. The security considerations for specific solutions will be
   documented in the relevant solution documents.
    
    
13.     IANA Considerations 
    
   This document contains no new IANA considerations. 
    
    
14.     Normative References 
    
   [RFC 4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
   Networks (VPNs)", RFC 4364, February 2006.
    
   [RFC 4023]  Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
   MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, March
   2005.
    
   [RFC 4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, 
   R., Patel, K. and J. Guichard, "Constrained Route Distribution for 
   Border Gateway Protocol/Multiprotocol Label Switching (BGP/MPLS) 
   Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, 
   November 2006. 
 
    
15.     Informative References 
    
   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate                    
   Requirement Levels", BCP 14, RFC 2119, March 1997. 
    
    
16.     Authors' Addresses 
    
     
   Maria Napierala 
   AT&T 
   200 Laurel Avenue 
   Middletown, NJ 07748 
   Email: mnapierala@att.com 
    
   Luyuan Fang 
   Cisco Systems 
   111 Wood Avenue South 
   Iselin, NJ 08830, USA 
   Email: lufang@cisco.com 
    
    
17.     Acknowledgements  
    
    
   The authors would like to thank Pedro Marques for his comments and 
   input. 