Network working group                                             X. Xu
Internet Draft                                      Huawei Technologies
Category: Standard Track
Expires: October 2011                                    April 11, 2011


     Virtual Subnet: A Scalable Data Center Interconnection Solution
                       draft-xu-virtual-subnet-05

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 11, 2011.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Xu                     Expires October 11, 2011                [Page 1]

Internet-Draft              Virtual Subnet                   April 2011

Abstract

   This document proposes a scalable IP-only L2VPN solution called
   Virtual Subnet, which reuses BGP/MPLS IP VPN [RFC4364] and ARP proxy
   [RFC925][RFC1027] technologies. Virtual Subnet can be used to
   interconnect geographically distributed data centers in a highly
   scalable way.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Terminology
   3. Solution Description
      3.1. Unicast
         3.1.1. Intra-subnet Unicast
         3.1.2. Inter-subnet Unicast
      3.2. Multicast/Broadcast
      3.3. CE Host Discovery
      3.4. CE Multi-homing
      3.5. CE Host Mobility
      3.6. ARP Proxy
   4. Comparison between VS and VPLS on Scalability
   5. Future work
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1. Normative References
      9.2. Informative References
   Authors' Addresses

1. Introduction

   To take full advantage of the service agility offered by current
   virtual machine (VM) technology, cloud data center operators are
   demanding solutions for VM mobility across geographically dispersed
   data centers. In this challenging environment, a solution that
   enables fast, reliable, high-capacity and highly scalable data
   center interconnection is essential.

   Virtual Private LAN Service (VPLS) [RFC4761][RFC4762] is one
   technology available today to meet this demand. However, the
   scalability issues associated with a flat Layer 2 Ethernet network
   (e.g., ARP broadcast storms and unknown unicast flooding) would
   badly degrade network performance when such a network is extended
   over a much wider scope.

   This document describes a scalable IP-only L2VPN solution called
   Virtual Subnet (VS), which reuses BGP/MPLS IP VPN [RFC4364] and ARP
   proxy [RFC925][RFC1027] technologies. VS is intended for data center
   interconnection. In contrast with the existing VPLS solution, VS
   greatly alleviates the impact of broadcast storms on network
   performance by splitting the otherwise single ARP broadcast and
   unknown unicast flooding domain associated with an IP subnet into
   multiple isolated parts. In addition, the MAC table capacity
   required on CE switches is greatly reduced by the use of ARP proxy.
   Note that non-IP traffic is not supported in VS, since VS provides
   an IP-only L2VPN service.

2. Terminology

   This memo makes use of the terms defined in [RFC4364], [MVPN],
   [RFC2236] and [RFC2131].

3. Solution Description

3.1. Unicast

3.1.1. Intra-subnet Unicast

   As shown in Figure 1, CE hosts located in different VPN sites of a
   given IP-only L2VPN instance are within the same IP subnet (e.g.,
   10.0.0.0/8). PE routers automatically create host routes for their
   local CE hosts from the corresponding ARP entries. Instead of
   distributing routes for their configured VPN subnets, PE routers
   distribute these host routes to each other. In addition, each PE
   router is configured with a null route for the VPN subnet. Thus,
   upon receiving CE packets destined for nonexistent hosts of that
   subnet, ingress PE routers discard them directly according to that
   null route.
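
   The interplay of host routes and the null route can be illustrated
   with a minimal longest-prefix-match sketch. It is not part of the
   specification: the function names, the dictionary-based VRF and the
   "Local"/"NULL" markers are assumptions made for illustration, and
   the addresses are those used for PE-1 in Figure 1.

```python
import ipaddress

NULL = "NULL"  # hypothetical marker for the null (discard) route


def lookup(vrf, dst):
    """Return the next hop of the longest-prefix match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in vrf.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def forward(vrf, dst):
    """Describe what an ingress PE does with a packet destined for dst."""
    next_hop = lookup(vrf, dst)
    if next_hop is None or next_hop == NULL:
        return "discard"
    if next_hop == "Local":
        return "deliver locally"
    return f"tunnel to {next_hop}"


# VRF of PE-1 for VPN_A: one local host route, one host route learnt
# from PE-2, and the null route covering the whole VPN subnet.
pe1_vrf = {
    "10.1.1.1/32": "Local",
    "10.1.1.2/32": "PE-2",
    "10.0.0.0/8": NULL,
}

print(forward(pe1_vrf, "10.1.1.2"))  # known remote host
print(forward(pe1_vrf, "10.9.9.9"))  # nonexistent host, hits the null route
```

   Because a /32 host route is always more specific than the /8 null
   route, known hosts are forwarded normally while every other address
   in the subnet falls through to the discard entry.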

   Meanwhile, ARP proxy is enabled on the VRF interfaces of each PE
   router. Thus, upon receiving from a local CE host an ARP request for
   a known remote CE host, the ingress PE router returns its own MAC
   address as a response.

                         +--------------------+
   +-----------------+   |                    |   +------------------+
   |VPN_A:10/8       |   |                    |   |VPN_A:10/8        |
   |                 |   |                    |   |                  |
   |    +------+    ++---+-+                +-+---++    +------+     |
   |    |Host A+----+ PE-1 |                | PE-2 +----+Host B|     |
   |    +------+    ++-+-+-+                +-+-+-++    +------+     |
   |    10.1.1.1/8   | | |  IP/MPLS Backbone  | | |     10.1.1.2/8   |
   +-----------------+ | |                    | | +------------------+
                       | +--------------------+ |
                       |                        |
                       |                        |
                       V                        V
   +-------+------------+--------+ +-------+------------+--------+
   |VRF ID |Destination |Next Hop| |VRF ID |Destination |Next Hop|
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.1.1.1/32 | Local  | | VPN_A |10.1.1.2/32 | Local  |
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.1.1.2/32 | PE-2   | | VPN_A |10.1.1.1/32 | PE-1   |
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.0.0.0/8  | NULL   | | VPN_A |10.0.0.0/8  | NULL   |
   +-------+------------+--------+ +-------+------------+--------+

                    Figure 1: Intra-subnet Unicast

   Assume host A sends an ARP request for host B before communicating
   with host B. Upon receipt of this ARP request, PE-1 looks up the
   associated VRF table to find the corresponding host route for host
   B. If that route is found and was learnt from a remote PE router,
   PE-1, acting as ARP proxy, returns its own MAC address in response
   to the ARP request. Otherwise, PE-1 does not need to respond to that
   ARP request.

   Once it receives the ARP reply from PE-1, host A sends an IP packet
   to host B with PE-1's MAC address as the destination MAC address.
   PE-1, as ingress PE router, tunnels the packet towards the egress PE
   router (i.e., PE-2), which in turn forwards the packet to the
   destination CE host (i.e., host B).

3.1.2. Inter-subnet Unicast

   As shown in Figure 2, for a CE host (e.g., host A) to communicate
   with hosts outside its subnet, a PE router (e.g., PE-2) which is
   connected to a CE gateway router (e.g., GW) SHOULD be configured
   with a default route for that VPN whose next hop is that CE gateway
   router, and SHOULD then advertise that default route to the other PE
   routers.

                         +--------------------+
   +-----------------+   |                    |   +-------------+
   |VPN_A:10/8       |   |                    |   |VPN_A:10/8   |
   |                 |   |                    |   |             |
   |    +------+    ++---+-+                +-+---++       +----+--+
   |    |Host A+----+ PE-1 |                | PE-2 +-------+  GW   |
   |    +------+    ++-+-+-+                +-+-+-++       +----+--+
   |    10.1.1.1/8   | | |  IP/MPLS Backbone  | | |10.1.1.2/8   |
   +-----------------+ | |                    | | +-------------+
                       | +--------------------+ |
                       |                        |
                       |                        |
                       V                        V
   +-------+------------+--------+ +-------+------------+--------+
   |VRF ID |Destination |Next Hop| |VRF ID |Destination |Next Hop|
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.1.1.1/32 | Local  | | VPN_A |10.1.1.2/32 | Local  |
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.1.1.2/32 | PE-2   | | VPN_A |10.1.1.1/32 | PE-1   |
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |10.0.0.0/8  | NULL   | | VPN_A |10.0.0.0/8  | NULL   |
   +-------+------------+--------+ +-------+------------+--------+
   | VPN_A |0.0.0.0/0   | PE-2   | | VPN_A |0.0.0.0/0   | GW     |
   +-------+------------+--------+ +-------+------------+--------+

                    Figure 2: Inter-subnet Unicast

   Assume host A sends an ARP request for its gateway (i.e., GW) before
   sending a packet to a destination host outside its subnet. Upon
   receiving this ARP request, PE-1 returns its own MAC address as a
   response in accordance with the rules described in Section 3.1.1.
   Host A then sends an IP packet to that destination host with PE-1's
   MAC address as the destination MAC address.
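
   The ARP proxy decision that PE-1 applies to host A's request for GW
   can be sketched as follows. This is only an illustration of the rule
   from Section 3.1.1 (answer only when a host route for the target
   exists and was learnt from a remote PE router); the function name,
   the dictionary-based VRF and the MAC address are hypothetical.

```python
def proxy_arp_reply(vrf, target_ip, pe_mac):
    """Return the MAC address to answer with, or None to stay silent."""
    # Only an exact /32 host route qualifies; the null route and the
    # default route are deliberately not consulted.
    next_hop = vrf.get(f"{target_ip}/32")
    if next_hop is None or next_hop == "Local":
        return None  # nonexistent or locally attached host: no proxy reply
    return pe_mac    # host route learnt from a remote PE: answer as proxy


PE1_MAC = "00:00:5e:00:53:01"  # hypothetical MAC address of PE-1

# VRF of PE-1 for VPN_A, following the left-hand table of Figure 2.
pe1_vrf = {
    "10.1.1.1/32": "Local",
    "10.1.1.2/32": "PE-2",
    "10.0.0.0/8": "NULL",
    "0.0.0.0/0": "PE-2",
}

print(proxy_arp_reply(pe1_vrf, "10.1.1.2", PE1_MAC))  # GW, reached via PE-2
print(proxy_arp_reply(pe1_vrf, "10.1.1.1", PE1_MAC))  # local host A
print(proxy_arp_reply(pe1_vrf, "10.1.1.9", PE1_MAC))  # no host route
```

   Restricting the lookup to exact host routes keeps the PE router from
   answering for addresses that are covered only by the null route or
   by the default route.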

   Upon receiving the above packet, PE-1, as ingress PE router, tunnels
   it towards the corresponding egress PE router (i.e., PE-2) according
   to the default route announced by PE-2. PE-2, as egress PE router,
   in turn forwards the packet to the CE gateway router (i.e., GW)
   according to the configured default route.

   Since each PE router of that VPN has been configured with a null
   route, packets destined for nonexistent CE hosts of that VPN subnet
   are dropped directly by ingress PE routers, rather than being
   mistakenly forwarded to the CE gateway router.

   For redundancy, more than one CE gateway router may be connected to
   a given VPN subnet via separate PE routers. In this case, the
   Virtual Router Redundancy Protocol (VRRP) [RFC2338] SHOULD run among
   these CE gateway routers, and only the PE router connected to the
   Virtual Router Master is entitled to announce a default route for
   that VPN subnet. To achieve this, the next hop of the default route
   SHOULD be set to the corresponding Virtual Router IP address, and
   the default route SHOULD NOT be deemed valid unless there is a
   directly connected host route for the next-hop address. Because only
   the Virtual Router Master is entitled to respond to ARP requests for
   the Virtual Router IP address and to broadcast gratuitous ARP
   requests or replies on behalf of the Virtual Router, only the PE
   router connected to the Virtual Router Master can have an ARP entry
   for the Virtual Router IP address, and therefore a directly
   connected host route for it. In this way, packets destined for
   outside a given VPN subnet are sent precisely to the corresponding
   Virtual Router Master.

3.2. Multicast/Broadcast

   The MVPN technology [MVPN], in particular the Protocol-Independent
   Multicast (PIM) tree option with some extensions, could be reused
   here to support IP multicast and broadcast between CE hosts of the
   same VPN. For example, PE routers attached to a given VPN join a
   default provider multicast distribution tree which is dedicated to
   that VPN. Ingress PE routers, upon receiving customer multicast or
   broadcast traffic from their local CE hosts, tunnel that traffic
   towards the remote PE routers of the same VPN over the corresponding
   default provider multicast distribution tree. When receiving
   customer multicast or broadcast traffic over a provider multicast
   distribution tree, egress PE routers forward that traffic via the
   corresponding VRF interfaces.

   More details about how to support multicast and broadcast in VS will
   be explored in a later version of this document.

3.3. CE Host Discovery

   When receiving an ARP request or reply from a local CE host, a PE
   router SHOULD cache or update the corresponding ARP entry for that
   CE host. In addition, a PE router SHOULD periodically send unicast
   ARP requests to each learnt local CE host so as to keep the learnt
   ARP entries fresh.

   To ensure that a PE router discovers all local CE hosts in time, the
   PE router SHOULD perform an IP or ARP scan of its local VPN site at
   least once after rebooting. One option is to use ICMP echo for host
   discovery. For example, the PE router could send an ICMP echo
   request to the IP broadcast address through its VRF interfaces;
   every CE host attached to the corresponding VPN sites will respond
   with an ICMP echo reply when it receives that request. Hence, the PE
   router can learn the IP and MAC address pairs of all local CE hosts
   by inspecting the source IP and MAC addresses in the received ICMP
   echo replies.

3.4. CE Multi-homing

   When a VPN site is multi-homed to more than one PE router for
   redundancy, VRRP SHOULD run among these PE routers. In addition,
   only the Virtual Router Master SHOULD respond to ARP requests from
   local CE hosts, and it MUST use the Virtual Router MAC address in
   any ARP packet it sends.

   To achieve active-active multi-homing (i.e., remote PE routers can
   send customer traffic destined for a given multi-homed VPN site via
   any PE router of that site), PE routers acting as VRRP Backup can
   also perform the host discovery function and accordingly advertise
   host routes for local CE hosts, just as the PE router acting as VRRP
   Master does. Note that this does not contravene the VRRP
   specification [RFC2338].

3.5. CE Host Mobility

   When a CE host moves from one VPN site to another, it will usually
   send out a gratuitous ARP request or reply when attaching to the new
   VPN site. The PE router attached to the new VPN site will generate a
   directly connected CE host route upon receiving that gratuitous ARP
   request or reply.

   When the PE router attached to the old VPN site receives a host
   route announcement for one of its local CE hosts from a remote PE
   router, it SHOULD immediately send an ARP request for that CE host
   through the corresponding VRF interface to determine whether that CE
   host is still connected to itself. If no corresponding ARP reply is
   received within a short amount of time, the PE router SHOULD delete
   the corresponding ARP entry of that CE host and accordingly withdraw
   the corresponding host route. Meanwhile, the PE router SHOULD
   broadcast a gratuitous ARP through the corresponding VRF interface
   on behalf of that CE host, with the sender hardware address field
   filled with its own MAC address. As a result, the ARP entry for that
   CE host cached on other local CE hosts will be updated correctly.

3.6. ARP Proxy

   A PE router, as ARP proxy, SHOULD only respond to ARP requests for
   those CE hosts which are actually attached to other PE routers. In
   other words, the PE router SHOULD NOT respond to ARP requests for
   its local CE hosts or for nonexistent CE hosts.

   When VRRP runs between multiple PE routers which are attached to the
   same VPN site, only the PE router which is Virtual Router Master is
   entitled to perform the ARP proxy function.

4. Comparison between VS and VPLS on Scalability

   Since PE routers in VPLS work much like Spanning Tree Protocol (STP)
   bridges, most scalability issues associated with STP bridging (e.g.,
   ARP broadcast storms and unknown unicast flooding) are inherited
   intact by VPLS.

   By splitting the otherwise single ARP broadcast and unknown unicast
   flooding domain of a subnet into multiple isolated parts, VS greatly
   alleviates the impact of broadcast storms on network performance.
   For example, ARP broadcast traffic is limited to the scope of a VPN
   site. Similarly, unknown unicast traffic is not flooded across the
   MPLS/IP backbone.

   As for the MAC table capacity requirement on CE switches, CE
   switches in VPLS have to learn the MAC addresses of both local and
   remote CE hosts, while CE switches in VS only need to learn the MAC
   addresses of local CE hosts and local PE routers.

5. Future work

   How to support IPv6 CE hosts in VS is for future study.

6. Security Considerations

   TBD.

7. IANA Considerations

   There is no requirement for IANA.

8. Acknowledgements

   Thanks to Dino Farinacci for his valuable comments.

9. References

9.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2. Informative References

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

   [MVPN]    Rosen, E. and R. Aggarwal, "Multicast in MPLS/BGP IP
             VPNs", draft-ietf-l3vpn-2547bis-mcast-10.txt (work in
             progress), January 2010.

   [MVPN-BGP] Aggarwal, R., Rosen, E., Morin, T., Rekhter, Y., and C.
             Kodeboniya, "BGP Encodings for Multicast in MPLS/BGP IP
             VPNs", draft-ietf-l3vpn-2547bis-mcast-bgp-08.txt (work in
             progress), September 2009.

   [RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol or
             Converting Network Protocol Addresses to 48-bit Ethernet
             Addresses for Transmission on Ethernet Hardware", RFC 826,
             Symbolics, November 1982.

   [RFC925]  Postel, J., "Multi-LAN Address Resolution", RFC 925, USC
             Information Sciences Institute, October 1984.

   [RFC1027] Carl-Mitchell, S. and J. Quarterman, "Using ARP to
             Implement Transparent Subnet Gateways", RFC 1027, October
             1987.

   [RFC2338] Knight, S., et al., "Virtual Router Redundancy Protocol",
             RFC 2338, April 1998.

   [RFC2236] Fenner, W., "Internet Group Management Protocol, Version
             2", RFC 2236, November 1997.

   [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
             (VPLS) Using BGP for Auto-Discovery and Signaling", RFC
             4761, January 2007.

   [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
             (VPLS) Using Label Distribution Protocol (LDP) Signaling",
             RFC 4762, January 2007.

Authors' Addresses

   Xiaohu Xu
   Huawei Technologies,
   No.3 Xinxi Rd., Shang-Di Information Industry Base,
   Hai-Dian District, Beijing 100085, P.R. China
   Phone: +86 10 82882573
   Email: xuxh@huawei.com