TRILL                                                         Weiguo Hao
                                                               Yizhou Li
                                                         Donald Eastlake
Internet Draft                                                    Huawei
                                                           Radia Perlman
                                                              Intel Labs
Intended status: Standards Track                       February 14, 2014
Expires: August 2014


                     TRILL anycast Layer 3 Gateway
                   draft-hao-trill-anycast-gw-00.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may not be modified, and derivative works of it may
   not be created, and it may not be published except as an
   Internet-Draft.

   This document may not be modified, and derivative works of it may
   not be created, except to publish it as an RFC and to translate it
   into languages other than English.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.

Hao & Li,etc            Expires August 14, 2014                 [Page 1]

Internet-Draft       TRILL anycast Layer 3 Gateway         February 2014

   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 14, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This draft describes a centralized anycast layer 3 gateway solution
   for a TRILL campus. Compared to the traditional VRRP based
   active-standby layer 3 gateway solution, it achieves better load
   balancing and scalability. An anycast nickname, an anycast gateway
   IP address and an anycast MAC address are introduced so that
   inter-subnet traffic is forwarded in flow-based load balancing mode
   among all physical layer 3 gateways. To avoid sending duplicate ARP
   replies to an end system, an ARP master gateway election mechanism
   is introduced. The election algorithm is described in this draft.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. VRRP based gateways
   4. Anycast layer 3 gateway
      4.1. ARP Handling
      4.2. Data traffic forwarding
   5. Node failure
   6. Anycast MAC aging on edge node
   7. TRILL protocol extension
      7.1. The Anycast Gateway TLV
   8. Security Considerations
   9. IANA Considerations
   10. Normative References
   11. Informative References
   12. Acknowledgments

1. Introduction

   In a TRILL based multi-tenancy data center network (DCN), each
   tenant normally owns one routing domain (RD), which may consist of
   one or more IP subnets. It is common practice to map one layer 2
   virtual network (VN) to a unique IP subnet. A layer 2 virtual
   network in a TRILL campus is identified by a 12-bit VLAN ID or a
   24-bit Fine Grained Label [FGL]. All inter-subnet (inter-VN)
   communication needs to pass through an L3 GW. Different subnets of
   one tenant are usually allowed to communicate with each other
   freely. The gateway therefore plays an important role in both
   west-to-east traffic and traditional north-to-south traffic.

   Figure 1 shows a typical data center network topology. Multiple core
   switches serve as the layer 3 gateways. All the network nodes are
   RBridges running the TRILL protocol.
   Gateway functions co-exist with traditional RBridge functions at the
   GW switch.

   There are several ways to organize the gateways. A traditional way
   is to use VRRP based gateways, as explained in Section 3. However,
   this approach has scalability and efficiency issues. To avoid a
   single point of failure and achieve better load balancing, an
   anycast gateway group can be used. The key idea of the anycast
   gateway is to make multiple physical gateways share the same gateway
   IP and MAC address for a single virtual network (VN).

                              ,---------.
                            ,'           `.
                           (  IP/MPLS WAN  )
                            `.           ,'
                          *   -+------+'   *
                       *                      *
                    *                            *
              ---------                      ---------
              |  GW1  |                      |  GW2  |
              |       |     ************     |       |
              ---------                      ---------
               *                                  *
             *            TRILL Campus              *
           *                                          *
   ---------          ---------          ---------          ---------
   | TOR1  | ******** | TOR2  | ******** | TOR3  | ******** | TOR4  |
   |       |          |       |          |       |          |       |
   ---------          ---------          ---------          ---------
    |    |             |    |             |    |             |    |
   ____ ____          ____ ____          ____ ____          ____ ____
   |T | |T |          |T | |T |          |T | |T |          |T | |T |
   |S1| |S2|          |S3| |S4|          |S5| |S6|          |S7| |S8|
   ---- ----          ---- ----          ---- ----          ---- ----

         Figure 1 Centralized layer 3 gateway in TRILL campus

   For inter-subnet layer 3 traffic, a centralized layer 3 gateway is
   normally used and placed at the boundary between the TRILL network
   and the external IP network. In Figure 1 above, GW1 and GW2 are
   integrated devices that combine the layer 3 gateway and TRILL RB
   functions. The TRILL protocol runs on the TOR and GW devices.
   West-to-east IP traffic among different VNs and north-to-south IP
   traffic between the TRILL network and the external IP network both
   pass through the layer 3 gateway.

   When the gateway receives unicast TRILL encapsulated traffic from
   one layer 2 VN, it removes the TRILL encapsulation header. If the
   destination MAC in the inner Ethernet header is the gateway's MAC,
   the gateway removes the inner Ethernet header. Then the gateway
   looks up its local IP forwarding table.
   If the destination IP belongs to another VN in the TRILL campus, the
   gateway encapsulates the frame in TRILL format and sends it to the
   destination.

   To eliminate the single point of gateway failure and to enhance
   reliability, multiple layer 3 gateways are deployed. These gateways
   can work in active-standby mode or active-active mode. In
   active-standby mode, for each VN only one gateway acts as master and
   is responsible for IP traffic forwarding between VNs. Network
   bandwidth usage is inefficient with such a deployment.

   In a cloud computing data center, it is estimated that about 70% of
   traffic is east-west traffic, which requires non-blocking forwarding
   for line-speed traffic transmission between servers. For
   inter-subnet layer 3 traffic, multiple centralized layer 3 gateways
   working in flow-based active-active mode will enhance network
   efficiency.

   This draft describes such an anycast layer 3 gateway solution for a
   TRILL campus. An anycast nickname, an anycast gateway IP address and
   an anycast MAC address are introduced. The anycast gateway IP and
   MAC address are set on each layer 3 gateway for each VN to terminate
   Ethernet traffic. The anycast nickname is also shared by multiple
   gateways. TRILL traffic with the anycast nickname as egress nickname
   can go to any one of the gateways through the ECMP support that
   TRILL provides natively, so flow-based load balancing among the
   physical gateways is achieved. Compared to the traditional VRRP
   based active-standby layer 3 gateway, the anycast gateway achieves
   better load balancing and scalability.

   This document is organized as follows: Section 3 describes the VRRP
   based gateway solution and its disadvantages. Section 4 gives an
   overview of the anycast gateway solution, with ARP handling in
   Section 4.1 and data traffic forwarding in Section 4.2. Section 5
   describes node failure. Section 6 describes anycast MAC aging on an
   edge node. Section 7 describes the TRILL protocol extension.

   Familiarity with [RFC6325] is assumed in this document.

2. Conventions used in this document

   ARP - Address Resolution Protocol.

   ES - End Station.

   VN - Virtual Network. In a TRILL network, each VN can be identified
   by a 12-bit VLAN ID or a 24-bit Fine Grained Label.

3. VRRP based gateways

   Assume that in Figure 1 above, GW1 and GW2 are centralized gateways
   in active-standby mode. The TRILL protocol runs on the TOR and GW
   devices. ES is an end station. ES1, ES3, ES5 and ES7 belong to
   VLAN1. ES2, ES4, ES6 and ES8 belong to VLAN2.

   The Virtual Router Redundancy Protocol (VRRP) is designed to
   eliminate the single point of gateway failure. VRRP is an election
   protocol that dynamically assigns responsibility for a virtual
   router to one of the VRRP routers on a layer 2 VN. Any of the
   virtual router's IP addresses on a LAN can then be used as the
   default first hop router by end-hosts. The layer 3 gateway acting as
   VRRP master is responsible for forwarding packets destined to the
   virtual router. If the VRRP master fails, a VRRP backup takes over.

   The VRRP based solution has the following issues:

   1. Inefficient network bandwidth usage. Only the VRRP master gateway
      forwards traffic. The VRRP backup is idle most of the time.

   2. Low scalability. A VRRP session among the physical layer 3
      gateways has to be established per layer 2 VN. A large number of
      layer 2 VNs will cause a heavy CPU workload on each layer 3
      gateway.

4. Anycast layer 3 gateway

   Multiple gateways share the same IP and MAC address for each VN.
   These addresses are called the anycast IP and the anycast MAC
   address respectively. The anycast IP is used as the default gateway
   IP address for all end hosts in the corresponding VN. Gateways
   always respond with the anycast MAC address when receiving an ARP
   request for the anycast IP. As different VNs are allowed to have
   overlapping MAC address spaces, different anycast IP addresses can
   map to the same anycast MAC.
   That is to say, each VN should have a unique anycast gateway IP, but
   multiple anycast gateway IPs may map to the same anycast MAC. It is
   recommended to configure only one anycast MAC for all VNs on each
   gateway device for simplicity.

   Each physical gateway terminates layer 2 Ethernet traffic when the
   inner destination MAC of the incoming frame equals its anycast MAC.

   To support layer 3 traffic load balancing among all gateways, an
   anycast nickname is introduced in addition to each layer 3 gateway's
   own nickname; multiple gateways share this nickname. Each gateway
   announces the anycast nickname to the TRILL network through the
   Nickname Sub-TLV specified in [RFC6326] and MUST ignore the nickname
   collision check defined in the basic TRILL protocol. The anycast
   nickname should be announced with the highest nickname priority, so
   that if some other RBridge tries to use the same nickname, the
   gateway always wins the nickname conflict.

   Besides the anycast nickname, IP and MAC, each physical gateway also
   has its own gateway IP and MAC for each VN and its own nickname. The
   source MAC of an ARP reply that responds to an ES's ARP request for
   the anycast IP is always the anycast MAC. The ingress nickname
   should be the anycast nickname when the ARP reply message is a
   unicast TRILL frame. For a proactive ARP request from a gateway to
   an ES, the source MAC is the gateway's own MAC; in this case the
   ingress nickname in the TRILL header should be the gateway's own
   nickname. Edge nodes, i.e. ToRs, learn the consistent correspondence
   between the anycast MAC and the anycast nickname, and between each
   gateway's physical MAC and its own nickname, through the normal data
   plane learning mechanism.

   An ES has no knowledge that the MAC address it gets for a gateway is
   actually an anycast address. The ES operates in the normal way.
   The ES acquires the correspondence between the anycast MAC and the
   anycast IP through normal ARP procedures. When the ES sends traffic
   across subnets, it sends the frame to the gateway first, using the
   anycast MAC as destination MAC. Because the edge nodes (ToRs in this
   case) have already learned the consistent correspondence between the
   anycast MAC and the anycast nickname, a frame sent from the end host
   to the gateway can go to any one of the gateways through the ECMP
   support that TRILL provides natively. The workload is spread over
   all the core switches. When one gateway fails, the remaining
   gateways take over the workload automatically without running any
   VRRP-like keepalive protocol among them.

   Telnet access to a physical gateway should not use the anycast IP
   address. If the anycast gateway IP address is used for telnet, the
   packets of a single telnet session may be delivered to different
   physical gateways, and the state machine at the telnet initiator may
   end up in unpredictable and disordered states. To avoid this, it is
   recommended to use the gateway's own physical IP address for telnet.

   ARP tables age independently on each physical gateway. In proactive
   mode, a physical gateway should send ARP request messages to all ESs
   belonging to a VN to acquire their ARP entries. The source MAC of
   such an ARP request should be the gateway's own MAC instead of the
   anycast MAC, so that the destination ES uses the physical gateway's
   own MAC as destination MAC in its ARP reply. This ensures that the
   ARP reply from the destination ES reaches the requesting physical
   gateway. Inter-subnet traffic from a gateway to an ES can use either
   the gateway's own physical MAC or the anycast MAC as source MAC.

4.1. ARP Handling

   Before an ES begins inter-subnet communication, it sends an ARP
   request to ask for the MAC address of the gateway.
   As the ES uses the anycast gateway IP as the target address, all
   physical layer 3 gateways could respond to it. To avoid sending
   duplicate ARP replies to the end system, only one physical gateway
   should be elected to respond. The physical gateway that responds to
   the ARP request message is called the ARP master gateway.

   Assuming there are k physical gateways, the algorithm to elect the
   ARP master gateway for each VN is as follows:

   1. All physical gateways are ordered and numbered from 0 to k-1 in
      ascending order of their 7-octet IS-IS ID.

   2. For VN ID m, the RB whose number equals (m mod k) is chosen as
      the ARP master gateway.

   The algorithm guarantees that each VN has a consistent ARP master
   gateway. Only the ARP master gateway sends an ARP reply to an ES's
   ARP request for that VN. The other gateways should ignore the ARP
   request.

   The Sender protocol address (SPA) and Sender hardware address (SHA)
   in the ARP reply message are set to the anycast IP address and the
   anycast MAC address. The ARP reply message is unicast TRILL
   encapsulated and sent to the ES. The ingress nickname should be the
   anycast nickname. The egress nickname is set to the nickname of the
   egress RB connecting to the ES.

   As the ES broadcasts the ARP request message to the TRILL campus,
   all physical gateways can learn the correspondence of (ES MAC, ES
   IP, connected edge RB nickname) from the frame. Gateways can use
   this information to generate the IP forwarding table entry for that
   ES.

   In summary, through the above ARP process:

   1. Edge RBs, i.e. TORs, learn the association of the anycast MAC
      address with the anycast nickname.

   2. The ES learns the association of the anycast MAC address with the
      anycast gateway IP.

   3. All physical gateways learn the (ES MAC, ES IP and connected edge
      RB nickname) for all end systems.

   ARP tables age independently on each layer 3 gateway. To avoid
   unnecessary flooding due to ARP table aging, the layer 3 gateway
   should send ARP detection messages periodically in proactive mode to
   refresh the ARP table state.
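
   The ARP master election in this section can be sketched in Python
   as follows. This is only an illustration under stated assumptions:
   the function name and the example IS-IS IDs are hypothetical, and
   the IDs are assumed to be compared as unsigned big-endian values,
   which for equal-length byte strings is the same as byte-wise
   ordering.

```python
from typing import Iterable


def elect_arp_master(gateway_ids: Iterable[bytes], vn_id: int) -> bytes:
    """Return the IS-IS ID of the ARP master gateway for VN `vn_id`."""
    # Step 1: order the live gateways by IS-IS ID; the list index is the
    # number 0..k-1 assigned to each gateway.
    ordered = sorted(set(gateway_ids))
    if not ordered:
        raise ValueError("no live gateway in the anycast group")
    # Step 2: the gateway numbered (m mod k) is the ARP master for VN m.
    return ordered[vn_id % len(ordered)]


# Hypothetical 7-octet IS-IS IDs for three gateways.
GW1 = bytes.fromhex("00000000000100")
GW2 = bytes.fromhex("00000000000200")
GW3 = bytes.fromhex("00000000000300")

# All three gateways alive: 10 mod 3 = 1, so GW2 answers ARP for VN 10.
print(elect_arp_master([GW3, GW1, GW2], vn_id=10).hex())
# GW2 has failed: 10 mod 2 = 0, so GW1 becomes the master for VN 10.
print(elect_arp_master([GW1, GW3], vn_id=10).hex())
```

   Every gateway that runs this computation over the same set of live
   gateways reaches the same result, so no extra signalling is needed.
   Because the result depends on k, a change in group membership can
   also move the master role for a VN whose previous master is still
   alive (VN 14 moves from GW3 to GW1 in the example above when GW2
   fails).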

   For these proactive ARP requests, it is suggested that the source
   MAC in the inner Ethernet header and the Sender hardware address
   (SHA) in the ARP request message be the gateway's own MAC, and that
   the ingress nickname be the gateway's own nickname when the request
   is unicast TRILL encapsulated. When the ES receives the ARP request
   message, it returns a unicast ARP reply message whose destination
   MAC is the layer 3 gateway's own MAC, so the message reaches only
   that layer 3 gateway. When the edge RB connecting the ES receives
   the ARP reply message, it forwards the packet to the layer 3 gateway
   that sent the ARP request.

4.2. Data traffic forwarding

   After an ES acquires the anycast MAC associated with the anycast IP
   through the above ARP handling process, it can start to send
   inter-subnet IP traffic.

   Assume ES1 in Figure 1 sends data to ES4. They belong to different
   subnets. The IP traffic forwarding process is as follows:

   1. ES1 sends unicast IP traffic to ES4. The destination IP is ES4's
      IP address, and the destination MAC is the anycast gateway's MAC.

   2. TOR1 receives the message from ES1. Because TOR1 has already
      learned the association of the anycast MAC address with the
      anycast nickname through the above ARP process, it sends the
      packet with unicast TRILL encapsulation, using the anycast
      nickname as egress nickname in the TRILL header. The TRILL data
      reaches one of the physical gateways through ECMP. Assume the
      TRILL data reaches GW1.

   3. GW1 receives the TRILL data from TOR1. It decapsulates the frame
      to get the native packet. It looks up the local IP forwarding
      table based on the destination IP and tries to forward the packet
      to ES4. If an entry for ES4 (ES MAC, ES IP and connected edge RB
      nickname) is stored on GW1, GW1 encapsulates the frame based on
      that information and sends it to the egress RB. The source MAC
      can be the gateway's own MAC or the anycast MAC. If the gateway's
      own MAC is used as source MAC, the ingress nickname of the TRILL
      frame should be GW1's own nickname.
      If the anycast MAC is used, the ingress nickname should be the
      anycast nickname. (If the entry is not available on GW1, the
      gateway sends an ARP request message to ES4 proactively.)

   4. TOR2 receives the TRILL data from GW1. It decapsulates the frame
      and forwards the payload to ES4.

   All layer 3 traffic is processed in flow-based load balancing mode
   among all physical gateways. The anycast gateway achieves better
   bandwidth utilization and scalability than a VRRP-like mechanism.

5. Node failure

   When one of the layer 3 gateways fails, after network convergence,
   TRILL traffic to the anycast nickname only reaches the remaining
   gateways. The ARP master gateway is re-elected among the remaining
   gateways. No VRRP-like protocol session among the layer 3 gateways
   is required to detect the node failure. Network convergence relies
   purely on the TRILL protocol.

6. Anycast MAC aging on edge node

   If the anycast MAC has aged out on an edge node, then when the edge
   node receives inter-subnet traffic from a connected ES, it floods
   the unicast traffic to the TRILL campus as unknown unicast traffic.
   All physical gateways receive the traffic. Only one of the physical
   gateways should forward it; all others should drop it to avoid
   forwarding duplicate data to the destination ES. It is suggested
   that the forwarding gateway be the same as the ARP master gateway.

7. TRILL protocol extension

   All layer 3 gateways should announce the Anycast Gateway TLV defined
   in Section 7.1 in their LSPs to the TRILL campus. A gateway that
   receives the Anycast Gateway TLV from other RBs with the same
   anycast GW nickname considers them to be in the same anycast gateway
   group. All the gateways should ensure that the anycast nickname
   configuration is consistent. If a received anycast nickname differs
   from the locally configured one, a configuration error has occurred,
   and a network warning or SNMP trap should be sent to the network
   management system.
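
   The TLV layout in Section 7.1 and the consistency check described
   above can be sketched in Python as follows. This is an illustration
   only, not a definitive encoding: the type code 250 is a hypothetical
   placeholder because the draft leaves the type TBD, the Length is
   assumed to be 2 (the size of the 2-byte nickname field), and all
   function names are made up for this example.

```python
import struct

# Hypothetical placeholder: the draft leaves the TLV type as TBD.
ANY_GW_TLV_TYPE = 250
# Assumption: Length counts the 2-byte Anycast GW Nickname field.
ANY_GW_TLV_LENGTH = 2


def encode_anycast_gw_tlv(nickname: int) -> bytes:
    """Build Type (1 byte), Length (1 byte), Anycast GW Nickname (2 bytes)."""
    if not 0 <= nickname <= 0xFFFF:
        raise ValueError("a TRILL nickname is a 16-bit value")
    return struct.pack("!BBH", ANY_GW_TLV_TYPE, ANY_GW_TLV_LENGTH, nickname)


def decode_anycast_gw_tlv(data: bytes) -> int:
    """Return the anycast nickname carried in one encoded TLV."""
    if len(data) != 2 + ANY_GW_TLV_LENGTH:
        raise ValueError("unexpected TLV size")
    tlv_type, length, nickname = struct.unpack("!BBH", data)
    if tlv_type != ANY_GW_TLV_TYPE or length != ANY_GW_TLV_LENGTH:
        raise ValueError("not an Anycast Gateway TLV")
    return nickname


def same_anycast_group(local_nickname: int, peer_tlv: bytes) -> bool:
    """True if the peer advertises the locally configured anycast nickname.

    False is the mismatch case, in which a warning or SNMP trap would be
    sent to the network management system.
    """
    return decode_anycast_gw_tlv(peer_tlv) == local_nickname


local = 0x1234  # hypothetical locally configured anycast nickname
print(same_anycast_group(local, encode_anycast_gw_tlv(0x1234)))  # True
print(same_anycast_group(local, encode_anycast_gw_tlv(0x4321)))  # False
```

   The sketch treats a single TLV in isolation; how the TLV is carried
   inside an LSP and how the warning is raised are outside its scope.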

   The anycast nickname is also carried in the Nickname Sub-TLV
   specified in [RFC6326]. Each gateway MUST ignore the nickname
   collision check for the anycast nickname.

7.1. The Anycast Gateway TLV

   +-+-+-+-+-+-+-+
   |Type= ANY-GW |                  (1 byte)
   +-+-+-+-+-+-+-+
   |   Length    |                  (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Anycast GW Nickname       |(2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type: TLV Type, TBD.

   o  Length: indicates the length of the Anycast GW Nickname field; it
      is a fixed value of 2.

   o  Anycast GW Nickname: the nickname shared by all the physical
      gateways in the anycast gateway group. All inter-subnet traffic
      to the anycast gateways MUST use this nickname as egress nickname
      in the TRILL header.

8. Security Considerations

   The default priority of the anycast nickname should be set to the
   highest value. If a nickname on a non-gateway RBridge collides with
   the anycast nickname on the gateways, this minimizes the probability
   that the anycast nickname has to be changed.

9. IANA Considerations

   TBD

10. Normative References

   [RFC6165]    Banerjee, A. and D. Ward, "Extensions to IS-IS for
                Layer-2 Systems", RFC 6165, April 2011.

   [RFC6325]    Perlman, R., et.al. "RBridge: Base Protocol
                Specification", RFC 6325, July 2011.

   [RFC6326bis] Eastlake, D., Banerjee, A., Dutt, D., Perlman, R., and
                A. Ghanwani, "TRILL Use of IS-IS",
                draft-eastlake-isis-rfc6326bis, work in progress.

11. Informative References

   [RFC3768]    R. Hinden, Ed., "Virtual Router Redundancy Protocol
                (VRRP)", RFC 3768, April 2004.

12. Acknowledgments

   The authors wish to acknowledge the important contributions of Zhang
   Chengsong.

Authors' Addresses

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625375
   Email: liyizhou@huawei.com

   Donald E. Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   EMail: d3e3e3@gmail.com

   Radia Perlman
   Intel Labs
   2200 Mission College Blvd.
   Santa Clara, CA 95054-1549 USA

   Phone: +1-408-765-8080
   EMail: Radia@alum.mit.edu