TRILL Working Group                                            Yizhou Li
INTERNET-DRAFT                                           Donald Eastlake
Intended Status: Informational                                Weiguo Hao
                                                     Huawei Technologies
Expires: January 13, 2014                                  July 12, 2013

         Problems of Active-Active connection at the TRILL Edge
          draft-yizhou-trill-active-active-connection-prob-00

Abstract

The IETF TRILL (Transparent Interconnection of Lots of Links) protocol provides support for flow-level multi-pathing with rapid failover for both unicast and multi-destination traffic in networks with arbitrary topology and link technology between TRILL switches. Active-active at the TRILL edge is the extension, insofar as practical, of these characteristics to end stations that are multiply connected to a TRILL campus. This informational document discusses some of the high-level problems to be overcome in providing active-active at the TRILL edge.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Yizhou, et al                                                   [Page 1]

INTERNET DRAFT    Problems of Active-Active connection         July 2013

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1 Introduction
  1.1 Terminology
2. Target Scenario
3. Problems in active-active connection at the edge
  3.1 Frame duplications
  3.2 Address flip-flop
  3.3 Packet drop due to RPF check
  3.4 Loops
  3.5 Member RBridges info synchronization
4 Current Work
5 Security Considerations
6 IANA Considerations
7 References
  7.1 Normative References
  7.2 Informative References
Authors' Addresses

1 Introduction

The IETF TRILL (Transparent Interconnection of Lots of Links) [RFC6325] protocol provides loop-free, per-hop multipath data forwarding with minimal configuration.
TRILL uses IS-IS [RFC6165] [RFC6326bis] as its control plane routing protocol and defines a TRILL-specific header for user data.

In a TRILL campus, communications between TRILL switches can (1) use multiple parallel links and/or paths, (2) load spread over different links and/or paths at a fine-grained flow level through equal cost multipathing of unicast traffic and multiple distribution trees for multi-destination traffic, and (3) rapidly re-configure to accommodate link or node failures or additions.

Active-active connection is the extension, to the extent practical, of similar load spreading and robustness to the connections between end stations and the TRILL campus. Such end stations may have multiple ports and will be connected, directly or via bridges, to multiple edge TRILL switches. It must be possible, except in some failure conditions, to load spread end station traffic at the flow level across links to such multiple edge TRILL switches and rapidly re-configure to accommodate topology changes.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The acronyms and terminology in [RFC6325] are used herein with the following additions:

CE - customer equipment. Could be a bridge or end station.

TRILL switch - an alternative term for an RBridge.

2. Target Scenario

The TRILL appointed forwarder [RFC6325] [RFC6327bis] [RFC6439] mechanism provides per-VLAN active-standby traffic spreading and loop avoidance at the same time. One and only one appointed RBridge can ingress/egress native frames into/from the TRILL campus for a given VLAN among all edge RBridges connecting a legacy network to the TRILL campus. This is true whether the legacy network is a simple point-to-point link, a complex bridged LAN, or anything in between.
By carefully selecting different RBridges as appointed forwarders for different sets of VLANs, load spreading over different edge RBridges across different VLANs can be achieved.

This section presents a typical scenario of active-active connections to a TRILL campus via multiple edge RBridges where the current TRILL appointed forwarder mechanism is not applicable.

The appointed forwarder mechanism [RFC6439] requires each of the edge RBridges to exchange TRILL IS-IS Hello packets from their access ports. As Figure 1 shows, when multiple access links of multiple edge RBridges are bundled as an MC-LAG (Multi-Chassis Link Aggregation Group), Hello messages sent by RB1 via its access port to CE1 will not be forwarded to RB2 by CE1. RB2 (and the other members of MC-LAG1) will not see that Hello from RB1. Every member RBridge of MC-LAG1 considers itself the appointed forwarder on the MC-LAG1 link for all VLANs and will ingress/egress frames for all VLANs. Hence the appointed forwarder mechanism is not applicable in such an active-active scenario.

                 ----------------------
                |                      |
                |   TRILL Campus       |
                |                      |
                 ----------------------
                    |       |    |
               -----        |     --------
              |             |             |
          +------+      +------+      +------+
          |      |      |      |      |      |
          |(RB1) |      |(RB2) |      | (RBk)|
          +------+      +------+      +------+
            |..|          |..|          |..|
            |  +----+     |  |          |  |
            |   +---|-----|--|----------+  |
            | +-|---|-----+  +-----------+ |
   MC-      | | |   +------------------+ | |
   LAG1--->(| | |)                    (| | |) <---MC-LAG2
          +-------+      . . .       +-------+
          | CE1   |                  | CEn   |
          |       |                  |       |
          +-------+                  +-------+

                               Figure 1

Active-active connection is useful when the following requirements need to be met, though MC-LAG implementations vary by vendor.

- Flow-based rather than VLAN-based load balancing is required.

- Rapid failure recovery. The current appointed forwarder mechanism relies on Hello timer expiration to detect the unreachability of another edge RBridge connected to the same local Ethernet link. Re-appointing the forwarder for specific VLANs may then be required.
  Such procedures take time on the scale of seconds. Active-active connection should minimize the frame loss and recovery time on failure.

3. Problems in active-active connection at the edge

This section presents the problems that need to be addressed in the active-active connection scenario.

3.1 Frame duplications

When an MC-LAG is formed to multiple RBridges, a CE may receive duplicate copies of a frame. Two possible scenarios are as follows.

1. Looping back: CE1 forwards a multi-destination frame from a user device. As shown in Figure 1, the frame enters the TRILL campus via a member of an MC-LAG (say RB1) and is then forwarded through the campus to another member (say RB2) of the same MC-LAG. CE1 then receives a duplicate copy from RB2.

2. Duplication from remote: A remote RBridge sends a multi-destination frame of VLAN x. All members of MC-LAG1 will receive the frame. As each of them considers itself the appointed forwarder for all VLANs, they all forward the frame to CE1. As a result, CE1 receives multiple copies.

Frame duplication only happens in multi-destination frame forwarding. Unicast does not have this issue.

3.2 Address flip-flop

Consider RB1 and RB2 using their own nicknames as the ingress nickname when ingressing data frames into a TRILL campus. As shown in Figure 1, CE1 may send data frames with the same source MAC address to any member RBridge of MC-LAG1. If an egress RBridge receives TRILL packets from different ingress RBridges but with the same source MAC address, it learns a different address correspondence from each data frame. The address correspondence may keep flip-flopping among the nicknames of the member RBridges of the MC-LAG for the same MAC address in the same VLAN. Some TRILL switches may behave badly under these circumstances and, for example, interpret this as a severe network problem.
It may also cause the returning traffic to take different paths to reach the destination, resulting in persistent re-ordering of the frames.

3.3 Packet drop due to RPF check

In order to solve the problems above, a pseudonode nickname [TRILLPN] solution was proposed. The basic idea is to represent all member links of the MC-LAG as a virtual RBridge with a single pseudonode nickname. Any member RBridge of the MC-LAG should use this pseudonode nickname rather than its own nickname as the ingress nickname when injecting TRILL data frames. This solves the problems described above; however, it introduces another issue: packet drop due to the RPF check.

When forwarding multi-destination frames, different member RBridges of an MC-LAG may choose the same tree. An arbitrary RBridge RBn in the TRILL campus may then receive frames on a single tree, carrying the same pseudonode ingress nickname, on different incoming ports. The RPF check accepts frames for a given tree and ingress nickname on one incoming port only, so frames arriving on any other port fail the check and are dropped.

3.4 Loops

Active-active connection does not introduce extra looping risk, as an MC-LAG behaves just like a single link. In normal operation, a frame will not keep being ingressed and egressed to/from the TRILL campus via a single MC-LAG link. However, any solution for the active-active connection scenario needs to ensure that the campus remains loop-free.

3.5 Member RBridges info synchronization

When multiple edge RBridges are bundled as an MC-LAG to make a CE multi-homed to the TRILL campus, the RBridges need to be aware of the status of each link in the MC-LAG. Synchronization of information is therefore necessary.

1. Member RBridges configuration synchronization: the configuration parameters have to be synchronized among the edge RBridges of an MC-LAG. Such configuration may include system ID, system priority, port key, port priority, partner information, etc.
   If the abovementioned [TRILLPN] and/or [CMT] is employed, there are more configuration items to be synchronized, for instance, the pseudonode nickname of the virtual RBridge. Without a synchronization mechanism, each member RBridge has to be provisioned manually to guarantee consistency. In addition, some of the configuration may change dynamically during failure, for instance, the tree-id selected by member RBridges [CMT]. Manual consistency checking is not practical in this case.

2. Member RBridges state synchronization: link failure or node failure on a member RBridge may introduce packet loss. Link failure includes both access port and trunk port link failure. When a failure occurs, the MC-LAG may need to invoke re-selection logic to spread the traffic across the remaining links/nodes. Fast failure detection and recovery therefore rely on state synchronization. Some existing mechanism could be employed, for example, TRILL BFD support [TRILLBFD]. Trunk port and node failures can be detected by it. However, access port/link failure needs some special care. An RBridge that has an access port/link failure should notify the other member RBridges with port information so that they can adjust the corresponding MC-LAG.

3. Member RBridges learnt MAC address synchronization: member RBridges are required to share the MAC address and egress nickname correspondence they have learnt. With such synchronization, flooding due to unknown unicast can be reduced.

If some inter-chassis protocol is employed among member RBridges for MC-LAG member discovery, info synchronization and failure handling, it needs to run smoothly over the TRILL campus. The protocol may use IP addresses to identify the other members, so such packets need to be correctly TRILL encapsulated. If no such inter-chassis protocol is available, TRILL has to provide its own mechanisms to support the information synchronization.
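
As an informal illustration of Sections 3.2 and 3.3, the following Python sketch models two behaviors in a deliberately simplified way: data-plane MAC learning at a remote egress RBridge, and an RPF check keyed on tree and ingress nickname. It is a toy model under stated assumptions, not an implementation of any TRILL specification. The class and function names, the nicknames RB1, RB2 and PN1, the tree name T1, VLAN 10, the MAC address, and the port numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EgressRBridge:
    """Toy model of MAC learning at a remote egress RBridge (assumed behavior)."""
    # (vlan, source MAC) -> ingress nickname learned from the latest frame
    table: dict = field(default_factory=dict)
    moves: int = 0  # number of times an existing entry changed nickname

    def learn(self, vlan, src_mac, ingress_nickname):
        key = (vlan, src_mac)
        previous = self.table.get(key)
        if previous is not None and previous != ingress_nickname:
            self.moves += 1  # address correspondence flipped
        self.table[key] = ingress_nickname


def count_moves(ingress_nicknames):
    """Learn one CE MAC from frames ingressed under the given nicknames."""
    egress = EgressRBridge()
    for nickname in ingress_nicknames:
        egress.learn(vlan=10, src_mac="00:00:5e:00:53:01",
                     ingress_nickname=nickname)
    return egress.moves


def rpf_accepts(expected_port, tree, ingress_nickname, arrival_port):
    """Accept a frame only on the single port expected for (tree, nickname)."""
    return expected_port.get((tree, ingress_nickname)) == arrival_port


# Section 3.2: CE1 spreads flows alternately over its links to RB1 and RB2,
# and each member ingresses frames under its own nickname.
print(count_moves(["RB1", "RB2", "RB1", "RB2", "RB1", "RB2"]))  # 5

# Same traffic, but both members use one shared pseudonode nickname.
print(count_moves(["PN1"] * 6))  # 0

# Section 3.3: RBn expects tree T1 frames from PN1 on port 1 only.
expected = {("T1", "PN1"): 1}
print(rpf_accepts(expected, "T1", "PN1", 1))  # True
print(rpf_accepts(expected, "T1", "PN1", 2))  # False, frame is dropped
```

With per-member nicknames the learned entry changes on every alternation, while a shared pseudonode nickname keeps it stable. The same shared nickname is what makes frames from two member RBridges indistinguishable to the RPF check in the last two calls.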

4 Current Work

Several solution drafts have been presented in the TRILL WG. [TRILLPN], [CMT] and [TRILLBFD] address parts of the problems above.

5 Security Considerations

This draft presents the problems in a particular scenario. It does not introduce any extra security risks. For general TRILL Security Considerations, see [RFC6325].

6 IANA Considerations

No IANA action is required. RFC Editor: please delete this section before publication.

7 References

7.1 Normative References

[RFC6165] Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2 Systems", RFC 6165, April 2011.

[RFC6325] Perlman, R., et al., "RBridge: Base Protocol Specification", RFC 6325, July 2011.

[RFC6326bis] Eastlake, D., Banerjee, A., Dutt, D., Perlman, R., and A. Ghanwani, "TRILL Use of IS-IS", draft-eastlake-isis-rfc6326bis, work in progress.

[RFC6327bis] Eastlake 3rd, D., Perlman, R., Ghanwani, A., Yang, H., and V. Manral, "TRILL: Adjacency", draft-ietf-trill-rfc6327bis, work in progress.

[RFC6439] Eastlake, D., et al., "RBridge: Appointed Forwarder", RFC 6439, November 2011.

7.2 Informative References

[TRILLPN] Zhai, H., et al., "RBridge: Pseudonode Nickname", draft-hu-trill-pseudonode-nickname, work in progress, November 2011.

[CMT] Senevirathne, T., Pathangi, J., and J. Hudson, "Coordinated Multicast Trees (CMT) for TRILL", draft-ietf-trill-cmt-01.txt, work in progress, November 2012.

[TRILLBFD] Manral, V., et al., "TRILL (Transparent Interconnection of Lots of Links): Bidirectional Forwarding Detection (BFD) Support", draft-ietf-trill-rbridge-bfd-07.txt, work in progress, July 2012.

[8021AX] IEEE, "Link Aggregation", 802.1AX-2008, 2008.

[8021Q] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.1Q-2011, August 2011.

Authors' Addresses

Yizhou Li
Huawei Technologies
101 Software Avenue,
Nanjing 210012
China

Phone: +86-25-56625375
EMail: liyizhou@huawei.com

Donald Eastlake
Huawei R&D USA
155 Beaver Street
Milford, MA 01757
USA

Phone: +1-508-333-2270
EMail: d3e3e3@gmail.com

Weiguo Hao
Huawei Technologies
101 Software Avenue,
Nanjing 210012
China

Phone: +86-25-56623144
EMail: haoweiguo@huawei.com