Network Working Group                                        Roger Lapuh
                                                            Dinesh Mohan
Internet Draft: draft-lapuh-network-smlt-01.txt          Nortel Networks
Category: Informational
Expiration Date: August 2003                               February 2003


                    Split Multi-link Trunking (SMLT)


Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document describes SMLT. When building redundant bridging networks, IEEE 802.3ad offers link redundancy as a measure against link failures. SMLT additionally offers node redundancy by allowing IEEE 802.3ad links of a link-aggregated group to be dual homed across two aggregation bridges. SMLT provides data plane and control plane mechanisms to avoid inherent loops and makes full use of all links in a link-aggregated group. The dual homing remains transparent to a device connecting to the two aggregation bridges.

Lapuh, et. al                 Informational                     [Page 1]

Internet Draft        draft-lapuh-network-smlt-01.txt        February 2003

Table of Contents

   Status of this Memo
   Abstract
   1. Conventions used in this document
   2. Introduction
   2.1 Reasons to deploy SMLT
   3. Definitions
   4. How does SMLT work in a L2 network?
   4.1 IST Protocol
   5. Problems Solved
   5.1 Layer-2 Traffic Load Sharing
   5.2 Protection in Core of the Network
   5.3 No single point of failure
   6. Failure Scenarios
   6.1 Loss of SMLT link
   6.2 Loss of Aggregation Bridge
   6.3 Loss of IST Link
   6.4 Loss of multiple aggregation bridges in different aggregation bridge pairs
   6.5 Loss of all IST Links between an aggregation bridge pair
   7. SMLT's relation with Spanning Tree/Rapid Spanning Tree
   8. Security Considerations
   9. Intellectual Property Considerations
   10. References
   11. Acknowledgments
   12. Author's Addresses
   Full Copyright Statement

1. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

The word "bridge" is to be interpreted as described in IEEE 802.1D. ASIC-based Ethernet switches also perform the bridging functions described there.
The terms "bridge" and "switch" are therefore used interchangeably throughout this document.

2. Introduction

This document describes SMLT. When building redundant bridging networks, IEEE 802.3ad offers link aggregation for increased bandwidth and also link redundancy as a measure against link failures. SMLT additionally offers node redundancy by allowing IEEE 802.3ad links of a link-aggregated group to be dual homed across two aggregation bridges.

Traditionally, when a loop is created in the Layer 2 network by dual-homing devices, the Spanning Tree Protocol is used to block one of the redundant network paths. SMLT instead provides data plane and control plane mechanisms to avoid inherent loops and makes full use of all links in a link-aggregated group. The dual homing remains transparent to a device connecting to two aggregation bridges. This is accomplished by a method that allows the two aggregation bridges to appear as a single bridge to an SMLT client. The aggregation bridges use an Inter-Switch Trunk (IST) between them, over which they exchange information, permitting rapid fault detection and data path modification.

Although SMLT is primarily designed for Layer 2 Ethernet networks, it also provides benefits for Layer 3 networks.

2.1 Reasons to deploy SMLT

As networks grow ever more critical, there is a demand to eliminate all single points of failure, so that a permanent loss of connectivity can be avoided in most failure cases and recovery from such failures takes less than a second. This is normally achieved through multiple paths from client devices into the core of the network. Moreover, it is highly desirable that the elimination of single points of failure does not result in unused capacity (which is often very costly) and, perhaps more importantly, that rerouting around failures is as fast as possible.
SMLT also satisfies the further requirements that a solution be simple to implement, transparent, and interoperable with the majority of existing client devices. These are key advantages of SMLT compared to previous attempts to solve this problem.

3. Definitions

Before describing how SMLT works in detail, it is necessary to define several terms.

SMLT Client

   A device located at the edge of the network, such as a device in a wiring closet or a CPE device, that is connected to a pair of SMLT aggregation bridges. An SMLT client must be able to perform link aggregation (e.g., IEEE 802.3ad) but does not require any SMLT intelligence.

Aggregation Bridge

   A bridge that connects to multiple SMLT clients. Aggregation bridges may be owned by a customer in the customer's enterprise network or by a Service Provider.

IST (Inter Switch Trunk)

   The IST consists of one or more parallel point-to-point links that connect two aggregation bridges. The two aggregation bridges use the IST to share information so that they appear as a single bridge to an SMLT client.

MLT (Multi-Link Trunking)

   MLT is a method of link aggregation that allows multiple Ethernet links to be aggregated together as a single logical trunk. MLT provides physical layer protection against the failure of any single link and offers the combined bandwidth of these links. In this document, MLT can be any link aggregation mechanism; several methods are currently used in the industry, including, but not restricted to, IEEE 802.3ad.

   SMLT is interoperable with IEEE 802.3ad in controllable mode (IEEE 802.3ad clause 43.3.1). Nothing precludes SMLT from supporting all of IEEE 802.3ad's operating modes; however, the other modes are not described in this draft.

SMLT (Split Multi-Link Trunking)

   SMLT is MLT in which each link of a link-aggregation group connects a pair of ports on two different devices (e.g., an SMLT client and an aggregation bridge).
   Unlike MLT, one end of the link-aggregated group is dual-homed to two aggregation bridges.

4. How does SMLT work in a L2 network?

                +-----+    +-----+
  e ------------|  E  |====|  F  |---------- f
               /+-----+    +-----+\
            / /    ||  \   // |    \
           / /     ||   \ //  |     \
          / /      ||    X/   |      \
         / /       ||   //\   |       \
        / /        ||  //  \  |        \
       / /         || //    \ |         \
      / /         1||//2    1\|2         \
  +---+/          +---+     +---+         \+---+
  | A |           | B |     | C |          | D |
  +---+           +---+     +---+          +---+
    |              | |       | |             |
    |              | |       | |             |
    a             b1 b2     c1 c2            d

Figure 1: Reference Network for SMLT

Figure 1 illustrates a configuration that includes a pair of aggregation bridges, E and F, and four separate SMLT clients, A, B, C, and D. SMLT clients B and C are connected to the aggregation bridges via MLTs that are dual-homed to both E and F. SMLT client B uses a link-aggregated group with 4 links, two of which connect it to E and the other two to F. SMLT client C uses a link-aggregated group with 2 links, one to E and one to F. SMLT client A uses a link-aggregated group with two links, both of which connect it to E. SMLT client D uses a link-aggregated group with a single link, which connects it to F. In this configuration, SMLT clients A and D do not benefit from SMLT, as they are not dual-homed.

In this example, the SMLT implementation requires two aggregation bridges, which must be connected via an IST (Inter Switch Trunk). The SMLT aggregation bridges use the IST to:

   - Confirm that each bridge is alive, and exchange learned MAC address information. This requires the IST to be reliable, so that it does not itself become a single point of failure.

   - Forward flooded packets, or packets destined for non-SMLT client devices physically connected to the other aggregation bridge.
   - Forward traffic destined for an SMLT client if the SMLT link between that client and the other aggregation bridge has failed.

If all clients connected to the aggregation bridges are dual-homed (like clients B and C), the traffic on the IST consists only of the IST control packets exchanged in normal operation. If devices are single-homed, like clients A and D, on average 50% of the traffic will use the IST in normal operation. If an SMLT link fails, the IST is used as a backup link to forward traffic to the affected destinations. These requirements dictate that the IST should preferably be (but is not restricted to) a multi-gigabit MLT between the two SMLT aggregation bridges, so that there is no single point of failure in the IST itself.

If the IST fails, no data traffic is lost and the network remains intact, provided all clients are dual-homed.

Although the SMLT clients are dual-homed to two aggregation bridges, they require no knowledge of whether they are connected to a single bridge or to two bridges. SMLT intelligence is required only on the pair of aggregation bridges. Logically, the pair of aggregation bridges appears to the clients as a single aggregation bridge.

Figure 1 also includes end stations connected to the bridges: a, b1, b2, c1, c2, and d may be hosts; e and f may be hosts, servers, or routers.

SMLT clients B and C can use different methods for selecting which MLT link forwards a packet, as long as the same link is used for a given flow. This requirement ensures that there are no out-of-sequence packets between any pair of communicating devices.

Aggregation bridges always send traffic directly to an SMLT client and use the IST only for traffic that they cannot forward over a direct link. The examples in the remainder of this section explain the process.
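As an illustration of the per-flow rule above, the following Python sketch shows how an SMLT client might pick one member link of its MLT for each flow. It is a minimal sketch under stated assumptions: it identifies a flow by its SRC/DST MAC address pair (the usual basis mentioned in Section 5.1) and uses CRC-32 from Python's zlib module as the hash. Neither the hash nor the function names (normalize_mac, select_link) come from this draft, and the link names used with it are hypothetical.

```python
from __future__ import annotations

import zlib


def normalize_mac(mac: str) -> str:
    """Return a lowercase, colon-separated MAC so equivalent spellings hash alike."""
    return mac.strip().lower().replace("-", ":")


def select_link(src_mac: str, dst_mac: str, active_links: list[str]) -> str:
    """Pick the MLT member link for the flow identified by (src_mac, dst_mac).

    The result depends only on the address pair and the list of active links,
    so all frames of a flow use the same link while the group is unchanged.
    """
    if not active_links:
        raise ValueError("link-aggregated group has no active links")
    key = f"{normalize_mac(src_mac)}|{normalize_mac(dst_mac)}".encode("utf-8")
    # zlib.crc32 gives the same value in every process; Python's built-in hash()
    # of a str is randomized per process and would not be repeatable.
    return active_links[zlib.crc32(key) % len(active_links)]
```

For client B in Figure 1, calling select_link with hypothetical link names such as ["B-E-1", "B-E-2", "B-F-1", "B-F-2"] always returns the same link for a given host pair, whether that link leads to E or to F. Removing a failed link from the list models the client-side fail-over described in Section 6.1. Plain modulo hashing can also move some flows whose link did not fail when the membership changes; an implementation might prefer a scheme that remaps only the affected flows, but the rule above only requires that a flow keep one link while the group is stable.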
Traffic from a to b1 and/or b2, assuming a and b1/b2 communicate at layer 2, goes from bridge A to bridge E and is then forwarded on E's direct link to bridge B.

Traffic from b1 or b2 to a is sent by bridge B on one of its MLT ports. Since bridge B is unaware of SMLT and treats its MLT as a regular MLT, it may send traffic from b1 to a on the link to bridge E and traffic from b2 to a on the link to bridge F. Bridge E forwards the traffic from b1 directly to bridge A, while the traffic from b2, which arrived at bridge F, is forwarded across the IST to bridge E and then to bridge A.

Traffic from b1/b2 to c1/c2 is always sent by bridge B on its MLT toward the core. Whichever bridge (E or F) it arrives at, that bridge sends it directly to bridge C over its direct link. This is why dual-homing all client bridges to the aggregation pair reduces the potential traffic load on the IST. In this scenario, a failure of the IST (with all SMLT links active) does not interrupt traffic, which further reduces the risk of network downtime.

Traffic from a to d, and vice versa, is forwarded across the IST because it is the shortest path. In this case, the IST is treated purely as a standard IEEE 802.3ad link, without regard to SMLT.

Finally, traffic from f to c1/c2 is sent directly by bridge F. Return traffic from c1/c2 to f may cross the IST if bridge C sends it on its link to bridge E.

4.1 IST Protocol

The IST link connecting the two aggregation bridges in Figure 1 runs a protocol that allows the following messages to be exchanged between them:

IST Hello

   Each aggregation bridge periodically transmits and listens for IST Hello messages on its IST ports, i.e., the ports that connect to the IST link.
   These messages indicate the following:

   - That the aggregation bridge port from which the IST Hello message is sent is of type IST.
   - Whether the sending port is receiving IST Hello messages from the other end.
   - The expected time interval between IST Hello messages received from the other end.

SMLT Status

   Each aggregation bridge periodically reports SMLT Status to the other aggregation bridge. The SMLT Status message includes the following:

   - SMLT ID: a value assigned to an SMLT by an aggregation bridge when the SMLT is configured on that bridge. It provides a reference to each SMLT client on an aggregation bridge.
   - SMLT Status: the status of the SMLT, either 1) in service (one or more of the SMLT links on the aggregation bridge are operating) or 2) out of service (none of the SMLT links on the aggregation bridge are operating).

Learned or Migrated MAC addresses

   When an aggregation bridge learns a new MAC address on any of its ports, it notifies the other aggregation bridge of the learned MAC address value and the type of the port on which the address was learned. When the address is learned against an SMLT port, the SMLT ID is also passed.

MAC address aged out

   When a MAC address learned against a non-SMLT port ages out, the aggregation bridge deletes the MAC address record and sends a message reporting this event to the other aggregation bridge, which then deletes its own record.

SMLT address aged out

   When a MAC address learned against an SMLT port ages out, the aggregation bridge does not delete the MAC address record. Instead, it marks the address as having expired locally, sends a message to the other aggregation bridge, and waits for confirmation from the other aggregation bridge. When the other aggregation bridge receives the message, it marks the address as having expired remotely, but deletes the MAC address record only after the address has also expired locally.

Local Bridge ID

   This message is sent by an aggregation bridge to notify the other aggregation bridge of the value of its own Bridge Identifier. This permits each aggregation bridge to compare its ID with the other aggregation bridge's Bridge ID, to decide which of them will be the Root Bridge and which BPDU parameter values are used on the SMLT.

The above messages can be mapped onto any existing protocol, e.g., IEEE 802.3ad's LACP.

5. Problems Solved

5.1 Layer-2 Traffic Load Sharing

The MLT path selection algorithm used on the SMLT client bridge achieves load sharing from the SMLT client's perspective. Path selection is usually based on SRC/DST MAC addresses, but other techniques can be used.

Load sharing from the aggregation bridge's perspective is achieved by sending all traffic destined for an SMLT client across direct links and not over the IST. The IST is normally not used to send or receive traffic to or from a dual-homed SMLT client. Traffic received on the IST by an aggregation bridge is not forwarded on SMLT links, because the other aggregation bridge will already have forwarded it; this eliminates the possibility of a loop in the network. The only exception to this rule is when the SMLT links on the peer aggregation bridge are down: traffic received over the IST is then forwarded to the corresponding SMLT client.

5.2 Protection in Core of the Network

SMLT may also be used within the core of the network. SMLT groups can be configured in a square or full-mesh topology; in this case, both sides of the MLT are configured for SMLT.

SMLT square with two aggregation pairs facing each other:

      +-----+    +-----+
      |  E  |----|  F  |
      +-----+    +-----+
        ||          |
        ||          |
        ||          |
        ||          |
        ||          |
        ||          |
        ||          |
       +---+      +---+
       | B |------| C |
       +---+      +---+

SMLT full mesh with two aggregation pairs facing each other:

      +-----+    +-----+
      |  E  |----|  F  |
      +-----+    +-----+
        ||   \  /   |
        ||    \/    |
        ||    /\    |
        ||   /  \   |
        ||  /    \  |
        || /      \ |
        ||/        \|
       +---+      +---+
       | B |------| C |
       +---+      +---+

These configurations are possible because no state information is passed across the MLT, so each end believes that the other end is a single bridge. As a result, no loop is introduced, and any of the core bridges or any of the links connecting them may fail while the network still recovers rapidly.

Furthermore, SMLT groups can be connected together to scale to hierarchical network designs. This allows redundant, loop-free L2 domains to be built without Spanning Tree while fully using all network links.

5.3 No single point of failure

Any single link or either aggregation bridge can fail, and recovery takes place in less than 1 second. This figure is conservative: depending on the implementation and the link-loss detection mechanisms, the actual period of loss may be shorter. See the failure analysis in Section 6 for further details.

6. Failure Scenarios

6.1 Loss of SMLT link

The SMLT client detects the link failure and sends traffic on the other SMLT link(s), just as with MLT. Detection and fail-over time depend on how quickly the client can detect link failures.

If the link is not the only one between the SMLT client and the aggregation bridge in question, then the aggregation bridge also uses MLT detection and rerouting to move traffic to the remaining links.

If the link is the only one to the aggregation bridge, then upon detecting the failure the bridge informs the other aggregation bridge of the SMLT trunk loss. The other aggregation bridge then treats the SMLT trunk as a regular MLT trunk.
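The forwarding rules of Section 4 and Section 5.1, together with the SMLT trunk loss handling just described, can be summarized as a per-frame egress decision on one aggregation bridge. The following Python sketch is a simplified model, not the authors' implementation. It handles known unicast destinations only (unknown destinations are reported as FLOOD but flooding is not modeled), it assumes the peer's SMLT Status (Section 4.1) has already been received over the IST, and it assumes that an address the peer learned against an SMLT is recorded against this bridge's own trunk with the same SMLT ID. The names IST, FLOOD, MacEntry, AggregationBridge and egress are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

IST = "IST"      # hypothetical name for this bridge's IST port/trunk
FLOOD = "FLOOD"  # returned for unknown destinations; flooding itself is not modeled


@dataclass
class MacEntry:
    port: str                      # local port, or IST if learned on the peer's non-SMLT port
    smlt_id: Optional[int] = None  # set when the address was learned against an SMLT


@dataclass
class AggregationBridge:
    smlt_ports: dict[int, str]      # SMLT ID -> this bridge's trunk toward that SMLT client
    local_smlt_up: dict[int, bool]  # this bridge's SMLT Status per SMLT ID
    peer_smlt_up: dict[int, bool]   # peer's SMLT Status, as reported over the IST
    fdb: dict[str, MacEntry] = field(default_factory=dict)

    def egress(self, dst_mac: str, ingress: str) -> Optional[str]:
        """Return the output port for a frame, FLOOD, or None to filter (drop) it."""
        entry = self.fdb.get(dst_mac)
        if entry is None:
            return FLOOD
        if entry.port == ingress:
            return None  # ordinary bridging: never send a frame back out its ingress port
        if entry.smlt_id is None:
            # Single-homed destination: a local port, or the IST for hosts on the peer.
            return entry.port
        sid = entry.smlt_id
        local_up = self.local_smlt_up.get(sid, False)
        if ingress == IST:
            # The peer has already delivered this frame on its own direct link,
            # unless its SMLT links to this client are down (Sections 5.1 and 6.1).
            peer_up = self.peer_smlt_up.get(sid, False)
            return self.smlt_ports[sid] if local_up and not peer_up else None
        # Locally received frames always prefer the direct SMLT link; the IST is
        # used only when this bridge has lost its own links to the client.
        return self.smlt_ports[sid] if local_up else IST
```

Modeling bridge E of Figure 1, a frame from a to b1 leaves on E's direct link to B, a frame from a to d is sent across the IST, and a frame for b1 that arrives over the IST is filtered because F delivers such traffic on its own link to B. Once F reports its SMLT to B as out of service, E forwards IST traffic for b1 to B, which corresponds to the exception described in Section 5.1.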
If the link is reestablished, the aggregation bridges detect this and move the trunk back to SMLT operation.

6.2 Loss of Aggregation Bridge

The SMLT client detects the link failure and sends traffic on the other SMLT link(s), just as with MLT. The operational aggregation bridge detects the loss of its partner (IST and keep-alive packets are lost) and changes all SMLT trunks to MLT trunks. If the partner returns, the operational aggregation bridge detects this (the IST becomes active) and moves the trunks back to SMLT operation once full connectivity is reestablished.

6.3 Loss of IST Link

The SMLT clients do not detect a failure and communicate as usual. In normal use, the IST contains more than one link (it may itself be an aggregated link), so IST traffic resumes over the remaining links in the IST.

6.4 Loss of multiple aggregation bridges in different aggregation bridge pairs

Note that this case goes beyond the goal of maintaining connectivity after a single failure, since multiple failures must occur for it to happen.

The SMLT clients do not detect a failure and communicate as usual. Since each aggregation bridge pair is a separate entity, each is unaffected by failures elsewhere. Connectivity is therefore maintained, although packet loss and increased latency may occur because the available bandwidth is drastically reduced.

6.5 Loss of all IST Links between an aggregation bridge pair

Note that this case goes beyond the goal of maintaining connectivity after a single failure, since multiple failures must occur for it to happen.

If all links in the IST fail, the aggregation bridges no longer see each other (IST keep-alives are lost) and both assume that their partner is dead. However, for the most part there are no ill effects on the network if all SMLT clients are dual-homed to the aggregation bridges.

7. SMLT's relation with Spanning Tree/Rapid Spanning Tree

SMLT is an architecture that sits on top of IEEE 802.3ad, between the MAC control layer and IEEE 802.1D/w:

      +------------------+
      |     802.1D/w     |
      +------------------+
      +------------------+
      |       SMLT       |
      +------------------+
      +------------------+
      |     802.3ad      |
      +------------------+

Therefore, the Spanning Tree Protocol or Rapid Spanning Tree Protocol can run on top of the SMLT architecture. Link failures no longer trigger STP reconvergence, because the logical link remains intact as long as one SMLT link in the group is active.

Used as a path aggregation architecture underneath a Spanning Tree/Rapid Spanning Tree design, SMLT has the following advantages:

   - SMLT does not have any blocking links. Therefore, all configured bandwidth is available for traffic forwarding.

   - SMLT's IST protocol runs only between two bridges, the aggregation pair. Because the two are directly connected, the protocol has no inherent delays.

   - SMLT convergence targets are sub-second in every failure scenario.

   - There is no root bridge election, so lengthy re-election is not an issue.

   - SMLT link failures do not generate Topology Change Notifications (TCNs) as long as one link of an SMLT group is still active; flooding is therefore limited to the aggregation pair.

8. Security Considerations

This document does not introduce any new security issues. From a customer's point of view, SMLT looks the same as IEEE 802.3ad link aggregation. From the Service Provider's point of view, no security issues are expected, since the IST communication occurs between aggregation bridges located inside the same autonomous SP network. If the two aggregation bridges were located in two different autonomous networks, there could be security issues, e.g., the sharing of bridging information.
However, the two aggregation bridges of a pair are not expected to be deployed in two different SP networks.

9. Intellectual Property Considerations

Nortel Networks may pursue, or is pursuing, patent protection on technology described in this document. If this document becomes, in part or whole, an IETF Standard, and if such patented technology is essential for practice of an IETF Standard incorporating in whole or part this document, Nortel Networks is willing to make available nonexclusive licenses on fair, reasonable, and non-discriminatory terms and conditions, to such patent rights it owns, solely to the extent such technology is essential to practice with such IETF standard. Also, in the event that a Nortel patent is subsequently identified as essential to an IETF Standard incorporating in whole or part this document, Nortel Networks is willing to make available a nonexclusive license under such patent(s), on fair, reasonable, and non-discriminatory terms and conditions. The terms apply to those patents for which Nortel Networks has the right to grant licenses.

10. References

[802.1Q] "IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks", IEEE Std 802.1Q-1998.

[802.1w] "IEEE Standard for Local and metropolitan area networks - Common specifications - Part 3: Media Access Control (MAC) Bridges - Amendment 2: Rapid Reconfiguration", IEEE Std 802.1w-2001.

[802.1D] "Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Common specifications - Part 3: Media Access Control (MAC) Bridges", ANSI/IEEE Std 802.1D-1998.

[802.3] "IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements -
        Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications", IEEE Std 802.3-2002.

11. Acknowledgments

The authors would like to thank Joe Regan, David Head, Wassim Tawbi and Yili Zhao for their contributions and for furthering the content of SMLT.

12. Author's Addresses

Roger Lapuh
Nortel Networks
Wilstrasse 11
Building U95
Switzerland 8610
Phone: +1 (408) 495 1599
Email: rlapuh@nortelnetworks.com

Dinesh Mohan
Nortel Networks
P O Box 3511 Station C
Ottawa ON K1Y 4H7
Canada
Phone: +1 (613) 763 4794
Email: mohand@nortelnetworks.com

Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.