Network Working Group                                    Roger Lapuh
                                                         Dinesh Mohan
Internet Draft: draft-lapuh-network-smlt-02.txt          Nortel Networks
Category: Informational
Expiration Date: December 2003                           June 2003

                  Split Multi-link Trunking (SMLT)

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document describes Split Multi-link Trunking (SMLT) for bridged networks. SMLT enables topologies with upstream node redundancy for increased reliability of layer 2 link aggregation subnetworks based on [IEEE 802.3ad].

[IEEE 802.3ad] provides a mechanism for network planners to have link redundancy and bandwidth aggregation in layer 2 bridged networks. It does this through a link aggregation algorithm referred to in this document as Multi-Link Trunking (MLT).

Lapuh, et. al                  Informational                   [Page 1]
Internet Draft        draft-lapuh-network-smlt-02.txt         June 2003

Split Multi-Link Trunking enables topologies with node redundancy in addition to link redundancy, for added network reliability and bandwidth efficiency. SMLT includes both data and control plane mechanisms to ensure compatibility with, and transparency to, existing implementations of [IEEE 802.3ad] link-aggregation devices.
Table of Contents

   Status of this Memo
   Abstract
   1. Introduction
   1.1 Rationale for SMLT
   2. Definitions
   3. SMLT Operation
   4.1 IST Protocol
   4. Problems Solved
   4.1 Layer-2 Traffic Load Sharing
   4.2 SMLT Configuration Protection in Core of the Network
   4.3 No single point of failure
   5. Failure Scenarios
   5.1 Loss of an SMLT link
   5.2 Loss of an SMLT aggregation bridge
   5.3 Loss of IST Link
   5.4 Loss of all IST Links between an SMLT aggregation bridge pair
   6. SMLT in relation to other OSI reference model layers
   7. Security Considerations
   8. References
   9. Acknowledgments
   10. Author's Addresses
   Full Copyright Statement

1. Introduction

This document describes SMLT (Split Multi-Link Trunking). When building redundant bridging networks, [IEEE 802.3ad] offers link redundancy as a measure against link failures. SMLT is a method of enhancing the benefits of link aggregation (e.g. Multi-Link Trunking, or MLT, per [IEEE 802.3ad]) by the use of dual-homing. SMLT consists of data plane and control plane mechanisms to avoid traffic issues that dual-homing can cause (i.e.
inherent loops in redundant network topologies), while allowing full usage of the available bandwidth in link-aggregated groups.

Multi-Link Trunking is a means of providing physical layer protection against the failure of a single link, while enabling the use of the bandwidth of all aggregated links (within MLT) in normal conditions. SMLT is a means of providing an additional level of protection against failures. SMLT enables node redundancy by allowing [IEEE 802.3ad] links of link-aggregated groups to be dual-homed across a pair of aggregating devices, herein referred to as a pair of SMLT aggregation bridges.

Architecturally, SMLT consists of an optional sublayer that resides between the MAC control layer and the MAC client layer, which are defined in [IEEE 802.3], similar to the optional link aggregation sublayer [IEEE 802.3ad]. This is illustrated in Figure 1.

      +-----------------------------+
      |         MAC client          |
      +-----------------------------+
      +-----------------------------+
      |       SMLT (optional)       |
      +-----------------------------+
      +-----------------------------+
      | link aggreg. sublayer (opt) |
      +-----------------------------+
      +-----------------------------+
      |   MAC control (optional)    |
      +-----------------------------+
      +-----------------------------+
      |             MAC             |
      +-----------------------------+

      Figure 1: S-MLT Protocol Architecture

Dual-homing can create looped topologies within subnetworks. To avoid traffic looping in layer 2 subnetworks, Spanning Tree Protocol [IEEE 802.1D/w] acts to logically block redundant network paths to a given traffic flow. In SMLT networks, dual-homing is not a problem, because loops in subnetwork topologies are logically blocked. SMLT may be introduced into existing subnetworks to provide node redundancy without upgrading already installed equipment.
SMLT avoids these problems by allowing all aggregation paths in a dual-homed configuration to be active and forwarding traffic simultaneously, and by providing very fast traffic fail-over in the event of a link failure. An SMLT aggregation bridge pair uses an Inter-Switch Trunk (IST) between them to exchange information so as to appear as a single logical path aggregation end point to dual-homed devices. IST signaling also protects against single points of failure (e.g. link outages and single node failures) by detecting and modifying information about forwarding data paths in sub-second intervals.

1.1 Rationale for SMLT

As networks grow ever more critical, there is a demand to eliminate all single points of failure, such that a permanent loss of connectivity can be avoided in most failure cases and recovery from such failures can be accomplished quickly (i.e. in less than one second).

The introduction of redundancy in mission-critical network topologies is normally achieved through engineering of multiple paths in order to eliminate all single points of failure. The objective is to ensure there is no permanent loss of connectivity because of a single failure, and that recovery from most failures is in the sub-second range. Ideally, any additional paths (introduced to eliminate single points of failure) should be available to carry traffic during normal conditions, as unused capacity can be expensive.

SMLT is simple to implement and can be introduced transparently into existing networks. It is interoperable with the majority of existing wiring closet/CPE/edge devices, it provides for both link and node redundancy, and it supports utilization of link bandwidth across all of the aggregated links of multi-link trunks.

2. Definitions

Before describing SMLT in detail it is necessary to define the following terms:

SMLT Aggregation Bridge: An SMLT aggregation bridge is a device that provides bridged networking to one or more "SMLT Clients" (see below for definition), and is connected in a peering manner with at least one other SMLT Aggregation Bridge using an Inter-Switch Trunk (IST). SMLT aggregation bridges are often utilized for enterprise networking, but have applications in service provider networks as well. As such they may be owned by an enterprise, or by a service provider.

SMLT Client: An SMLT Client is a device (e.g. bridge, host, server) which is typically connected to one or more Local Area Networks (LANs), and is itself directly connected to one or more SMLT aggregation bridges.

Note: An "SMLT Client" is a literary construct for the purposes of describing the operation of SMLT in this document. In practice, this does not require the implementation of any SMLT functionality in a device described as an SMLT client herein. An example of a typical SMLT client is a link aggregating bridge which implements [IEEE 802.3ad].

SMLT Link: The connection between an SMLT aggregation bridge and an SMLT client is an SMLT link.

IST (Inter-Switch Trunk): An IST is an aggregated point-to-point link that connects two SMLT aggregation bridges to each other. An IST is used by two SMLT aggregation bridges to exchange status and control information with each other, so as to "look" like (i.e. to appear to operate like) a single bridge to any SMLT client. This enables the introduction of SMLT into existing networks, without requiring upgrades to already deployed equipment.

MLT (Multi-Link Trunking): MLT is a method of link aggregation that allows multiple Ethernet links to be aggregated together, and handled as a single logical trunk.
MLT provides physical layer protection against the failure of any single link and enables the full use of the combined bandwidth of the multiple links in non-failure mode conditions. MLT can be realized via many different link aggregation mechanisms. Several methods of MLT are in use today; one example is [IEEE 802.3ad].

SMLT (Split Multi-Link Trunking): SMLT is MLT with each link of a link-aggregation group connecting a pair of ports on two different devices (e.g. SMLT client and aggregation bridge). Unlike MLT, at least one end of a link-aggregated group is dual-homed to two different SMLT aggregation bridges.

SMLT is interoperable with [IEEE 802.3ad] in controllable mode [IEEE 802.3ad clause 43.3.1]. However, there is no restriction that would preclude SMLT from supporting all [IEEE 802.3ad] operation modes.

3. SMLT Operation

Figure 2 illustrates a reference network configuration for SMLT. The topology consists of the following elements:

   . Two peered SMLT aggregation bridges, namely E and F
   . Four LAN bridges (A, B, C, D) which are SMLT clients because they home into one or both of the SMLT aggregation bridges
   . Eight end stations: a1, b1, b2, c1, c2, d1 (all hosts), and e1 and f1 (which may be hosts, servers or routers)

               +-----+     +-----+
       e1 -----|  E  |=====|  F  |------- f1
              /+-----+     +-----+
             / /  ||  \  //   |   \
            / /   ||   \//    |    \
           / /    ||    /\    |     \
          / /     ||   // \   |      \
         / /      ||  //   \  |       \
        / /       || //     \ |        \
      2/ /       2||//2     1\|1        \1
   +---+/        +---+      +---+      +---+
   | A |         | B |      | C |      | D |
   +---+         +---+      +---+      +---+
     |            | |        | |         |
     |            | |        | |         |
    a1           b1 b2      c1 c2        d1

   Figure 2: Reference Network for SMLT

SMLT clients B and C use dual-homing to connect to both of the SMLT aggregation bridges. B uses a link-aggregated group containing four links where two of them connect B to E, and the other two connect B to F.
C uses a smaller link-aggregated group with two links, where one link connects C to E, and the other link connects C to F.

SMLT client A also uses link-aggregation, but not for dual-homing. It has a link-aggregation group with two links, and both links are used to connect A to E. SMLT client D is single-homed and uses one link to connect with F.

In figure 2, SMLT clients A and D do not benefit from the added reliability that SMLT provides for dual-homed SMLT clients. Conversely, A and D are not disadvantaged either, as they are not required to implement any new functionality to participate in this SMLT network.

Dual-homed SMLT clients B and C also do not require any new functionality to be networked using SMLT. SMLT intelligence and functionality are implemented entirely within the SMLT aggregation bridges. The behavior of the paired SMLT aggregation bridges (E and F) appears identical to the behavior of a single path aggregation bridge from the perspective of B and C.

The SMLT aggregation bridges (E and F) are interconnected using an Inter-Switch Trunk (IST). The IST is engineered to be reliable and not be a single point of failure. An IST has three functions:

First, to enable E and F to exchange status information about each other's health, and to share learned MAC address information.

Second, to forward flooded packets, packets destined for non-SMLT connected subnetworks, and packets destined to end stations reachable via the other SMLT aggregation bridge (e.g. traffic from e1 destined for d1).

Third, to forward traffic to/from dual-homed SMLT clients during failure conditions (e.g. traffic from a1 to b1 during a failure of the direct connection from E to B).

SMLT clients B and C can use different methods to assign traffic to the links within their MLTs as long as the choice of link remains fixed for a given flow.
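The requirement that the choice of link stay fixed for a given flow can be illustrated with a minimal Python sketch. It assumes the SRC/DST MAC based selection that section 4.1 names as the usual technique. The function name `select_link`, the use of CRC-32 as the hash, the link labels, and the example MAC addresses are all illustrative assumptions; they are not defined by SMLT or by [IEEE 802.3ad], and real devices use their own selection algorithms.

```python
import zlib
from typing import Sequence


def select_link(src_mac: str, dst_mac: str, links: Sequence[str]) -> str:
    """Pick one member link of an MLT for a flow (hypothetical helper)."""
    if not links:
        raise ValueError("MLT has no operational links")
    # Normalise notation so "00-AA" and "00:aa" name the same address.
    key = "|".join(m.lower().replace("-", ":") for m in (src_mac, dst_mac))
    # CRC-32 is an arbitrary stand-in for a device-specific hash function.
    return links[zlib.crc32(key.encode("ascii")) % len(links)]


# SMLT client B from figure 2: two links to E and two links to F.
# The labels below are invented for this example.
b_links = ["B-E/1", "B-E/2", "B-F/1", "B-F/2"]

# The same address pair always maps to the same member link.
first = select_link("00:00:5e:00:53:01", "00:00:5e:00:53:02", b_links)
again = select_link("00:00:5e:00:53:01", "00:00:5e:00:53:02", b_links)
```

Because the result depends only on the address pair and the list of operational links, every frame of a flow takes the same link. After a link failure, calling the function with the shorter list moves the affected flows onto the surviving links.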
Keeping the choice of link fixed for each flow ensures that all traffic is delivered in sequence between any pair of communicating end stations.

SMLT aggregation bridges normally send traffic directly to an SMLT client, thus using IST bandwidth only for traffic that cannot be forwarded using a direct link. The examples below explain this in more detail.

SMLT client A is single-homed to SMLT aggregation bridge E. All traffic from a1 destined for any other end station must flow through E. For example, to reach b1 and/or b2, traffic flows from a1 to A and then to E. E forwards the traffic directly to B, to reach b1 and/or b2. Traffic in the reverse direction (i.e. from b1 or b2 destined for a1) flows from B to E, and then to A for delivery to a1.

Traffic from b1 or b2 destined for c1 or c2 is always forwarded by B over one of its dual-homed MLT paths to either E or F. No matter which of E or F receives incoming traffic from B, the traffic is directly forwarded to C without traversing an IST. This aspect of SMLT operation tends to minimize traffic flows across ISTs. It also means that a single IST failure (in scenarios where there is only one IST) will not cause a traffic interruption for dual-homed SMLT clients.

Traffic from a1 to d1 and vice versa is forwarded across an IST because it is the shortest path between E and F. For this type of traffic, an IST emulates an [IEEE 802.3ad] link (with no account taken of SMLT) as far as A and D are concerned.

Traffic from f1 to c1/c2 is sent directly via F to C. Return traffic from c1/c2 to f1 may traverse an IST if C sends it to E rather than to F.

4.1 IST Protocol

An IST connecting two SMLT aggregation bridges in figure 2 runs a protocol that allows the following messages to be exchanged between E and F:

   . IST Hello: E and F periodically transmit and receive IST Hello messages on IST ports. IST ports are ports used for ISTs.
     IST Hello messages accomplish the following:

     - They distinguish ISTs from MLTs; any port sending IST Hello messages is an IST port
     - They verify continuity and communication across ISTs; every port sending IST Hello messages should expect to receive incoming IST Hello messages from the other end under normal conditions
     - They help to identify abnormal conditions, by establishing a normal time interval between incoming IST Hello messages

   . SMLT Status: E and F exchange SMLT status information with each other about SMLT client links as follows:

     - SMLT ID: this is a provisioned value used by E and F to uniquely identify links to the same SMLT client.
     - SMLT Status: this can be either 1) in-service (meaning that the SMLT client is reachable; one or more of the SMLT links to that SMLT client are operating), or 2) out-of-service (indicating that the SMLT client is not reachable; all links between the SMLT aggregation bridge reporting this status and the SMLT client are not operating).

   . Learned or Migrated MAC addresses

     Whenever E or F learns a new MAC address on any of its ports, it shares this information with its peer. The learned MAC address value and the port type of the port at which the address was learned are communicated across an IST. When the address is learned against an SMLT client port, the SMLT ID is also passed.

   . MAC address aged out

     When the age expires for a MAC address learned against a non-SMLT port, the SMLT aggregation bridge deletes the MAC address record and sends a message to inform its peer of this action, and to trigger the deletion of that MAC address from the other SMLT aggregation bridge's address table.

   . SMLT address aged out

     When the age expires for a MAC address learned against an SMLT port, the SMLT aggregation bridge does not delete the MAC address record. It marks the address as having expired locally, and sends a message with this information to its peer.
     When the peer SMLT aggregation bridge receives this message, it marks the address as having expired remotely, and takes no further action until its own MAC address record expires locally.

   . Local Bridge ID

     This message is sent by an SMLT aggregation bridge to notify its peer of the value of its own Bridge Identifier. This permits each of E and F to compare its own ID with the other's, and to decide which is the Root Bridge, as well as the values of BPDU parameters used in SMLT.

4. Problems Solved

4.1 Layer-2 Traffic Load Sharing

Load sharing from the SMLT client perspective is achieved by the MLT path selection algorithm used on the edge SMLT client switch. Usually this path selection is done on a SRC/DST MAC address basis but other techniques can be used.

Load sharing from the SMLT aggregation bridge perspective is achieved by sending all traffic destined to an SMLT client directly across available links and not over the IST. The IST is normally not used to send/receive traffic to/from a dual-homed SMLT client.

Traffic received on the IST by an SMLT aggregation bridge is not forwarded on SMLT because the other SMLT aggregation bridge will have forwarded the traffic, thus eliminating the possibility of a loop in the network. The only exception to this rule is when the SMLT links on the peer aggregation bridge are down; in that case, traffic received over the IST is forwarded to the corresponding SMLT client.

4.2 SMLT Configuration Protection in Core of the Network

SMLT is not limited to use in triangle topologies as shown in figure 2. Several combinations of SMLT designs are feasible, including:

   - bridge using link aggregation dual-homed to an SMLT aggregation bridge pair (i.e. an SMLT triangle)
   - SMLT aggregation bridge pair multi-homed to a different SMLT aggregation bridge pair, thus allowing up to four bridges to form a combined aggregation group SMLT (i.e.
     an SMLT square)

SMLT can also be used within the core of a network. SMLT groups can be configured in a square or full mesh scenario.

The following illustration depicts an SMLT square with two SMLT aggregation bridge pairs (E/F and G/H) facing each other.

   +-----+    +-----+
   |  E  |====|  F  |
   +-----+    +-----+
     ||          |
     ||          |
     ||          |
     ||          |
     ||          |
     ||          |
     ||          |
    +---+      +---+
    | G |======| H |
    +---+      +---+

The following illustration depicts an SMLT full mesh with two SMLT aggregation bridge pairs facing each other.

   +-----+    +-----+
   |  E  |====|  F  |
   +-----+    +-----+
     ||   \  /   |
     ||    \/    |
     ||    /\    |
     ||   /  \   |
     ||  /    \  |
     || /      \ |
     ||/        \|
    +---+      +---+
    | G |======| H |
    +---+      +---+

These configurations are possible because no state information is passed across MLT links and thus both devices terminating an MLT believe that the other end is a single bridge. The result is that there is no logical looping in this network topology; any of E, F, G, or H or any of the connecting links between them can fail and the surviving elements of this network continue to forward traffic without interruption.

It is also possible to scale SMLT groups to achieve hierarchical network designs by connecting SMLT groups together. This allows building redundant loop-free L2 domains without Spanning Tree while still fully using all network links.

4.3 No single point of failure

Any single link or any SMLT aggregation bridge can fail and recovery will take place in less than 1 second. Note that this figure is conservative; the actual recovery time depends on the implementation and on the link loss detection mechanisms. An SMLT network might therefore experience traffic loss for less than 1 second. See the analysis in section 5 for further details.

5. Failure Scenarios

5.1 Loss of an SMLT link

When an SMLT client detects a link failure, it sends traffic on its other SMLT link(s) per the normal algorithms of MLT. Detection and fail-over are dependent on the speed at which the SMLT client device detects link failures.

If the link is not the only one between the SMLT client and an SMLT aggregation bridge, then the SMLT aggregation bridge uses standard MLT detection and rerouting to move traffic to the remaining links. If the failed link is the only one to an SMLT aggregation bridge, then on failure detection the SMLT aggregation bridge informs the other SMLT aggregation bridge of the SMLT link failure. The other SMLT aggregation bridge then treats the SMLT link as a regular MLT trunk.

When the link is reestablished, both SMLT aggregation bridges detect this, and put the recovered link back into regular SMLT operation.

5.2 Loss of an SMLT aggregation bridge

The SMLT client switch detects link failure and sends traffic on the other SMLT link(s) just as with standard MLT.

The surviving SMLT aggregation bridge detects the loss of its peer (via IST events such as link down or IST Hello timeout) and treats all SMLT trunks as regular MLT trunks.

When the peer SMLT aggregation bridge returns to service, the operational SMLT aggregation bridge detects this (by virtue of the IST becoming active) and then restores all trunks to regular SMLT operation.

5.3 Loss of IST Link

The SMLT clients do not detect or notice the failure on an IST, so they continue to operate as usual. In normal use, there is more than one link in an IST; an IST is typically an aggregated link itself. Upon the failure of one IST link, traffic flows over the remaining links in the IST.

5.4 Loss of all IST Links between an SMLT aggregation bridge pair

In the event that all links in an IST fail, the SMLT aggregation bridges do not see each other anymore (IST keep alive is lost) and both assume that their peer is dead.
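The bridge-side behaviour described in sections 4.1 and 5.1 to 5.4 can be summarised in a short Python sketch. The class `AggregationBridge`, its method names, the message dictionary, and the SMLT IDs "B" and "C" are hypothetical and exist only for this illustration. The sketch models a single decision, namely whether traffic arriving over the IST may be forwarded to an SMLT client, and it assumes that the status last reported by the peer is always current.

```python
class AggregationBridge:
    """Toy model of one SMLT aggregation bridge (hypothetical names)."""

    def __init__(self, smlt_ids):
        # State of this bridge's own SMLT links, keyed by SMLT ID.
        self.local_up = {sid: True for sid in smlt_ids}
        # SMLT status last reported by the peer over the IST.
        self.peer_up = {sid: True for sid in smlt_ids}
        # False after an IST link-down event or an IST Hello timeout.
        self.peer_alive = True

    def on_local_link_state(self, smlt_id, in_service):
        """Record a local change and return the status message for the peer."""
        self.local_up[smlt_id] = in_service
        return {"smlt_id": smlt_id, "in_service": in_service}

    def on_peer_status(self, message):
        """Apply an SMLT Status message received over the IST."""
        self.peer_up[message["smlt_id"]] = message["in_service"]

    def on_ist_state(self, ist_up):
        """Track whether the peer is considered alive."""
        self.peer_alive = ist_up

    def treated_as_regular_mlt(self, smlt_id):
        """True when the peer is gone or has no working link to the client."""
        return not self.peer_alive or not self.peer_up[smlt_id]

    def may_forward_from_ist(self, smlt_id):
        """IST traffic goes to an SMLT client only if the peer cannot reach it."""
        return self.local_up[smlt_id] and self.treated_as_regular_mlt(smlt_id)


# Bridges E and F from figure 2, with dual-homed clients B and C.
e = AggregationBridge(["B", "C"])
f = AggregationBridge(["B", "C"])

normal = f.may_forward_from_ist("B")            # False: E serves B itself
f.on_peer_status(e.on_local_link_state("B", False))   # E loses its links to B
after_failure = f.may_forward_from_ist("B")     # True: F now forwards for B
```

In this model, restoring the link (or the IST) simply reverses the state change, which corresponds to putting the trunk back into regular SMLT operation.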
A complete IST failure has, for the most part, no ill effects in the network if all SMLT clients are dual-homed to the SMLT aggregation bridges.

6. SMLT in relation to other OSI reference model layers

SMLT fits into the datalink layer as it is described in the OSI reference model. The datalink layer is above the physical layer and below the network layer. Within the datalink layer, [IEEE 802.3] defines the MAC layer, which sits below the optional MAC control layer, which in turn sits below the optional link aggregation sublayer. The MAC client is on top of this, forming the interface to the higher layers such as bridge relay entities for [IEEE 802.1w] and similar protocols. SMLT is an optional additional sublayer that sits above the link aggregation sublayer and below the MAC client.

      +-----------------------------+
      |         MAC client          |
      +-----------------------------+
      +-----------------------------+
      |       SMLT (optional)       |
      +-----------------------------+
      +-----------------------------+
      | link aggreg. sublayer (opt) |
      +-----------------------------+
      +-----------------------------+
      |   MAC control (optional)    |
      +-----------------------------+
      +-----------------------------+
      |             MAC             |
      +-----------------------------+

IEEE protocols such as the [IEEE 802.1w] Rapid Spanning Tree Protocol (RSTP) operate on top of the MAC client. SMLT is transparent to such protocols. Link failures will not trigger any STP reconvergence because the logical link remains intact as long as one SMLT link out of a group is active.

SMLT as an underlying path aggregation architecture underneath an RSTP design has the following advantages:

SMLT does not have any blocking links. Therefore all configured bandwidth is available for traffic forwarding.

SMLT's IST protocol is used only on a set of two bridges, the aggregation pair.
The protocol, therefore, does not have any inherent delays because the two are directly connected. SMLT convergence targets are sub-second in every failure scenario.

There is no root bridge election and therefore long re-election processes are not an issue.

SMLT link failures do not trigger STP to generate network-wide topology change notification (TCN) packets as long as one logical link out of an SMLT group is still active. Flooding due to aging timer changes is therefore limited to an SMLT aggregation bridge pair.

7. Security Considerations

This document does not introduce any new security issues. From a customer point of view, SMLT looks the same as [IEEE 802.3ad] link aggregation. From the Service Provider point of view, no security issues are expected since the IST communication occurs between aggregation bridges located inside the same autonomous SP network. When the two aggregation bridges are located in two different autonomous networks, there may be some security issues, e.g. sharing of bridging information. However, it is not expected that the aggregation bridges will be deployed in two SP networks.

8. References

[802.1Q] "IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks", IEEE Std 802.1Q-1998.

[802.1w] "IEEE Standard for Local and metropolitan area networks. Common specifications. Part 3: Media Access Control (MAC) Bridges. Amendment 2: Rapid Reconfiguration", IEEE Std 802.1w-2001.

[802.1D] "Information technology. Telecommunications and information exchange between systems. Local and metropolitan area networks. Common specifications. Part 3: Media Access Control (MAC) Bridges", ANSI/IEEE Std 802.1D-1998.

[802.3] "IEEE Standard for Information technology. Telecommunications and information exchange between systems. Local and metropolitan area networks. Specific requirements.
Part 3: Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications", IEEE Std 802.3-2002.

9. Acknowledgments

The authors would like to thank Joe Regan, David Head, Wassim Tawbi, Vasant Sahay, Yili Zhao and Ed Juskevicius for their contributions and furthering the content of SMLT.

10. Author's Addresses

   Roger Lapuh
   Nortel Networks
   Wilstrasse 11
   Building U95
   Switzerland 8610
   Phone: +1 (408) 495 1599
   Email: rlapuh@nortelnetworks.com

   Dinesh Mohan
   Nortel Networks
   P O Box 3511 Station C
   Ottawa ON K1Y 4H7
   Canada
   Phone: +1 (613) 763 4794
   Email: mohand@nortelnetworks.com

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.