INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                                Huawei
Expires: April 26, 2012                                 October 24, 2011


                          RBridge Aggregation
                  draft-zhang-trill-aggregation-00.txt

Abstract

   TRILL supports multi-access LAN links that can have multiple RBridges
   attached. This draft specifies RBridge Aggregation, which enables
   concurrent data forwarding for the hosts in the same VLAN on a LAN
   link via multiple RBridges without partitioning this LAN link.
   RBridge Aggregation offers active/active multi-homing to multi-access
   LAN links, which improves their reliability and increases the access
   bandwidth of the RBridge campus.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Aggregation
   3. Frame Processing
      3.1. Unicast Ingressing
      3.2. Unicast Egressing
      3.3. Multicast Ingressing
      3.4. Multicast Egressing
   4. Address Synchronization
   5. Configuration of RBridge Aggregation
      5.1. Hashing Function Determination
   6. Resilience
      6.1. Failure Recovery
      6.2. Failover
      6.3. Connectivity of Wiring Closet Topology
   7. Load Balance
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References
   Author's Addresses

1. Introduction

   The multipathing feature of TRILL addresses a limitation of the
   Spanning Tree Protocol, which often results in inefficient use of the
   link topology.
   It is common for a LAN link to be attached to multiple edge RBridges,
   all of which offer packet forwarding for this multi-access LAN link.
   Multiple attachment provides load balancing to the LAN link. However,
   the traffic load of a LAN link can currently be balanced only among
   different VLANs [TRILLassign] [TRILLbase], while the traffic of hosts
   in a specific VLAN goes through only a single RBridge, i.e., the
   appointed forwarder of this VLAN. This still inherits two limitations
   of the Spanning Tree Protocol: under-utilization of bandwidth and
   lack of reliability.

   RBridge Aggregation is proposed to address these two limitations.
   With RBridge Aggregation, multiple edge RBridges process the frames
   of the same VLAN on a LAN link concurrently. They ingress frames
   using the same ingress nickname (say RBv), as if the frames were
   ingressed into the TRILL campus by a separate virtual RBridge. The
   virtual links between the aggregated members and the virtual RBridge
   are advertised in LSPs to other RBridges, so the aggregated members
   always act as the penultimate hop to the virtual RBridge. When the
   aggregated members receive frames destined to this virtual RBridge,
   they decapsulate these frames and egress them to the local link.

   The frame processing procedures are carefully designed in this
   document to avoid traffic duplication and forwarding loops. MAC
   addresses learned by any of the aggregated members MUST be
   immediately synchronized among all members. Simple configuration at
   the RBridge port and access switch port is required to realize
   RBridge Aggregation.

   With RBridge Aggregation, a LAN link can achieve reliable
   active/active multihoming to a TRILL campus, which realizes fast
   failure recovery and failover. Traffic load is balanced at a finer
   granularity: the traffic load for a specific VLAN can freely go
   through any of the aggregated members.

1.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Aggregation

   For loop avoidance, there can ONLY be a single appointed forwarder
   ingressing and egressing native frames on a link for a specific
   VLAN-x at the same time [TRILL-AF]. This single forwarder mechanism
   does not take full advantage of the "multiple attachment"
   characteristic of TRILL networks, which not only wastes the available
   access bandwidth but also reduces network resilience. Take Figure 2.1
   as an example: although both RB1 and RB2 are able to forward frames
   for VLAN-x, the DRB can appoint only one of them as the appointed
   forwarder, and the other one is inhibited from ingressing and
   egressing frames of VLAN-x.

            +-----+                          +-----+
            | RBi |                          | RBi |
            +-----+                          +-----+
               |                                |
         /\/\/\/\/\/\                     /\/\/\/\/\/\
        /   Transit  \                   /   Transit  \
       <   RBridges   >                 <   RBridges   >
        \   Campus   /                   \   Campus   /
         \/\/\/\/\/\/         -->         \/\/\/\/\/\/
          |        |                       |        |
       +-----+  +-----+                 +-----+  +-----+
       | RB1 |--| RB2 |                 | RB1 |--| RB2 |
       +-----+  +-----+                 +-----+  +-----+
           \      /                         \      /
            +----+                          *******
          +-| B1 |-+                        * RBv *
          | +----+ |                        *******
          |        |                          ||
          |[H]  [H]|                        +----+
          +--------+                      +-| B1 |-+
           VLAN-x                         | +----+ |
                                          |        |
                                          |[H]  [H]|
                                          +--------+
                                           VLAN-x

            Figure 2.1: Illustration of RBridge Aggregation

   The RBridges can be aggregated to overcome the above limitations.
   Figure 2.1 illustrates RBridge Aggregation. RB1 and RB2 are both
   attached to the local link which carries VLAN-x. We assume that there
   is a virtual RBridge acting as VLAN-x's forwarder and using the
   nickname RBv. When RB1 or RB2 ingresses frames from the local link
   into the TRILL network, it uses RBv as the ingress nickname. The two
   virtual links, RB1-RBv and RB2-RBv, are announced in LSPs.
   Other RBridges will believe there is an RBridge node RBv connecting
   RB1, RB2 and the local link. When packets are sent to the local link,
   RBv will serve as the egress RBridge (i.e., the last hop) while RB1
   or RB2 will serve as the penultimate hop.

   Note that although the examples used to illustrate RBridge
   Aggregation in this document include two edge RBridges, the RBridge
   Aggregation solution supports more than two aggregation members.

   Frame processing is defined in Section 3. To ease the implementation
   of RBridge Aggregation, limited changes are introduced to the
   aggregated RBridge members, while no new feature is added to the
   access bridge B1 or to the other RBridges in the campus.

3. Frame Processing

   RBridge Aggregation introduces multiple forwarders for the same LAN
   link. Without further measures, this can cause two problems:

   1. Traffic duplication - The members of the aggregated RBridges
      ingress or egress the same frame at the same time for the local
      link, so hosts may receive duplicated frames.

   2. Forwarding loop - Take Figure 2.1 as an example. RB1 ingresses a
      multicast frame, which reaches RB2, and RB2 egresses the frame
      back to the local link, which causes a forwarding loop. Here,
      RBridge Aggregation can be viewed as a shortcut between the leaf
      nodes of a spanning tree. This problem is called "flooding
      rebirth". The forwarding loop caused by flooding rebirth can
      further cause harmful broadcast storms on the local link.

   Frame processing is carefully designed in the following subsections
   to eliminate the above problems. Although all the aggregated RBridges
   have the right to deliver frames for the local link at the same time,
   it is still necessary to determine a single responsible forwarder for
   each specific frame.

3.1. Unicast Ingressing

   When a unicast frame is received from the local link by one of the
   aggregated RBridges, this ingress RBridge fills RBv into the TRILL
   header of the frame as the ingress nickname and then sends it to the
   corresponding egress RBridge as a normal unicast frame.

   This works until unknown unicast from the local link is considered.
   When the access bridge receives a frame destined to a MAC address
   that is not in its address table, it floods this frame to all other
   ports, so all the aggregation members receive it. The members,
   however, do not know that this frame was flooded to them. If the
   aggregated RBridges happen to have the destination MAC address in
   their address tables, each of them will simply send the frame as
   known unicast, and the remote egress RBridge will receive duplicated
   frames.

   One solution is to configure the access links as "link aggregation"
   [802-1AX] on the bridge side (see Section 7). Another is to use
   unknown unicast blocking: among the access links to the aggregated
   RBridges, one and only one is selected to let unknown unicast
   through, while all the other ports suppress the egress of unknown
   unicast frames. Since only one aggregated RBridge receives the
   unknown unicast frame, traffic duplication is avoided.

3.2. Unicast Egressing

   When an aggregated RBridge member receives a unicast frame whose
   egress nickname is the nickname of the virtual RBridge of the
   aggregated members, this RBridge decapsulates the frame and egresses
   it to the local link.

3.3. Multicast Ingressing

   As with unicast ingressing, the ingress nickname of the frames is set
   to RBv. In order to avoid duplicated multicast frames, multicast
   ingress frames can ONLY be delivered by one of the RBridges.
   To achieve this, the aggregated RBridges deliver multicast frames
   based on their locally implemented hashing function. As an example,
   the last bit of the source MAC address is used as the input of the
   hashing function. Frames whose source MAC address ends in 0 are
   delivered by RB1, while RB2 simply discards such frames. Frames whose
   source MAC address ends in 1 are delivered by RB2, while RB1 discards
   such frames. To realize fine-grained load balancing, more bits can be
   used by the hashing function of the aggregated RBridges, which can be
   manually configured.

3.4. Multicast Egressing

   It is likely that all the aggregated RBridges will receive the
   multicast frames destined to the local link. However, only one of
   them acts as the forwarder of these frames, according to their local
   hashing. Again, as an example, the last bit of the source MAC address
   of the multicast frames is used to break the tie: RB1 only forwards
   frames whose source MAC address ends in 0, while RB2 only forwards
   frames whose source MAC address ends in 1.

   When a multicast frame originated on the local link is forwarded
   across the TRILL network and received by the peer RBridge, it is
   important that the peer RBridge does not egress this frame back to
   the local link; otherwise it causes a forwarding loop on the local
   link (flooding rebirth). The peer RBridge applies the above hashing
   function and thereby determines not to forward this multicast frame.
   To remain consistent with the hashing result of the ingress RBridge,
   bits that may change as the frame is forwarded, such as bits of the
   hop count field, should not be used in hashing.

4. Address Synchronization

   MAC addresses SHOULD be synchronized between the aggregated members
   through ESADI immediately after they are learned from data plane
   frame processing.
   A MAC address received from the peer in an ESADI message is stored in
   the MAC table as if it were locally learned. Afterwards, a frame
   destined to this MAC address can be delivered to the local link or to
   the TRILL network by any of the aggregated members. In the corner
   case where a unicast frame is received by an aggregated member while
   the ESADI message is still in flight, so that the destination MAC
   address has not yet been learned from its peer, the frame is sent as
   unknown unicast by this member.

5. Configuration of RBridge Aggregation

   RBridge Aggregation should be configured by network managers when
   they configure the RBridge ports. Only RBridge ports connected to the
   same LAN link can be configured to be aggregated, and all VLANs
   carried on this LAN link are treated as aggregated. The pseudonode
   nickname is used as the nickname of the aggregated virtual RBridge.
   If the LAN link does not have a pseudonode nickname, the nickname for
   the virtual RBridge must be manually configured and used by all the
   aggregated members.

   The members of an aggregated group should report connections to the
   aggregated VLANs so that the multicast traffic of these VLANs reaches
   all the members.

   In [TRILLbase], in order to suppress loops, multiple appointed
   forwarders for the same VLAN on the same local link are prohibited.
   This limitation should be relaxed in the RBridge Aggregation
   solution.

5.1. Hashing Function Determination

   Hashing functions are well supported by hardware. The network manager
   should determine which TRILL data frame fields are used as the
   hashing input. It is important that all aggregated members get a
   consistent output for the same native data frame. Therefore, fields
   that change during frame processing MUST NOT be used as the hashing
   input. The source MAC address, destination MAC address and inner VLAN
   ID are all candidates for this kind of hashing input.
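
   As a minimal sketch of the guidance above, and not part of any TRILL
   specification, the following Python fragment hashes only the source
   MAC address, destination MAC address and inner VLAN ID, so that every
   aggregated member computes the same value for the same native frame.
   The function names, the choice of CRC-32, and the mapping of the hash
   value onto an ordered member list by modulo are assumptions made for
   illustration only.

```python
import zlib
from typing import Sequence


def frame_hash(src_mac: bytes, dst_mac: bytes, inner_vlan: int) -> int:
    """Hash fields that are not rewritten while the frame crosses the campus.

    CRC-32 is an illustrative choice only. The hop count is deliberately
    not an input because it changes at every hop.
    """
    key = src_mac + dst_mac + inner_vlan.to_bytes(2, "big")
    return zlib.crc32(key)


def responsible_member(members: Sequence[str], src_mac: bytes,
                       dst_mac: bytes, inner_vlan: int) -> str:
    """Map the hash onto the ordered member list (assumed modulo mapping)."""
    return members[frame_hash(src_mac, dst_mac, inner_vlan) % len(members)]


def should_deliver(me: str, members: Sequence[str], src_mac: bytes,
                   dst_mac: bytes, inner_vlan: int) -> bool:
    """True only on the one member responsible for this frame."""
    return responsible_member(members, src_mac, dst_mac, inner_vlan) == me


if __name__ == "__main__":
    members = ["RB1", "RB2"]  # hypothetical names, same order on every member
    src = bytes.fromhex("00005e005301")
    dst = bytes.fromhex("ffffffffffff")
    for rb in members:
        print(rb, should_deliver(rb, members, src, dst, 10))
```

   Because each member evaluates the same function over the same inputs
   and the same member ordering, exactly one member decides to deliver a
   given multi-destination frame, whether the frame arrives from the
   local link or from the campus.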

   Each aggregated member maintains a circular list of the aggregated
   members. Assume the hashing function is H(T), where T is the hashing
   input, and there are "A" members in the aggregated RBridges group.
   For multicast and broadcast packets, the responsible forwarder RBr is
   the member selected by RBr = H(T) mod A.

6. Resilience

   RBridge Aggregation offers active/active multi-homing to a
   multi-access LAN link, which increases its reliability. In the event
   of access link failures, the LAN link need not wait for the
   time-consuming forwarder re-appointment to recover its connectivity
   to the TRILL campus.

6.1. Failure Recovery

   Without RBridge Aggregation, if a local link is disconnected from its
   appointed forwarder, data forwarding can be restored only after the
   DRB successfully chooses a new appointed forwarder for this link. It
   may take even longer before the new appointed forwarder begins to
   function properly, and the disruption continues until it does.

   With RBridge Aggregation, if an aggregated member is no longer
   connected to the local link, it sends out an LSP to announce that it
   is not connected to the virtual RBridge RBv. Since all aggregated
   RBridges had reported the connection to RBv, remote RBridges in the
   TRILL campus can send frames to RBv via any other aggregated RBridge,
   where the frames are egressed to the local link. The connection to
   the local link remains uninterrupted.

   For ingressing unicast frames, if the link between the access bridge
   and one aggregated RBridge fails, the access bridge sends these
   frames to the other RBridge, where they are delivered directly
   without disruption. Take Figure 2.1 as an example and suppose link
   B1-RB1 fails. The packets originally sent through link B1-RB1 will be
   flooded as unknown unicast to all the interfaces of B1. RB2 stores
   all of the VLAN-x addresses learned by RB1.
   The packets going through link B1-RB2 will therefore be regarded as
   known unicast by RB2 and forwarded to their destinations.

   +-+  +-->0->RB1<-+           +-+  +-->0->RB1<-+
   | |  |   1->RB2  |           | |  |   1->RB2  |
   |H|  |   2->RB3  |           |H|  |   2->RB3  |
   |A|->| ...->RB...|    -->    |A|->| ...->RB...|
   |S|  | k-1->RBk  |           |S|  | k-1->RBk+1|
   |H|  |   k->RBk+1|           |H|  |   k->RBk+1|
   | |  | ...->RB...|           | |  | ...->RB...|
   +-+  +-n-1->RBn--+           +-+  +-n-1->RBn--+

       Figure 6.1: Hashing function change during a link failure

   In the normal case, in order to avoid duplicate frames and forwarding
   loops, an aggregated member does not send multicast frames for which
   it is not responsible according to the hashing function. However,
   when an aggregation member cannot forward these frames due to a link
   failure, the next aggregated member on the aggregation list should
   take over the responsibility to deliver them. This can be realized
   through a local change of its hashing function, as shown in Figure
   6.1. Originally, a member only delivers the frames for which the
   output of the hashing function points to itself. When this member
   (RBk+1) learns that a member (RBk) has failed, it also takes over the
   frames that were originally delivered by the failed member.

   Take Figure 2.1 as an example. In the normal case, RB2 only delivers
   packets whose source MAC address ends in 1. When link RB1-B1 fails
   and RB1 cannot deliver multicast frames from or to the local link,
   RB2 takes over and delivers packets whose source MAC address ends in
   either 0 or 1.

   If the failed link is the link that lets unknown unicast through, the
   access bridge should change the link connected to the next aggregated
   member to let unknown unicast through. This mechanism can be
   implemented through configuration of the ACL of the access bridge.

6.2. Failover

   When an aggregated member detects that it is disconnected from the
   local link while data frames are in flight, it can transmit the
   frames to the other aggregated member for delivery. In this way, the
   links connected to the aggregated RBridges protect each other.
   Unicast frames are redirected directly during the failover. For a
   multi-destination frame or unknown unicast frame that should be
   delivered by one of the aggregated RBridges according to the hashing
   function, this RBridge can send the frame to the other RBridge
   through a reserved outer VLAN. The other RBridge delivers
   multi-destination frames from this reserved VLAN without considering
   the hashing function.

6.3. Connectivity of Wiring Closet Topology

   According to the solution defined in Section A.3.3 of [TRILLbase],
   the edge RBridges connected to a wiring closet topology act as the
   roots of spanning trees at the same time, and the LAN link is
   partitioned into several spanning trees.

   With RBridge Aggregation, the access bridge treats the aggregated
   members as leaf nodes of the spanning tree. The edge RBridges no
   longer have to emit BPDUs or participate in the Spanning Tree
   Protocol. Possible forwarding loops are broken at the aggregated
   RBridges and the bridged LAN need not be partitioned, which defines a
   clearer boundary between the TRILL campus and the traditional bridged
   LAN.

7. Load Balance

   When a LAN link is attached to aggregated RBridges, its packets can
   be forwarded by each of these RBridges. The access switch can
   configure the access links as "link aggregation" and then balance the
   load among these links through the link aggregation technique
   [802-1AX].

   However, the access switch can just as well configure these links as
   normal links. This does not mean that the traffic is unbalanced in
   this case. The load is instead balanced in the manner of multipathing
   (ECMP and Multi-Topology Routing).
   Take Figure 2.1 as an example. Bridge B1 is attached to RB1 and RB2
   through links B1-RB1 and B1-RB2. Suppose host Ha is attached to
   RBridge RBi and is sending packets to a host located on the local
   link. If the remote RBridge RBi selects RB1 as the egress RBridge,
   then B1 learns Ha's source MAC address on the port attached to link
   B1-RB1, so the packets destined to Ha from the local link are
   naturally sent via link B1-RB1. Otherwise, if RB2 is selected as the
   egress RBridge, the packets are sent through link B1-RB2.

8. Security Considerations

   This document raises no new security issues for IS-IS.

9. IANA Considerations

   No new registry is requested to be assigned by IANA.

10. References

10.1. Normative References

   [TRILLbase] R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol
               Specification", draft-ietf-trill-rbridge-protocol-16.txt,
               work in progress.

   [TRILL-AF]  R. Perlman, D. Eastlake, et al, "RBridges: Appointed
               Forwarders", draft-ietf-trill-rbridge-af-05.txt, work in
               progress.

10.2. Informative References

   [802-1AX]   "IEEE Standard for Local and metropolitan area networks -
               Link Aggregation", IEEE Std 802.1AX-2008, 3 November
               2008.

Author's Addresses

   Mingui Zhang
   Huawei Technologies Co.,Ltd
   HuaWei Building, No.3 Xinxi Rd., Shang-Di
   Information Industry Base, Hai-Dian District,
   Beijing, 100085 P.R. China

   Email: zhangmingui@huawei.com