INTERNET-DRAFT Mingui Zhang Intended Status: Proposed Standard Donald Eastlake Expires: November 3, 2012 Huawei May 2, 2012 RBridge Aggregation draft-zhang-trill-aggregation-02.txt Abstract TRILL supports multi-access TRILL links that can have multiple RBridges attached. This draft specifies RBridge Aggregation that enables concurrent data forwarding by multiple RBridges for the end stations in the same VLAN on a TRILL link without partition. RBridge Aggregation offers active/active multi-homing to multi-access TRILL links, which improves their reliability and increases the access bandwidth of RBridge campus. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Copyright and License Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Mingui Zhang Expires November 3, 2012 [Page 1] INTERNET-DRAFT RBridge Aggregation May 2, 2012 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Frame Processing . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Unicast Ingressing . . . . . . . . . . . . . . . . . . . . 6 3.2. Unicast Egressing . . . . . . . . . . . . . . . . . . . . . 7 3.3. Multicast Ingressing . . . . . . . . . . . . . . . . . . . 7 3.4. Multicast Egressing . . . . . . . . . . . . . . . . . . . . 7 4. Address Synchronization . . . . . . . . . . . . . . . . . . . . 7 5. Configuration of RBridge Aggregation . . . . . . . . . . . . . 8 5.1. Hashing Function Determination . . . . . . . . . . . . . . 8 6. Resilience . . . . . . . . . . . . . . . . . . . . . . . . . . 8 6.1. Failure Recovery . . . . . . . . . . . . . . . . . . . . . 9 6.2. Failover . . . . . . . . . . . . . . . . . . . . . . . . . 10 6.3. Connectivity of Wiring Close Topology . . . . . . . . . . . 10 7. Load Balance . . . . . . . . . . . . . . . . . . . . . . . . . 10 8. Security Considerations . . . . . . . . . . . . . . . . . . . . 11 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 11 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 10.1. Normative References . . . . . . . . . . . . . . . . . . . 11 10.2. Informative References . . . . . . . . . . . . . . . . . . 11 Author's Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 Mingui Zhang Expires November 3, 2012 [Page 2] INTERNET-DRAFT RBridge Aggregation May 2, 2012 1. Introduction The multipathing feature of TRILL addresses the limitation of Spanning Tree Protocol which often results in inefficient use of the link topology. It is common that a TRILL link is attached to multiple edge RBridges and all these edge RBridges offer packets forwarding for this multi-access TRILL link. This multiple attachment provides load balancing to the TRILL link. However, currently, traffic load of a TRILL link can merely be balanced among different VLANs [RFC6325] while the traffic of end stations in a specific VLAN goes through only a single RBridge, i.e., the appointed forwarder of this VLAN. This still inherits two limitations of Spanning Tree Protocol: under- utilization of bandwidth and lack of reliability. RBridge Aggregation is proposed to addresses the above two limitations. With RBridge Aggregation, multiple edge RBridges process the frames of the same VLAN on a TRILL link concurrently. They ingress frames and use the same ingress nickname (say RBv) as if the frames is ingressed by another virtual RBridge into the TRILL campus. The virtual links between the aggregated member and the virtual RBridge are advertised in LSPs to other RBridges, therefore aggregated members always act as the penultimate hop to the virtual RBridge. When the aggregated member receive frames destined to this virtual RBridge, they decapsulate these frames and egress them to the local link. According to the Reverse Path Forwarding Check (RPFC) rule of [RFC6325], a multicast frame ingressed by a specific RBridge onto a specific distribution tree can only be received and accepted through a specific port at a receiver RBridge. Otherwise, the receiver RBridge will discard this frame. But active-active multihoming violates this rule which causes the RPFC issue. This issue is pointed out in [CMT] and [MTRA] and solutions for this issue will accordingly be developed in those documents. The frame processing procedures are carefully designed in this document to avoid traffic duplication and forwarding loops. MAC addresses learned by any of the aggregated members MUST be immediately synchronized among all members. Simple configuration at the RBridge port and access switch port is required to realize RBridge Aggregation. With RBridge aggregation, a TRILL link can achieve reliable active/active multi-homing to a TRILL campus, which realizes fast failure recovery and failover. Traffic load is balanced in a finer granularity: the traffic load for a specific VLAN can freely go through any of the aggregated members. Mingui Zhang Expires November 3, 2012 [Page 3] INTERNET-DRAFT RBridge Aggregation May 2, 2012 Familiarity with [RFC6325], [RFC6327], and [RFC6349] is assumed in this document. As in [RFC6325], in this document the word "link" means a "bridged LAN", unless otherwise qualified. 1.1. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. Aggregation For loop avoidance, there can ONLY be a single appointed forwarder ingressing and egressing native frames on a link for a specific VLAN- x at the same time [RFC6349]. This single forwarder mechanism does not take the full advantage of the "multiple attachment" character of TRILL links. It does not make full use of available access bandwidth and reduces the network resilience. Take Figure 2.1 as an example, although both RB1 and RB2 have the ability to perform frame forwarding for VLAN-x, DRB can only appoint one of them to be the appointed forwarder, the other one will be inhabited from ingressing and egressing frames of VLAN-x. Mingui Zhang Expires November 3, 2012 [Page 4] INTERNET-DRAFT RBridge Aggregation May 2, 2012 +-----+ +-----+ | RBi | | RBi | +-----+ +-----+ | | /\/\/\/\/\/\ /\/\/\/\/\/\ / Transit \ / Transit \ < RBridges > < RBridges > \ Campus / \ Campus / \/\/\/\/\/\/ --> \/\/\/\/\/\/ | | | | +-----+ +-----+ +-----+ +-----+ | RB1 |--| RB2 | | RB1 |--| RB2 | +-----+ +-----+ +-----+ +-----+ \ / \ / +----+ ******* +-| B1 |-+ * RBv * | +----+ | ******* | | || |[H] [H]| +----+ +--------+ +-| B1 |-+ VLAN-x | +----+ | | | |[H] [H]| +--------+ VLAN-x Figure 2.1: Illustration of RBridge Aggregation The RBridges can be aggregated to break the above limitations. Figure 2.1 illustrates RBridge Aggregation. RB1 and RB2 are both attached to the local link which carries VLAN-x. We assume that there is a virtual RBridge acting as VLAN-x's forwarder and using the nickname RBv. When RB1 or RB2 ingresses frames from the local link to TRILL networks, they will use RBv as the ingress nickname. The two virtual links between RB1, RB2 and RBv will be announced in LSPs. Other RBridges will believe there is an RBridge node RBv connecting RB1, RB2 and the local link. When packets are sent to the local link, RBv will serve as the egress RBridge (i.e., the last hop) while RB1 or RB2 will serve as the penultimate hop. Note that although the examples used to illustrate RBridge Aggregation in this document include two edge RBridges, the RBridge Aggregation solution supports more than two aggregation members. The frame processing will be defined in Section 3. To ease the implementation of RBridge aggregation, limited changes are introduced to the aggregated RBridge members while no new feature is added to the access bridge B1 and other RBridges in the campus. Mingui Zhang Expires November 3, 2012 [Page 5] INTERNET-DRAFT RBridge Aggregation May 2, 2012 3. Frame Processing RBridge Aggregation introduces two forwarders for the same TRILL link. If things do not change, it is possible to cause two problems: 1. Traffic duplication - The members of the aggregated RBridges ingress or egress the same frame at the same time for the local link. Then end stations may receive duplicated frames. 2. Forwarding loops - Take Figure 2.1 as an example, RB1 sends multicast frames which will reach RB2 who will egress the multicast frame back to the local link which cause a forwarding loop. Here, the RBridge Aggregation can be looked as a shortcut between the leaf nodes of a spanning tree. This problem is called "flooding rebirth". The forwarding loop caused by flooding rebirth can further cause harmful broadcast storming to the local link. Frame processing is carefully designed in the following subsections to eliminate the above problems. Although all the aggregated RBridges have the right to deliver the frames for the local link at the same time, it's still necessary to determine a single responsible appointed forwarder for a specific frame. 3.1. Unicast Ingressing When a unicast frame is received from the local link by one of the aggregated RBridges, this ingress RBridge fills RBv into the TRILL header of the frame as the ingress nickname and then sends it to its corresponding egress RBridge as a normal unicast frame. There is no problem until we consider unknown unicast from the local link. When the access bridge receives a frame destined to a MAC address not in the address table, it will flood this frame to all other ports. The aggregation members will all receive this unicast frame. Nevertheless, the members do not know that this unicast frame is flooded to them. If the aggregated RBridges right have the destination MAC address in their address table. This frame will simply be sent as a known unicast by all the aggregated RBridges so that the remote egress RBridge will receive duplicated frames. Our solution is to configure the access links as "link aggregation" [802-1AX] at the side of the bridge (see Section 7). We can also use unknown unicast blocking technique to solve this problem: Within the access links to the aggregated RBridges, one and only one is picked out to let through unknown unicast while all the other ports suppress the egress of unknown unicast frames. Since only one aggregated RBridge will receive this unknown unicast frame, traffic duplication is avoided. Mingui Zhang Expires November 3, 2012 [Page 6] INTERNET-DRAFT RBridge Aggregation May 2, 2012 3.2. Unicast Egressing When an aggregated RBridge member receive a unicast frame whose egress nickname is the nickname of the virtual RBridge of the aggregated members, this RBridge will decapsulate the frame and egress it to the local link. 3.3. Multicast Ingressing Similar as unicast ingressing, the ingress nickname of the frames is set to RBv. In order to avoid duplicated multicast frames, multicast ingress frames can ONLY be forwarded by one of the RBridges. To achieve this, the aggregated RBridges forward multicast frames based on its locally implemented hashing function. As an example, the last bit of the source MAC address are used as the input of the hashing function. Frames with the source MAC address whose last bit is 0 will be forwarded by RB1 while RB2 will simply discard such frames. Frames with the source MAC address whose last bit is 1 will be forwarded by RB2 while RB1 will discard such frames. To realize fine grained load balance, more bits can be used by the hashing function of aggregated RBridges, which can be manually configured. 3.4. Multicast Egressing It is probably that both the aggregated RBridges will receive the multicast frames destined to the local link. However, only one of them will act as the forwarder of these frames according to their local hashing. Again, as an example, the last bit of the source MAC address of the multicast frames are used to break the tie: RB1 only forwards frames with the source MAC address suffixed by 0 while RB2 only forwards frames with the source MAC address suffixed by 1. When a multicast frame originated by the local link is forwarded across the TRILL network and received by the peer RBridge, it is important that the peer RBridge does not egress this frame back to the local link, otherwise it will cause a forwarding loop to the local link (flooding rebirth). The above hashing function will be used by the peer RBridge who will determine not to forward this multicast frame. In order to keep consistence to the hashing result of the ingress RBridge, bits that are possible to be changed with the frame forwarding should not be used in hashing, such as bit from the hop count field. 4. Address Synchronization MAC addresses SHOULD be synchronized between the aggregated members through ESADI immediately after they are learned from the data plane frame processing. A MAC address sent through ESADI message from the Mingui Zhang Expires November 3, 2012 [Page 7] INTERNET-DRAFT RBridge Aggregation May 2, 2012 peer is stored in the MAC table as if it is locally learned. Afterwards, a frame destined to this MAC address can be delivered to the local link or TRILL network by either of the aggregated members. In a corner case that a unicast frame are received by a aggregated member in the flight of ESADI message and the destination MAC address has not learned from its peer, this frame will be sent as an unknown unicast by this member. 5. Configuration of RBridge Aggregation RBridge Aggregation should be configured by network managers when they configure the RBridge ports. Only the RBridge ports connected to the same bridge can be configured to be aggregated and all VLANs carried on this TRILL link will be treated as aggregated. The pseudonode nickname is used as the nickname of the aggregated virtual RBridges. If the TRILL link do not have pseudonode nickname, the nickname for the virtual RBridge is required to be manually configured and used by all the aggregated members. The members of an aggregated group should report connections to the aggregated VLANs so that the multicast traffic of these VLANs will reach all the members. In [RFC6325], in order to suppress loops, multiple appointed forwarders for the same VLAN on a same local link is prohibited. This limitation should be relaxed in the RBridge Aggregation solution. 5.1. Hashing Function Determination Hashing function is well supported by hardware. Network manager should determine the TRILL data frame fields that are used as the hashing input. It is important that all aggregated members get consistent output on the same native data frame. Therefore the fields that are to be changed during frame processing MUST not be used as the hashing input. Source, Destination MAC address and inner VLAN ID are all candidates for this kind of hashing input. Each aggregated member maintain a circular list of the aggregated members. Assume the hashing function is H(T) and there are "A" members in the Aggregated RBridges group. The responsible forwarder is chosen as RBr = H(T) mod A for multicast and broadcast packets. 6. Resilience RBridge Aggregation offers active/active multi-homing to a multi- access TRILL link, which increases its reliability. In the event of access link failures, the TRILL link need not wait for the time- Mingui Zhang Expires November 3, 2012 [Page 8] INTERNET-DRAFT RBridge Aggregation May 2, 2012 consuming forwarder re-appointment to recover the connectivity to TRILL campus. 6.1. Failure Recovery Without RBridge aggregation, if a local link is disconnected from its Appointed Forwarder, the data forwarding can be restored after the DRB successfully choose a new appointed forwarder for this link. However, it may take a longer time before the new appointed forwarder begins to function properly. Until the new Appointed Forwarder properly functions, the disruption continues. In RBridge aggregation, if a aggregated member is not connected to the local link any more, it will send out an LSP to announce that it is not connected to the virtual RBridge RBv. Since all aggregated RBridges had reported the connection to RBv, remote RBridges in the TRILL campus can send frames to RBv via any other aggregated RBridges where the frames will be egressed to the local link. The connection to the local link remains uninterrupted. For ingressing unicast frames, if the link between the access bridge and aggregated RBridges fails, the access bridge will send these frames to the other RBridge where they will be delivered directly without disruption. Take Figure 2 as an example, suppose link B1-RB1 fails, the packets originally sent through link B1-RB1 will be sent as unknown unicast to all the interfaces of B1. Since RB2 stores all VLAN-x's addresses learned by RB1. The packets going through link B1- RB2 will be regarded as known unicast by RB2 and forwarded to its destination. +-+ +-->0->RB1<-+ +-+ +-->0->RB1<-+ | | | 1->RB2 | | | | 1->RB2 | |H| | 2->RB3 | |H| | 2->RB3 | |A|->| ...->RB...| --> |A|->| ...->RB...| |S| | k-1->RBk | |S| | k-1->RBk+1| |H| | k->RBk+1| |H| | k->RBk+1| | | | ...->RB...| | | | ...->RB...| +-+ +-n-1->RBn--+ +-+ +-n-1->RBn--+ Figure 6.1: Hashing function change during a link failure In normal case, in order to avoid duplicate frames and forwarding loops, an aggregated member will not send multicast frames that should not be sent by it according to the hashing function. However, when an aggregation member cannot forward these frames due to link failures, the next aggregated member on the aggregation list should take over the responsibility to deliver these multicast frames. This can be realized through the local change of its hashing function. The Mingui Zhang Expires November 3, 2012 [Page 9] INTERNET-DRAFT RBridge Aggregation May 2, 2012 new hashing function is changed in this way: originally, the member will only deliver the frames with the output of hashing function pointed to itself. This change is shown in Figure 6.1. When this member (RBk+1) knows that a member (RBk) is failed, it will take over the responsibility to deliver frames that are originally delivered by the failed member. Take Figure 2 for an example, in normal case, RB2 only deliver packets with source MAC addresses suffixed by 1. When link RB1-B1 fails and RB1 can not deliver the multicast frames from/to the local link, RB2 will take the responsibility to deliver packets with source MAC addresses suffixed by either 0 or 1. If the failed link is the link that let unknown unicast through. The access bridge should change the link connected to the next aggregated member to let through unknown unicast. This mechanism can be implemented through configuration of the ACL of the access bridge. 6.2. Failover When an aggregator detects that it is disconnected from the local link in the flight of data frames, it can transmit the frames to the other aggregator for delivery. In this way, the links connected to the aggregated RBridges are protected by each other. Unicast frames will be redirected directly during the failover. For a multi- destination frame or unknown unicast frame that should be delivered by one of the aggregated RBridges according to the hashing function, this RBridge can send the frame to the other RBridge through a reserved outer VLAN. The other RBridge will deliver multi- destinations frames from this reserved VLAN without considering the hashing function. 6.3. Connectivity of Wiring Close Topology According to the solution defined in Section A.3.3 of [RFC6325], the edge RBridges connected to a wiring close topology act as the roots of spanning trees at the same time. The TRILL link will be partitioned into several spanning trees. With RBridge Aggregation, the access bridge will treat the aggregated members as leaf nodes of the spanning tree. The edge RBridges do not have to emit BPDUs and participate the Spanning Tree Protocol any more. Possible forwarding loops are broken at the aggregated RBridges and the bridged TRILL need not to be partitioned, which defines a clearer boundary between the TRILL campus and the traditional bridged LAN. 7. Load Balance When a TRILL link is attached to aggregated RBridges, its packets can Mingui Zhang Expires November 3, 2012 [Page 10] INTERNET-DRAFT RBridge Aggregation May 2, 2012 be forwarded by each of these RBridges. The access switch can configure access links as "link aggregation", then it can balance the load among these links through link aggregation technique [802-1AX]. However, the access switch can well configure these link as normal links. That is not to say the traffic are not balanced in this case. Actually, the load will be balanced in the manner of multipathing (ECMP and Mutli-Topology Routing). Take Figure 2.1 as an example, bridge B1 is attached to RB1 and RB2 through link B1-RB1 and link B1- RB2. Suppose host Ha is attached to RBridge RBi and it is sending packets to a host located in the local link. If the remote RBridge RBi selects RB1 as the egress RBridge, then B1 will learn the source MAC address at the port attached to link B1-RB1. Therefore the packets destined to Ha from the local link will naturally be sent via link B1-RB1. Otherwise, if RB2 is selected as the egress RBridge, the packets will be sent through link B1-RB2. 8. Security Considerations This document raises no new security issues for IS-IS. 9. IANA Considerations This document requires no IANA actions. RFC Editor: please remove this section before publication. 10. References 10.1. Normative References [RFC6325] R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol Specification", RFC 6325, July 2011. [CMT] T. Senevirathne, J. Pathangi, et al, "Coordinated Multicast Trees (CMT)for TRILL", draft-tissa-trill-cmt-00.txt, working in progress. [MTRA] M. Zhang, "Reverse Path Forwarding Check under Multiple Topology TRILL", draft-zhang-trill-multi-topo-rpfc-00.txt, working in progress. [RFC6349] R. Perlman, D. Eastlake, et al, "RBridges: Appointed Forwarders", RFC 6349, November 2011. 10.2. Informative References [802-1AX] "IEEE Standard for Local and metropolitan area networks - Link Aggregation", IEEE Std 802.1 AX-2008, 3 November Mingui Zhang Expires November 3, 2012 [Page 11] INTERNET-DRAFT RBridge Aggregation May 2, 2012 2008. Mingui Zhang Expires November 3, 2012 [Page 12] INTERNET-DRAFT RBridge Aggregation May 2, 2012 Author's Addresses Mingui Zhang Huawei Technologies Co.,Ltd Huawei Building, No.156 Beiqing Rd. Z-park ,Shi-Chuang-Ke-Ji-Shi-Fan-Yuan,Hai-Dian District, Beijing 100095 P.R. China Email: zhangmingui@huawei.com Donald E. Eastlake, 3rd Huawei Technologies 155 Beaver Street Milford, MA 01757 USA Phone: +1-508-333-2270 EMail: d3e3e3@gmail.com Mingui Zhang Expires November 3, 2012 [Page 13]