TRILL Working Group                                                Y. Li
Internet Draft                                                    W. Hao
Intended status: Standards Track                     Huawei Technologies
Expires: September 2012                                    S. Chatterjee
                                                             IP Infusion
                                                           March 5, 2012


         VLAN based Tree Selection for Multi-destination Frames
                draft-yizhou-trill-tree-selection-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 5, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Li, et al.            Expires September 5, 2012                 [Page 1]

Internet-Draft        VLAN based Tree Selection               March 2012

Abstract

   TRILL uses distribution trees for multi-destination traffic. In a
   fat tree structure, it is quite possible that the distances from an
   ingress RBridge to multiple tree roots are the same. Therefore the
   minimum distance comparison may not help much in the tree selection
   decision. Multiple trees can be used by an ingress RBridge.

   For any RBridge RBn, if RBn has downstream receivers of VLAN x in a
   distribution tree t, there will be an entry of (t, x, port list) in
   the multicast forwarding table on RBn. If there are n trees and m
   VLANs, the multicast forwarding table size on RBn is typically n*m
   entries. The value of m is up to 4096 and n is up to the total
   number of distribution trees in the campus. If finer granularity
   filtering such as L2/L3 multicast address is used, the multicast
   forwarding table size increases even more dramatically. TRILL
   multicast forwarding table size is limited in hardware/silicon
   implementations, and sometimes L3 multicast shares the same table
   with it. This document specifies a VLAN based tree selection
   mechanism to reduce the multicast forwarding table size. No data
   plane change is required.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. VLAN based Tree Selection
      3.1. Overview
      3.2. Sub-TLVs for the Router Capability TLV
         3.2.1. The Tree Identifier and VLANs Sub-TLV
         3.2.2. The Tree Used Identifier and VLANs Sub-TLV
      3.3. Detailed Processing
      3.4. Failure Handling
      3.5. Extensions
   4. Backward Compatibility
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   8. Acknowledgments

1. Introduction

   One or more distribution trees can be used to distribute multi-
   destination frames in a TRILL campus. The RBridge having the highest
   tree root priority announces the total number of trees that are
   computed for the campus. It may also specify the ordered list of
   tree root nicknames that the other RBridges need to compute in the
   Tree Identifiers (TREE-RT-IDs) sub-TLV [RFC6326]. Every RBridge
   specifies the trees it wants to use in the Trees Used Identifiers
   (TREE-USE-IDs) sub-TLV and the VLANs it is interested in in the
   Interested VLANs and Spanning Tree Roots (INT-VLAN) sub-TLV
   [RFC6326]. It is recommended that, by default, the ingress RBridge
   chooses the tree whose root is closest for multi-destination frames
   [RFC6325].

   The Trees Used Identifiers information is used to build the RPF
   table. The Interested VLANs information is used for distribution
   tree pruning, and the multicast forwarding table is built from the
   pruned information. Each distribution tree SHOULD be pruned per
   VLAN, eliminating branches that have no potential receivers
   downstream [RFC6325]. Further pruning based on L2/L3 multicast
   address is also possible.

   How many trees to calculate, where the tree roots are located, and
   which tree(s) an ingress RBridge uses are implementation dependent.
   With the increasing demand to use TRILL in data center networks,
   there are some features we can explore for multi-destination frames
   in the data center use case.

   In order to achieve non-blocking data forwarding, a fat tree
   structure is often used. Figure 1 shows a typical data center
   network based on a fat tree structure. RB1 and RB2 are aggregation
   switches and RB11 to RB14 are access switches. It is a common
   practice to place the tree roots at the aggregation switches for
   more efficient traffic transportation. All the ingress RBridges,
   which are access switches, have the same distance to all the tree
   roots.

                +-----+         +-----+
                | RB1 |         | RB2 |
                +-----+         +-----+
                 / | \\         / /|\
                /  |  \ \      / / | \
               /   |   \  \   / /  |  \-----+
              /    |       \/ /\   |        |
             /     |      /\/     \|        |
            /  /---+---/ /\        |\       |
           /  /    |    /  \       | \      |
          /  /     |   /    \      |  \     |
         /  /      |  /      \     |   \    |
      +-----+   +-----+       +-----+  +-----+
      | RB11|   | RB12|       | RB13|  | RB14|
      +-----+   +-----+       +-----+  +-----+

          Figure 1 Fat Tree Structure based TRILL network

   In the structure of Figure 1, if we choose to put the tree roots at
   RB1 and RB2, the ingress RBridge (e.g. RB11) would find more than
   one closest tree root (i.e. RB1 and RB2). An ingress RBridge then
   has two options: choose one and only one distribution tree root, or
   use an ECMP-like algorithm to balance the traffic among the multiple
   trees whose roots are at the same distance.

   The former, a single tree per ingress RBridge, has the obvious
   problem of inefficient link usage. For example, if RB11 chooses
   tree1, which is rooted at RB1, as the distribution tree, the link
   between RB11 and RB2 will never be used by RB11 to ingress
   multi-destination frames.

   The latter, ECMP based tree selection, makes the multicast
   forwarding table grow linearly with the number of trees, as follows.
   In some implementations, a multicast forwarding table on an RBridge
   is used to map the key of (tree nickname + VLAN) to an index to a
   list of ports for multicast frame replication. The key used for
   mapping is simply the tree nickname when the RBridge does not prune
   the tree, and the key could be (tree nickname + VLAN + L2/L3
   multicast address) when the RBridge has been programmed by the
   control plane with L2/L3 multicast pruning information.

   For any RBridge RBn, for each VLAN x, if RBn is in a distribution
   tree t for VLAN x, there will be an entry of (t, x, port list) in
   the multicast forwarding table on RBn. If there are n such trees and
   m such VLANs, the multicast forwarding table size on RBn is n*m
   entries.
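
   The n*m growth described above can be illustrated with a minimal
   sketch. It is not part of the specification: the function name, the
   tree nicknames, and the use of the full 12-bit VLAN ID space as the
   upper bound for m are assumptions made only for illustration.

```python
from itertools import product


def pruned_table_keys(trees, vlans):
    """Return the (tree nickname, VLAN) lookup keys held by an RBridge
    that is on every tree in `trees` for every VLAN in `vlans`.

    Hypothetical helper: one key per (tree, VLAN) pair, as in the
    (t, x, port list) entries described in the text.
    """
    return set(product(trees, vlans))


# Hypothetical nicknames for four distribution trees.
trees = ["tree1", "tree2", "tree3", "tree4"]
# Assumption: m takes its upper bound of 4096 (the 12-bit VLAN ID space).
vlans = range(4096)

keys = pruned_table_keys(trees, vlans)
print(len(keys))  # n * m = 4 * 4096
```

   Each additional tree adds one more full set of per-VLAN keys, which
   is the linear growth that ECMP based tree selection causes.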

   When 4 distribution trees are used in a TRILL campus and RBn has 4K
   VLANs with downstream receivers, it consumes 16K table entries. Each
   entry contains a distinct combination of (tree nickname, VLAN) as
   the lookup key. The left table in Figure 2 shows an example of the
   multicast forwarding table with 2 distribution trees.

   The TRILL multicast forwarding table has a limited size in hardware
   implementations. In some implementations, it shares a total of 16K
   table entries with IP multicast. If fine-grained labeling is used
   [TrillFGL], the number of table entries will increase dramatically.

   A straightforward way to cope with the limited number of table
   entries is not to prune the distribution tree. However, this can
   only be used in restricted scenarios, for the following reasons:

   - Unnecessary bandwidth waste for multi-destination frames. There is
     broadcast traffic in each VLAN, such as ARP and unknown unicast.
     In addition, if there is heavy L3 multicast traffic in some VLAN,
     not pruning has the worse consequence that L3 user data is flooded
     unnecessarily. The volume could be huge if an application such as
     IPTV is supported. Finer pruning, such as pruning based on
     multicast group, may be desirable in this case.

   - Only useful at pure transit nodes. Edge nodes always need to
     maintain the multicast forwarding table with the key of (tree
     nickname + VLAN), since an edge node needs to decide whether to
     replicate the frame to a local access port based on VLAN. It is
     very likely that edge nodes are relatively low scale switches with
     a smaller shared table size available.

   In addition to the multicast table size concern, some silicon does
   not currently support hashing based tree nickname selection at the
   ingress RBridge. VLAN based tree selection is used instead. The
   control plane of the ingress RBridge maps the incoming VLAN x to a
   tree nickname t.
   Then the data plane always uses tree t for VLAN x multi-destination
   frames. Though an ingress RBridge may choose multiple trees for load
   sharing, it can use one and only one tree for a single VLAN.

   This document describes the control plane support for a VLAN based
   tree selection mechanism that reduces the multicast forwarding table
   size. It is consistent with the silicon implementation mentioned in
   the previous paragraph. Here VLAN based tree selection is a general
   term which also includes the finer granularity case, e.g. VLAN +
   L2/L3 multicast group based selection.

2. Conventions used in this document

   The same terminology and acronyms are used in this document as in
   [RFC6325].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

3. VLAN based Tree Selection

   VLAN based tree selection can be used as a complementary
   distribution tree selection mechanism, especially when the multicast
   forwarding table size is a concern.

3.1. Overview

   The tree root with the highest priority announces the tree nicknames
   and the VLANs allowed on each tree. Such a tree-VLAN correspondence
   announcement can be based on static configuration or on some
   predefined algorithm. An ingress RBridge selects the tree-VLAN
   correspondence it wishes to use from the list announced by the
   highest priority tree root. It should not transmit VLAN x frames on
   tree y unless the highest priority tree root has announced that VLAN
   x is allowed on tree y.

   If we make sure each VLAN is allowed on one and only one tree, we
   can cap the number of multicast forwarding table entries on any
   RBridge at 4K (or at 16M when fine-grained labels are used).

   For example, there are two trees in the whole campus.
   The highest priority tree root appoints tree1 to carry VLANs 1-2000
   and tree2 to carry VLANs 2001-4095. With this announcement by the
   highest priority tree root, every RBridge which understands the
   announcement will not send VLANs 2001-4095 on tree1 or VLANs 1-2000
   on tree2. Then no RBridge needs to store the entries for
   tree1/VLAN 2001-4095 or tree2/VLAN 1-2000. Figure 2 shows the
   multicast forwarding table on an RBridge before and after VLAN based
   tree selection is performed. The number of entries is reduced by a
   factor f, f being the number of trees used in the campus. In this
   example, it is reduced from 2*4095 to 4095. This affects both
   transit nodes and edge nodes. The data plane does not change.

   +--------------+-----+---------+   +--------------+-----+---------+
   |tree nickname |VLAN |port list|   |tree nickname |VLAN |port list|
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    |  1  |         |   |    tree 1    |  1  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    |  2  |         |   |    tree 1    |  2  |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | ... |         |   |    tree 1    | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | 4094|         |   |    tree 1    | 1999|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 1    | 4095|         |   |    tree 1    | 2000|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    |  1  |         |   |    tree 2    | 2001|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    |  2  |         |   |    tree 2    | 2002|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    | ... |         |   |    tree 2    | ... |         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    | 4094|         |   |    tree 2    | 4094|         |
   +--------------+-----+---------+   +--------------+-----+---------+
   |    tree 2    | 4095|         |   |    tree 2    | 4095|         |
   +--------------+-----+---------+   +--------------+-----+---------+

     Figure 2 Multicast forwarding table before (left) & after (right)

3.2. Sub-TLVs for the Router Capability TLV

   Two new sub-TLVs that can be carried in the Router Capability TLV
   for TRILL are defined below. They can be considered finer
   granularity analogs of the Tree Identifiers Sub-TLV and the Trees
   Used Identifiers Sub-TLV in [RFC6326].

3.2.1. The Tree Identifier and VLANs Sub-TLV

   The Tree Identifiers and VLANs (TREE-VLANs) sub-TLV is used by the
   IS that has the highest priority tree root to announce the VLANs
   allowed on each tree. Multiple instances of this sub-TLV may be
   carried. The same tree nickname may occur in multiple Tree-VLAN
   Records, within the same sub-TLV or across multiple sub-TLVs. The
   sub-TLV format is as follows:

   +-+-+-+-+-+-+-+-+
   |     Type      |                          (1 byte)
   +-+-+-+-+-+-+-+-+
   |    Length     |                          (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Tree-VLAN Record (1)          |  (6 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           .................           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Tree-VLAN Record (N)          |  (6 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each Tree-VLAN Record is of the form:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Tree Nickname             |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV  |          Start.VLAN           |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV  |           End.VLAN            |  (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type: Router Capability sub-TLV type, set to 20 (TREE-VLANs).

   o  Length: 6*n bytes, where there are n Tree-VLAN Records.

   o  Tree Nickname: The nickname at which a distribution tree is
      rooted.

   o  RESV: 4 bits that MUST be sent as zero and ignored on receipt.

   o  Start.VLAN, End.VLAN: These fields are the VLAN IDs of the allowed
      VLAN range on the tree, inclusive. To specify a single VLAN, the
      VLAN's ID appears as both the start and end VLAN.

3.2.2. The Tree Used Identifier and VLANs Sub-TLV

   This sub-TLV has the same structure as the Tree Identifiers and
   VLANs sub-TLV (TREE-VLANs) specified in Section 3.2.1. The only
   differences are that its sub-TLV type is set to 21 (TREE-VLAN-USE)
   and that the Tree-VLAN Records listed are those the originating IS
   allows.

3.3. Detailed Processing

   The highest priority tree root includes all the necessary tree
   related sub-TLVs defined in [RFC6326] as usual and MAY include the
   Tree Identifier and VLANs Sub-TLV (TREE-VLANs) in its LSP. The
   highest priority tree root may decide that each VLAN is allowed on
   one and only one tree, to maximize the saving in multicast
   forwarding table size.

   An ingress RBridge that understands the TREE-VLANs sub-TLV should
   select the tree-VLAN correspondences it wishes to use and put them
   in the TREE-VLAN-USE sub-TLV. If multiple tree nicknames were
   announced in the TREE-VLANs sub-TLV for a VLAN x, the ingress
   RBridge must choose one of them. How to make this choice is out of
   the scope of this document. It may be desirable to have some fixed
   algorithm to make sure all ingress RBridges choose the same tree for
   VLAN x in this case. Any single VLAN that the ingress RBridge is
   interested in should be related to one and only one tree ID in
   TREE-VLAN-USE, to minimize the multicast forwarding table size on
   other RBridges.

   When an ingress RBridge encapsulates a multi-destination frame for
   VLAN x, it should use the tree nickname that it previously selected
   in TREE-VLAN-USE for VLAN x.

   If RBridge RBn does not perform pruning at all, it builds the
   multicast forwarding table exactly as in [RFC6325].
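
   The record layout of Section 3.2.1 and the ingress-side choice
   described in this section can be sketched as follows. This is only
   an illustration under stated assumptions, not a normative encoding:
   fields are taken to be in network byte order with RESV in the four
   high-order bits, the function names and nickname values are
   hypothetical, and the "lowest nickname wins" tie-breaker is just one
   possible fixed algorithm, since the choice itself is out of scope of
   this document.

```python
import struct

TREE_VLANS = 20      # sub-TLV type given in Section 3.2.1
TREE_VLAN_USE = 21   # sub-TLV type given in Section 3.2.2


def encode_subtlv(subtlv_type, records):
    """Encode (tree nickname, start VLAN, end VLAN) records.

    Assumption: network byte order, RESV (sent as zero) in the four
    high-order bits of each VLAN field.
    """
    body = b""
    for nickname, start, end in records:
        if not 0 <= nickname <= 0xFFFF:
            raise ValueError("nickname must fit in 2 bytes")
        if not 0 <= start <= end <= 0x0FFF:
            raise ValueError("VLAN range must fit in 12 bits")
        body += struct.pack("!HHH", nickname, start, end)
    if len(body) > 255:
        raise ValueError("too many records for a 1-byte Length")
    return struct.pack("!BB", subtlv_type, len(body)) + body


def decode_subtlv(data):
    """Decode one sub-TLV, ignoring the RESV bits on receipt."""
    subtlv_type, length = struct.unpack_from("!BB", data, 0)
    if length % 6 != 0 or len(data) < 2 + length:
        raise ValueError("Length must be 6*n and fit in the buffer")
    records = []
    for offset in range(2, 2 + length, 6):
        nickname, start, end = struct.unpack_from("!HHH", data, offset)
        records.append((nickname, start & 0x0FFF, end & 0x0FFF))
    return subtlv_type, records


def select_tree(records, vlan):
    """Pick one allowed tree for `vlan`, or None if no tree allows it.

    Assumption: the lowest nickname is used as a fixed tie-breaker so
    that all ingress RBridges make the same choice.
    """
    allowed = sorted(n for n, start, end in records if start <= vlan <= end)
    return allowed[0] if allowed else None


# Hypothetical nicknames 0x0001 and 0x0002 for tree1 and tree2.
announced = [(0x0001, 1, 2000), (0x0002, 2001, 4095)]
wire = encode_subtlv(TREE_VLANS, announced)
subtlv_type, decoded = decode_subtlv(wire)
tree_for_vlan_10 = select_tree(decoded, 10)
tree_for_vlan_3000 = select_tree(decoded, 3000)
```

   An ingress RBridge would place the selected (tree, VLAN)
   correspondences in a TREE-VLAN-USE sub-TLV by calling the same
   encoder with type 21.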

   If RBn prunes the distribution tree based on VLANs, RBn uses the
   information received in the TREE-VLAN-USE sub-TLV to mark the set of
   VLANs reachable downstream for each adjacency and for each tree it
   is in.

   Logically, an ingress RBridge that does not support VLAN based tree
   selection is equivalent to one that supports it and announces every
   combination of tree-id-used and interested-vlan in TREE-VLAN-USE.

   RBn may additionally use a flag or a special loopback port in a
   multicast forwarding table entry to indicate that a
   multi-destination frame needs to be decapsulated locally and
   replicated to access ports.

3.4. Failure Handling

   Failure of a tree root: It is the responsibility of the highest
   priority tree root to inform the other RBridges of changes to the
   allowed tree-VLAN correspondence. When the highest priority tree
   root learns that the root of tree t has failed, it should re-assign
   the VLANs allowed on tree t to other trees.

   Failure of the highest priority tree root: It is recommended to
   configure the second highest priority tree root with the same
   knowledge of the allowed tree-VLAN correspondence as the highest
   priority tree root. Once the original highest priority tree root
   fails, the second highest priority tree root can take over the job.
   The original highest priority tree root normally also serves as a
   distribution tree root for a number of VLANs. Those VLANs should be
   re-assigned to other tree roots by the new highest priority tree
   root, especially when the failed node is the root of the only tree
   that allows those VLANs.

   During transient periods, or if the highest priority tree root
   misbehaves, the following errors may occur:

   - No tree has been announced to allow VLAN x frames.

   - An ingress RBridge is supposed to transmit VLAN x frames on tree
     t, but the root of tree t is no longer reachable.

   For the second case, an ingress RBridge should choose another
   reachable tree root that allows VLAN x according to the highest
   priority tree root's announcement. If no such tree is available, the
   situation is the same as the first case above. The ingress RBridge
   should then be 'downgraded' to a conventional RBridge as in
   [RFC6325].

3.5. Extensions

   VLAN based tree selection can be easily extended to (VLAN + L2/L3
   multicast group) based tree selection. For example, we can appoint
   multicast group 1 in VLAN 10 to tree1 and group 2 in VLAN 10 to
   tree2 for better load sharing. New sub-TLVs can be specified later
   for this purpose.

4. Backward Compatibility

   An RBridge MUST always include the TREE-USE-IDs and INT-VLAN
   sub-TLVs as usual, whether or not it supports the TREE-VLAN-USE
   sub-TLV.

   An RBridge that understands the TREE-VLAN-USE sub-TLV sent from
   another RBridge RBn should use it to build the multicast forwarding
   table and ignore the TREE-USE-IDs and INT-VLAN sub-TLVs sent from
   the same RBridge for that purpose. It should be noted that the
   TREE-USE-IDs and INT-VLAN sub-TLVs are still useful for other
   purposes, e.g. RPF table building and spanning tree root
   notification.

   If the RBridge does not receive a TREE-VLAN-USE sub-TLV from RBn, it
   uses the conventional way described in [RFC6325] to build the
   multicast forwarding table.

   For example, there are two distribution trees, tree1 and tree2, in
   the campus. RB1 and RB2 are new RBridges which use the new sub-TLVs
   described in this document. RB3 is an old RBridge which is
   compatible with [RFC6325]. RB1 receives ((tree1, VLAN10), (tree2,
   VLAN11)) as the TREE-VLAN-USE sub-TLV and (tree1, tree2) as the
   TREE-USE-IDs sub-TLV from RB2 on port x. RB1 also receives (tree1)
   as the TREE-USE-IDs sub-TLV and no TREE-VLAN-USE sub-TLV from RB3 on
   port y. Assume RB2 is interested in VLANs 10 and 11 and RB3 is
   interested in VLANs 100 and 101. RB2 and RB3 announce their
   interested VLANs in the INT-VLAN sub-TLV as usual.
   Then RB1 will build the entries (tree1, VLAN10, port x) and (tree2,
   VLAN11, port x) based on RB2's LSP and the mechanism specified in
   this document. RB1 also builds the entries (tree1, VLAN100, port y),
   (tree1, VLAN101, port y), (tree2, VLAN100, port y) and (tree2,
   VLAN101, port y) based on RB3's LSP in the conventional way. The
   complete multicast forwarding table on RB1 would be as follows.

   +--------------+-----+---------+
   |tree nickname |VLAN |port list|
   +--------------+-----+---------+
   |    tree 1    | 10  |    x    |
   +--------------+-----+---------+
   |    tree 1    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 1    | 101 |    y    |
   +--------------+-----+---------+
   |    tree 2    | 11  |    x    |
   +--------------+-----+---------+
   |    tree 2    | 100 |    y    |
   +--------------+-----+---------+
   |    tree 2    | 101 |    y    |
   +--------------+-----+---------+

   As expected, the table does not shrink as much as it would if every
   RBridge supported the new TREE-VLAN-USE sub-TLV. The worst case in a
   hybrid campus is that the number of entries equals the number in
   current practice, which does not support VLAN based tree selection.
   This extreme case happens when the interested VLAN set of the new
   RBridges is a subset of the interested VLAN set of the old RBridges.

   VLAN based tree selection is compatible with the current practice.
   Its effectiveness increases as more RBridges in the TRILL campus
   support this feature.

5. Security Considerations

   This document does not change the general RBridge security
   considerations of the TRILL base protocol. See Section 6 of
   [RFC6325].

6. IANA Considerations

   IANA is requested to allocate the new sub-TLV type codes specified
   in Section 3.2.

7. References

7.1. Normative References

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC6326] Eastlake, D., Banerjee, A., Dutt, D., Perlman, R., and A.
             Ghanwani, "TRILL Use of IS-IS", RFC 6326, July 2011.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2. Informative References

   [RFC6165] Banerjee, A. and D. Ward, "Extensions to IS-IS for Layer-2
             Systems", RFC 6165, April 2011.

   [RFC6327] Eastlake 3rd, D., Perlman, R., Ghanwani, A., Dutt, D., and
             V. Manral, "Routing Bridges (RBridges): Adjacency", RFC
             6327, July 2011.

   [TrillFGL] Eastlake 3rd, D., Zhang, M., Agarwal, P., Dutt, D., and
             R. Perlman, "TRILL: Fine-Grained Labeling", draft-ietf-
             trill-fine-labeling-00.txt, December 2011.

8. Acknowledgments

   The authors wish to thank Radia Perlman, Donald Eastlake, Rakesh
   Kumar R and Ma Liangliang for their valuable comments.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56624558
   Email: liyizhou@huawei.com

   Weiguo Hao
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56623144
   Email: haoweiguo@huawei.com

   Somnath Chatterjee
   IP Infusion,
   RMZ Centennial, Block D
   Doddanakundi Industrial Area,
   Kundanahalli Main Road, Mahadevapura Post,
   Bangalore - 560 048
   Karnataka, India

   Email: somnath.chatterjee01@gmail.com