Network Working Group L. Yong Internet Draft W. Hao D. Eastlake Category: Standard Track Huawei Expires: January 2014 July 8, 2013 ISIS Protocol Extension For Building Distribution Trees draft-yong-isis-ext-4-distribution-tree-00 Abstract This document proposes an IS-IS protocol extension for automatically building bi-directional distribution trees to transport multi- destination traffic in an IP network. Status of this document This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on January 8, 2014. Yong, et al [Page 1] Internet-Draft ISIS Ext. For Distribution Tree July 2013 Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Table of Contents 1. Introduction...................................................3 1.1. Conventions used in this document.........................4 2. IS-IS Protocol Extension.......................................4 2.1. RTADDR sub-TLV............................................4 2.2. RTADDRV6 sub-TLV..........................................6 2.3. The Group Address Sub-TLV.................................7 3. Procedures.....................................................8 3.1. Distribution Tree Computation.............................8 3.2. Parent Selection..........................................8 3.3. Parallel Local Link Selection.............................9 3.4. Tree Selection for a Group...............................10 3.5. Pruning a Distribution Tree for a Group..................10 3.6. RPF Mechanism............................................10 3.7. Forwarding Using a Pruned Distribution Tree..............10 3.8. Local Forwarding at Edge Router..........................11 3.9. Distribution Tree across different IGP Levels............12 4. Backward Compatibility........................................12 5. Security Considerations.......................................12 6. IANA Considerations...........................................12 7. Acknowledgements..............................................12 8. References....................................................12 8.1. Normative References.....................................12 8.2. Informative References...................................13 Yong, et al. [Page 2] Internet-Draft ISIS Ext. For Distribution Tree July 2013 1. Introduction The computer virtualization and cloud applications motivate the DC network virtualization technology [NVO3FRWK]. This technology decouples the end-points networking from the DC physical infrastructure network in terms of address space and configuration [NVO3FRWK]. DC network virtualization solutions are necessary to carry all types of traffic in today's DC physical networks including multi- destination traffic. It is also desirable to use IP network as the DC underlying network for the overlay virtual networks [NVO3FRWK]. IP network technology does not yet support multi-destination traffic forwarding. A variant of Protocol Independent Multicast (PIM) solutions [RFC4601] [RFC5015] are designed to carry IP multicast traffic over IP networks. However the PIM solutions use their own hello protocol and hop-to-hop Join/Leave message so each router does not have global information about the receivers; in the PIM solution, the data packets could be forwarded unnecessarily to the Rendezvous Point(RP), and then get dropped there when no receiver at all or the sender and receivers for a multicast group are on the same branch towards the RP, which consumes network resources. Furthermore PIM solutions maintain a lot of soft-state, have intensive CPU utilization, and have additional convergence time besides IGP's under a failure condition. Although the PIM protocol is mature and has been deployed in IP networks, applying PIM to the IP network that supports the Network Virtualization can be an extreme challenge [MCASTISS]. For example, VXLAN [VXLAN] solutions requires multicast support in the underlying network to simulate overlay L2 broadcast capability, where every edge node in an overlay virtual network (VN) is a multicast source and receiver. An overlay VN topology may be sparse and dynamic compared to the underlying IP network topology. Also large number of overlay VNs may exist in a DC, which PIM solutions can't scale to. This document uses extensions to the IS-IS protocol to build a distribution tree for multi-destination traffic transport in an IP network. A router uses Router Capability message to announce the tree root address and the multicast groups associated to the tree. With this information, routers in the IGP can compute rooted distribution trees by using the link state information, i.e. LSDB, and shortest path algorithm. Edge routers include information in their LSPs to announce their multicast group-memberships. Routers perform distribution tree pruning for each multicast group based on Yong, et al. [Page 3] Internet-Draft ISIS Ext. For Distribution Tree July 2013 router's group membership announcement. A router forwards the multi- destination traffic along the pruned tree. In this solution, edge routers use IGMP query messages to inform the attached hosts and the hosts use IGMP report message to response with their interested multicast group(s). The edge routers announce interested multicast groups in their LSPs so they are flooded to whole network. The benefits of this solution are 1) protocol convergence: use single protocol for both unicast and multicast traffic transport and get the same convergence time for unicast and multicast traffic. 2) multi-destination transport simplification: rely on the LSDB for computing a distribution tree and not run PIM hello protocol. 3) forwarding efficiency: no need to always forward the traffic to the RP; 4) better scalability: no need to maintain heavy PIM soft states. TRILL [RFC6325] has used IS-IS protocol for both single destination and multi-destination packet transport, which proves the protocol capability for doing both. 1.1. Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119]. 2. IS-IS Protocol Extension 2.1. RTADDR sub-TLV This is the sub-TLV of Router Capability TLV. Each RTADDR sub-TLV contains a root IPv4 address and multicast group addresses that associate to the tree. A router may use multiple RTADDR sub-TLVs to announce multiple root addresses and associated multicast groups with each root. RTADDR sub-TLV format is below. Yong, et al. [Page 4] Internet-Draft ISIS Ext. For Distribution Tree July 2013 +-+-+-+-+-+-+-+-+ |Type=RTADDR | (1 byte) +-+-+-+-+-+-+-+-+ | Length | (1 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Root IPv4 Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RESV | Topology ID | (2 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Tree Priority | (1 byte) +-+-+-+-+-+-+-+-+ |Num of Groups | (1 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Group Address (1) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Group Mask (1) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | GROUP Address (N) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Group Mask (N) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Where: Type: sub-TLV of Router Capability for RTADDR (TBD) Length: variable depending on the number of associated groups Topology ID: This field carries a topology ID [RFC5120] or zero if topologies are not in use. Root IP Address: IPv4 Address for a root Tree Priority: high number means higher priority. Zero means no priority. Num of Groups: the number of group addresses Group Address: IPv4 Address for the group Group Mask: multicast group range Yong, et al. [Page 5] Internet-Draft ISIS Ext. For Distribution Tree July 2013 One router may be the root for multiple trees, each tree associates to a set of multicast groups. In this case, a router encodes multiple RTADDR sub-TLVs to announce root addresses, one for each root, in a router capability TLV. The group address/mask in different sub-TLVs can overlap. See section 3 for detail. 2.2. RTADDRV6 sub-TLV This sub-TLV is used in IPv6 network. It has the same format and usage except that the addresses are in IPv6. Yong, et al. [Page 6] Internet-Draft ISIS Ext. For Distribution Tree July 2013 +-+-+-+-+-+-+-+-+ |Type = RTADDRV6| (1 byte) +-+-+-+-+-+-+-+-+ | Length | (1 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Root IPv6 Address + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RESV | Topology ID | (2 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Tree Priority | (1 byte) +-+-+-+-+-+-+-+-+ |Num of Groups | (1 byte) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + Group IPv6 Address (1) + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + + | | + MASK(1) + | | + + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2.3. The Group Address Sub-TLV The Group Address TLV and a set of Group Address sub-TLVs are defined in RFC6326-bis [RFC6326BIS]. The GIP-ADDR and GIPV6-ADDR sub-TLVs are used in this solution. An edge router uses the GIP-ADDR sub-TLV or GIPV6-ADDR to announce its interested multicast groups. Yong, et al. [Page 7] Internet-Draft ISIS Ext. For Distribution Tree July 2013 The GIP-ADDR sub-TLV applies to an IPv4 network and GIPV6-ADDR sub- TLV for IPv6 network. When using a GIP-ADDR or GIPV6-ADDR sub-TLV, the field VLAN-ID MUST set to zero and be ignored. Other field usage remains the same as [RFC6326-BIS] 3. Procedures When an operator selects a router as a distribution tree root, he/she configures the tree root address and associated multicast groups on the router. A tree root address can be an interface address or router loopback address. After the configuration, the router will include a RTADDR sub-TLV, inside a router capability TLV, where the tree root address and multicast groups are specified. If multiple trees are configured on the router, multiple RTADDR sub- TLVs are added in one router capability TLV to specify individual tree roots. For IPv4 network, RTADDR sub-TLV is used. For IPv6, RTADDRV6 sub-TLV is used. Note that the rest of document specifies the processes for an IPv4 network only and the processes for an IPv6 network is the same. Operator may associate one multicast group to more than one tree for the redundancy purpose and use the tree priority to specify the primary tree preference. Section 3.2 describes the primary tree selection. 3.1. Distribution Tree Computation Upon receiving RTADDR sub-TLVs, routers track the tree roots and associated multicast groups. When the LSDB stabilizes, routers calculate all rooted trees according to the LSDB and shortest path algorithm. One multicast group may associate to multiple trees. It is important that all the routers choose the same tree for a multicast group. Section 3.2 and 3.3 describes the tiebreaking rule for primary tree selection for a multicast group and parent selection in case of equal-cost to potential children. 3.2. Parent Selection It is important, when building a distribution tree, that all routers choose the same links for the tree. Therefore, when there are equal costs from a potential child node to possible parent nodes, all routers need to use the same tiebreakers. It is also desirable to Yong, et al. [Page 8] Internet-Draft ISIS Ext. For Distribution Tree July 2013 allow splitting of traffic on as many links as possible in such situations. TRILL [RFC6325] achieves this by defining multiple rooted trees and using the tiebreakers to enable these trees to choose different parents. This draft uses the same tiebreakers as TRILL [RFC6325]. If there are k distribution trees in the network, when each router computes these trees, the k trees calculated are ordered and numbered from 0 to k-1 in ascending order according to root IP addresses. The tiebreaker rule is: When building the tree number j, remember all possible equal cost parents for router N. After calculating the entire "tree" (actually, directed graph), for each router N, if N has "p" parents, then order the parents in ascending order according to the 7-octet IS-IS ID considered as an unsigned integer, and number them starting at zero. For tree j, choose N's parent as choice j mod p. 3.3. Parallel Local Link Selection If there are parallel links between two routers, say R1 and R2, these parallel links would be visible to R1 and R2, but not to other routers. If this bundle of parallel links is included in a tree, it is important for R1 and R2 to decide which link to use; if the R1-R2 link is the branch for multiple trees, it is desirable to split traffic over as many link as possible. However the local link selection for a tree irrelevant to other Routers. Therefore, the tiebreaking algorithm need not be visible to any Routers other than R1 and R2. When there are L parallel links between R1 and R2 and they both are on K trees. L links are ordered from 0 to L-1 in ascending order of C i r c u i t D I Circuit ID as associated with the adjacency by the router with the highest System ID, and K trees are ordered from 0 to K-1 in ascending order of root IP addresses. The tiebreaker rule is: for tree k, select the link as choice k mod L. Note that if multiple distribution trees are configured in a network or on a router, better load balance among parallel links through the tie-breaking algorithm can be achieved. Otherwise, if there is only one tree is configured, then only one link in parallel links can be used for the corresponding distribution tree. However, calculating and maintaining many trees is resource consuming. Operators need to balance between two. Yong, et al. [Page 9] Internet-Draft ISIS Ext. For Distribution Tree July 2013 3.4. Tree Selection for a Group Routers receive one or more possible multicast group-range-to-tree mappings. Each mapping specifies a range of multicast groups. It is possible that a group-range is associated with multiple trees that may have the same or different priority. When a multicast group- range associates with more than one tree, all routers has to select the same tree for the group-range. The tiebreaker rules specified in PIM [RFC4601] are used. They are: o Perform longest match on group-range to get a list of trees. o Select the tree with highest priority. o If only one tree with the highest priority, select the tree for the group-range. o If multiple trees are with the highest priority, use the PIM hash function to choose one. PIM hash function is described in section 4.1.1 in RFC4601 [RFC4601]. 3.5. Pruning a Distribution Tree for a Group Routers prune the distribution tree for each associated multicast group, i.e. eliminating branches that have no potential downstream receivers. Multi-destination packets SHOULD only be forwarded on branches that are not pruned. The assumption here is that a multicast source is also a multicast receiver but a multicast receiver may not be a multicast source. Routers prune the trees based on the groups specified in GRADD-TLV from edge routers. Routers maintain a list of adjacency interfaces that are on the pruned tree for a multicast group. Among these interfaces, one interface may be toward the tree-root router and other are toward the egress routers. 3.6. RPF Mechanism For the further study. 3.7. Forwarding Using a Pruned Distribution Tree Forwarding a multi-destination packet follows the pruned tree for the group that the packet belongs to. It is done as follows. Yong, et al. [Page 10] Internet-Draft ISIS Ext. For Distribution Tree July 2013 o The router receives a multi-destination packet with group IP address that does not associated with any tree, the packet MUST be dropped. o Else check if the link that the packet arrives on is one of the ports in the pruned distribution tree. If not, the packet MUST be dropped. o Else perform RPF checking (section 3.5). If it fails, the packet SHOULD be dropped. o Else the packet is forwarded onto all the adjacency interfaces in the list for the group except the interface where the packet receive. 3.8. Local Forwarding at Edge Router Upon receiving a multi-destination packet, besides forwarding it along the pruned tree, an edge router may also need to forward the packet to the local hosts attached to it. This is referred to as local forwarding in this document. The local group database is needed to keep track of the group membership of the router's directly attached network or host. Each entry in the local group database is a [group, network/host] pair, which indicates that the attached network has one or more hosts belonging to the multicast group. When receiving a multi-destination packet, the edge router forwards the packet to the network/host that match the [group, network/host] pair in the local group database. The local group database is built through the operation of the IGMPv3 [RFC3376]. When an edge router becomes Designated Router on an attached network, say N1, it starts sending periodic IGMPv3 Host Membership Queries on the network. Hosts then respond with IGMPv3 Host Membership Reports, one for each multicast group to which they belong. Upon receiving a Host Membership Report for a multicast group A, the router updates its local group database by adding/refreshing the entry [Group A, N1]. If at a later time Reports for Group A cease to be heard on the network, the entry is then deleted from the local group database. The Designated Router further sends the LSP message with GRADDR sub-TLV to inform other routers about the group memberships in the local group database A router MUST ignore Host Membership Reports received on those networks where the router has not been elected Designated Router. Yong, et al. [Page 11] Internet-Draft ISIS Ext. For Distribution Tree July 2013 3.9. Distribution Tree across different IGP Levels Coming soon. 4. Backward Compatibility If a router does not support the distribution tree function described in this document, distribution tree computation MUST NOT include this router. This may result the incomplete tree. Operator can build a tunnel between two routers, which allows a single rooted tree to be built. How to build the tunnel is outside scope of this document. 5. Security Considerations Coming soon. 6. IANA Considerations The document requires two new sub-TLVs, RTADDR and RTADDRV6 for the Router Capability TLV in IANA registry. 7. Acknowledgements Authors like to thank Mike McBride and Linda Dunbar for their valuable inputs. 8. References 8.1. Normative References [RFC3376] Cain B., etc, ''Internet Group Management Protocol, Version 3'', rfc4604, October 2002 [RFC4601] Fenner, B., etc, ''Protocol Independent multicast - - Sparse Mode (PIM-SM): Protocol Specification'', rfc4601, August 2006 [RFC5015] Handley, M., etc, ''Bidirectional Protocol Independent Multicast (BIDIR-PIM'', rfc5015, October 2007 [RFC6325] Perlman, R., et al, ''Routing Bridges (RBridges): Base Protocol Specification'', RFC6325, July 2011 [RFC6326] Eastlake D, et al, '' Transparent Interconnection of Yong, et al. [Page 12] Internet-Draft ISIS Ext. For Distribution Tree July 2013 Lots of Links (TRILL) Use of IS-IS'', RFC6326, July 2011 8.2. Informative References [MCASTISS] Ghanvani, A., ''Multicast Issues in Networks Using NVO3'', draft-ghanwani-nvo3-mcast-issues-00, work in progress [NVO3FRWK] Lasserre, M., ''Framework for DC Network Virtualization'', draft-ietf-nvo3-framework-02.txt, work in progress. [RFC6326BIS] Eastlake, D., etc, ''Transparent Interconnection of Lots of Links (TRILL) Use of IS-IS'', draft-ietf-isis-rfc6326bis-01, work in progress Authors' Addresses Lucy Yong Huawei USA 5340 Legacy Drive Plano, TX 75025 USA Phone: 469-277-5837 Email: lucy.yong@huawei.com Weiguo Hao Huawei Technologies 101 Software Avenue, Nanjing 210012 China Phone: +86-25-56623144 Email: haoweiguo@huawei.com Donald Eastlake Huawei 155 Beaver Street Milford, MA 01757 USA Phone: +1-508-333-2270 EMail: d3e3e3@gmail.com Yong, et al. [Page 13]