Network Working Group                                  IJsbrand Wijnands
Internet Draft                                               Arjen Boers
Expiration Date: December 2004                                Eric Rosen
                                                     Cisco Systems, Inc.

                                                               June 2004


                  The Proxy Field in PIM Join Messages

                    draft-wijnands-pim-proxy-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes an extension to PIM which enables PIM to
   build multicast trees through an MPLS-enabled network, even if that
   network's IGP does not have a route to the source of the tree.

Wijnands, et al.                                                [Page 1]

Internet Draft      draft-wijnands-pim-proxy-00.txt            June 2004

Table of Contents

    1          Introduction
    2          Use of the Proxy Field in Join Messages
    2.1        Proxy and shared tree joins
    2.2        Proxy Hello Option
    2.3        The Vector Proxy
    2.3.1      Inserting a Vector Proxy in a Join
    2.3.2      Processing a Received Vector Proxy
    2.3.3      Vector Proxy and Asserts
    2.4        Other Proxy Types
    2.4.1      Vector plus MDT-SAFI
    2.4.2      Vector Stack
    2.5        Conflicting Proxies
    2.6        Proxy Convergence
    2.7        Multiple Proxies
    3          PIM Join packet format
    3.1        PIM Proxy Hello option
    3.2        Vector Proxy TLV
    3.3        MDT-SAFI Proxy TLV
    3.4        Vector Stack Proxy TLV
    4          Intellectual Property Statement
    5          Acknowledgments
    6          Full Copyright Statement
    7          Normative References
    8          Informational References
    9          Authors' Addresses

1. Introduction

   It is sometimes convenient to divide the routers of a particular
   network into two categories: "edge routers" and "core routers".  The
   edge routers attach directly to users or to other networks, but the
   core routers attach only to other routers of the same network.  If
   the network is MPLS-enabled, then any unicast packet which needs to
   travel outside the network can be "tunneled" via MPLS from one edge
   router to another.  To handle a unicast packet which must travel
   outside the network, an edge router needs to know which of the other
   edge routers is the best exit point from the network for that
   packet's destination IP address.  The core routers, however, do not
   need to have any knowledge of routes which lead outside the network;
   as they handle only tunneled packets, they only need to know how to
   reach the edge routers and the other core routers.

   Consider, for example, the case where the network is an Autonomous
   System (AS) and the edge routers are EBGP speakers; the core routers
   may then be said to constitute a "BGP-free core".
   The edge routers must distribute BGP routes to each other, but not to
   the core routers.

   As another example, consider the case of an inter-AS MPLS/BGP IP VPN,
   as discussed in section 10 of [RFC2547bis].  Traffic may need to flow
   from a Provider Edge (PE) router in one AS to a PE router in another,
   but the core routers in the first AS are NOT required to have a route
   to the PE in the second AS.

   However, when multicast packets are considered, the strategy of
   keeping the core routers free of "external" routes is more
   problematic.  When using PIM [PIMv2] to create a multicast
   distribution tree for a particular multicast group, one wants the
   core routers to be full participants in the PIM protocol, so that
   multicasting can be done efficiently in the core.  This means that
   the core routers must be able to correctly process PIM Join messages
   for the group, which in turn means that the core routers must be able
   to send the Join messages towards the root of the distribution tree.
   If the root of the tree lies outside the network's borders (e.g., is
   in a different AS), and the core routers do not maintain routes to
   external destinations, then the PIM Join messages cannot be
   processed, and the multicast distribution tree cannot be created.

   In order to allow PIM to work properly in an environment where the
   core routers do not maintain external routes, a PIM extension is
   needed.  When an edge router sends a PIM Join message into the core,
   it must include in that message a "Vector" which specifies the IP
   address of the next edge router along the path to the root of the
   multicast distribution tree.  The core routers can then process the
   Join message by sending it towards the specified edge router (i.e.,
   toward the Vector).  In effect, the Vector serves as a proxy, within
   a particular network, for the root of the tree.

   This document defines a new field in the PIM Join message, called the
   "Proxy" field.
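   As a non-normative illustration of the idea above, the following
   Python sketch shows how a core router might pick its RPF lookup
   target: the Vector if the Join carries one, otherwise the root of the
   tree.  The routing table contents, the interface names, and the
   function name rpf_next_hop are hypothetical and are not defined by
   this document.

```python
import ipaddress
from typing import Dict, Optional


def rpf_next_hop(routes: Dict[str, str], root: str,
                 vector: Optional[str] = None) -> Optional[str]:
    """Return the upstream interface for a Join, or None if no route exists.

    `routes` maps an IGP prefix to an interface name (hypothetical data).
    When a Vector is present, the lookup is done on the Vector instead of
    on the root (source or RP) of the tree.
    """
    target = ipaddress.ip_address(vector if vector is not None else root)
    best_net = None
    best_if = None
    for prefix, interface in routes.items():
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match over the routes this router actually has.
        if target in net and (best_net is None
                              or net.prefixlen > best_net.prefixlen):
            best_net, best_if = net, interface
    return best_if


# Hypothetical BGP-free core router: it only knows the edge loopbacks.
CORE_ROUTES = {
    "10.0.0.1/32": "if-to-edge-1",
    "10.0.0.2/32": "if-to-edge-2",
}
EXTERNAL_SOURCE = "192.0.2.10"   # reachable only via BGP, unknown to the core
EDGE_1 = "10.0.0.1"              # the Vector inserted by the downstream edge

without_vector = rpf_next_hop(CORE_ROUTES, EXTERNAL_SOURCE)
with_vector = rpf_next_hop(CORE_ROUTES, EXTERNAL_SOURCE, vector=EDGE_1)
```

   In this sketch the lookup on the external source finds no route,
   while the lookup on the Vector resolves to the interface leading to
   Edge 1, which is what lets the Join make progress through the core.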
   A Proxy field can consist of a single Vector (e.g., an IPv4 address)
   or a stack of Vectors (creating a form of source route).  It can also
   consist of a Vector followed by an MDT-SAFI address [MDT-SAFI]; this
   is useful in supporting L3VPN multicast [VPN-MCAST].

2. Use of the Proxy Field in Join Messages

   Before we can start forwarding multicast packets, we need to build a
   forwarding tree by sending PIM Joins hop by hop.  Each router in the
   path creates forwarding state and propagates the Join towards the
   root of the forwarding tree.  The building of this tree is receiver
   driven.  See Figure 1.

             ------------------- BGP -------------------
             |                                         |
   [S]---( Edge 1)--(Core 1)---( Core )--(Core 2)---( Edge 2 )---[R]

                                          <--- (S,G) Join

                               Figure 1.

   In this example, the two edge routers are BGP speakers.  The core
   routers are not BGP speakers and do not have any BGP-distributed
   routes.  The route to S is a BGP-distributed route, and hence is
   known to the edge but not to the core.

   The Edge 2 router determines the interface leading to S, and sends a
   PIM Join to the upstream router.  In this example, though, the
   upstream router is a core router, with no route to S.  Without the
   PIM extensions specified in this document, the core router cannot
   determine where to send the Join, so the tree cannot be constructed.

   To allow the core router to participate in the construction of the
   tree, the Edge 2 router will include a Proxy field in the PIM Join.
   In this example, the Proxy field will contain the IP address of
   Edge 1.  Edge 2 then forwards the PIM Join towards Edge 1.  The
   intermediate core routers do their RPF check on the Proxy (the IP
   address of Edge 1) rather than on the Source; this allows the tree to
   be constructed.

2.1. Proxy and shared tree joins

   In the example above we build a source tree to illustrate the Proxy
   behavior.  The Proxy is, however, not restricted to source trees.
   The tree may also be constructed towards a Rendezvous Point (RP) IP
   address.  The RP IP address is used in the same way as the Source in
   the example above.  PIM Proxy procedures defined for sources are
   equally applicable to RPs unless otherwise noted.

2.2. Proxy Hello Option

   A new PIM source type has been defined to include the Proxy field.
   This source type is included in a normal PIM Join.  Each router on a
   connected network needs to be able to understand and parse the Join
   message.  Therefore we include a new PIM hello option to advertise
   our capability to parse and process the new source type.  We can only
   send a PIM Join which includes a Proxy if ALL routers on the network
   support the new option.  (Even a router which is not the upstream
   neighbor must be able to parse the packet in order to do Join
   suppression or overriding.)

   Option value TBD.

2.3. The Vector Proxy

2.3.1. Inserting a Vector Proxy in a Join

   In the example of Figure 1, when the Edge 2 router looks up the route
   to the source of the multicast distribution tree, it will find a
   BGP-distributed route whose "BGP next-hop" is Edge 1.  Edge 2 then
   looks up the route to Edge 1 to find the interface and PIM adjacency
   which is the next hop to the source, namely Core 2.

   When Edge 2 sends a PIM Join to Core 2, it includes a Vector Proxy
   specifying the address of Edge 1.  Core 2, and subsequent core
   routers, will forward the Join along the Vector (i.e., towards
   Edge 1) instead of trying to forward it towards S.

   Whether a Proxy is actually needed depends on whether the Core
   routers have a route to the source of the multicast tree.  How the
   Edge router knows whether or not this is the case (and thus how the
   Edge router determines whether or not to insert a Proxy field) is
   outside the scope of this document.

2.3.2. Processing a Received Vector Proxy

   When processing a received PIM Join which contains a Vector Proxy, a
   router must first check to see if the Vector IP address is one of its
   own IP addresses.  If so, the Vector Proxy is discarded, and not
   passed further upstream.  Otherwise, the Vector Proxy is used to find
   the route to the source, and is passed along when a PIM Join is sent
   upstream.  Note that a router which receives a Vector Proxy must use
   it, even if that router happens to have a route to the source.

   A router which discards a Vector Proxy may of course insert a new
   Vector Proxy.  This would typically happen if a PIM Join needed to
   pass through a sequence of Edge routers, each pair of which is
   separated by a core which does not have external routes.

   In the absence of periodic refreshment, Vectors expire along with the
   corresponding (S,G) state.

2.3.3. Vector Proxy and Asserts

   In a PIM Assert message we include the routing protocol's "metric" to
   the source of the tree.  This information is used in the selection of
   the assert winner.  If a PIM Join is being sent towards a Vector,
   rather than towards the source, the Assert message must carry the
   metric to the Vector instead of the metric to the source.  The Assert
   message, however, does not have a Proxy field and does not mention
   the Vector.

   A router may change its upstream neighbor on a particular multicast
   tree as the result of receiving Assert messages.  However, a Vector
   Proxy should not be sent in a PIM Join to an upstream neighbor which
   is chosen as the result of processing the Assert messages.
   Reachability of the Vector is only guaranteed by the router that
   advertises reachability to the Vector in its IGP.  If the assert
   winner upstream is not our real preferred next-hop, we cannot be sure
   this router knows the path to the Vector.

2.4. Other Proxy Types

2.4.1. Vector plus MDT-SAFI

   This Proxy type is used in support of the multicast VPN service
   [VPN-MCAST].  Here the source of the multicast distribution tree is
   not an IPv4 address, but an MDT-SAFI [MDT-SAFI] address.  Each edge
   router along the path to the source is expected to have a table of
   BGP-distributed MDT-SAFI addresses, but the core routers are not
   expected to have any MDT-SAFI addresses or to have routes to Edge
   routers that are in other networks.

   An Edge router creating a PIM Join would insert a "Vector plus
   MDT-SAFI" Proxy.  The Vector identifies the next Edge router on the
   path to the source, and the MDT-SAFI identifies the source of the
   tree.

   When the Join reaches the Edge router identified by the Vector, that
   Edge router uses the MDT-SAFI to look up the route to the source in
   its BGP MDT-SAFI table.  When the Join is sent upstream, it continues
   to carry the "Vector plus MDT-SAFI" Proxy, but with a new Vector
   value identifying the next Edge router in the path.

   Eventually, the Join must reach a router that is identified by both
   the Vector part and the MDT-SAFI part of the Proxy.  When this
   happens, the Proxy is discarded and further processing of the Join
   continues.  (Typically this will be at the source of the tree.)

   Per [MDT-SAFI], the MDT-SAFI address consists of an RD, a multicast
   group address, and the IP address of the source.  In the Proxy field
   we encode only the RD, as the other two components of the MDT-SAFI
   address can be gleaned from other parts of the Join.

2.4.2. Vector Stack

   A Vector Stack Proxy is a stack of Vectors used to build a forwarding
   tree that follows a set of routers identified by the Vectors.  The
   Vectors in the stack define the path.  How the Vectors are selected
   is out of the scope of this draft.  Using the Vector Stack we can
   build a traffic-engineered path per (S,G).

   The rules that apply to a single Vector Proxy also apply to the first
   Vector on the stack.
   However, when the router identified by the first Vector is reached,
   it pops the stack before passing the Proxy upstream.

   We could get the same functionality by including multiple single
   Vectors in the PIM Join.  We do, however, prefer to have a new TLV
   for this.  We save the overhead of the TLV type and length for
   multiple Vectors, and we also limit the Proxy count in the PIM Join
   message, since we do not have to count each Vector as a single Proxy.
   This way a maximum of 31 Proxies seems sufficient.

2.5. Conflicting Proxies

   It is possible for a router to receive conflicting Proxy information
   from different downstream routers.  See Figure 2.

          ( Edge A1 )       ( Edge B1 )---- [R1]
         /           \     /
        /             \   /
     [S]              ( Core )
        \             /   \
         \           /     \
          ( Edge A2 )       ( Edge B2 )---- [R2]

                          Figure 2

   There are two receivers for the same group, connected to Edge B1 and
   Edge B2.  Suppose that edge router B1 prefers A1 as the exit point
   and B2 prefers A2 as the exit point to reach the source S.  If both
   Edge B1 and B2 send a Join including a Proxy for their preferred exit
   router, and the two Joins cross the same core router, the core router
   will get conflicting Proxy information for the source.  If this
   happens, we use the Proxy from the PIM adjacency with the numerically
   smallest IP address.  The Proxies from the other sending routers may
   be kept, so that if the best Proxy gets pruned or expires we can
   immediately use the second-best Proxy and converge quickly without
   waiting for the next periodic update.

2.6. Proxy Convergence

   A Proxy is included in a PIM Join message together with the source
   information.  If the Proxy for this source is changed, we trigger a
   new PIM Join message to the upstream router.  This causes the new
   Proxy to be propagated.  This new Proxy implicitly removes the old
   Proxy upstream.  If processing the new Proxy results in a change in
   the distribution tree, a PIM Prune message may be sent.
   This PIM Prune does not need to carry any Proxy; the sender of the
   Prune, together with the source and group information, is enough to
   identify the entry.  The Proxy information is removed immediately,
   and a new Proxy is chosen from the database if one is available.

2.7. Multiple Proxies

   A PIM Join can contain multiple Proxies.  The Proxies are encoded as
   TLVs associated with a new PIM source type in the PIM message.  When
   a PIM Join with multiple Proxies is received, the first Proxy is
   processed, and the action taken depends upon the Proxy type.  This
   may or may not result in the processing of the next Proxy.  The set
   of Proxies is treated as a stack, much as described in section 2.4.2.
   Proxies not processed are passed upstream unchanged.

3. PIM Join packet format

   There is no space in the default PIM source encoding to include a
   Proxy field.  Therefore we introduce a new source encoding type.  The
   Proxies are formatted as TLVs.  The new Encoded source address looks
   like this:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Addr Family   | Encoding Type | TLV #   |S|W|R|  Mask Len     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Source Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |          Value
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+.....
   |     Type      |    Length     |          Value
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+.....
   .                       .                       .
   .                       .                       .

   TLV # gives the number of TLVs that are included with this source.
   With the 5 bits we can include a maximum of 31 TLVs.

   The Type field of the TLV is 1 byte.

   The Length field of the TLV is 1 byte.

   The other fields are the same as described in the [PIMv2] spec.

   The source TLV encoding type: TBD.

3.1. PIM Proxy Hello option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        OptionType = XX        |       OptionLength = 0        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option type: TBD.

3.2. Vector Proxy TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |        IP address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......

   Type
   ----
   The Vector Proxy type is 0.

   Length
   ------
   Length in bytes is 4.

   Value
   -----
   IPv4 address.

3.3. MDT-SAFI Proxy TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |        IP address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        RD
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......

   Type
   ----
   RD Proxy type is 1.

   Length
   ------
   Length in bytes is 24.

   Value
   -----
   IPv4 address and RD.

3.4. Vector Stack Proxy TLV

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     Depth     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Vector 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Vector n                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type
   ----
   Vector Stack Proxy type is 2.

   Length
   ------
   Length is (2 + Depth * Vector size) bytes.

   Value
   -----
   Depth is 1 byte, which allows for 255 Vectors.  Reserved is 1 byte.
   Each Vector is an IPv4 address, 4 bytes.

4. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

5. Acknowledgments

   The authors would like to thank Yakov Rekhter and Dino Farinacci for
   their initial ideas on this topic and Nidhi Bhaskar for her comments
   on the draft.

6. Full Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78 and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

7. Normative References

   [PIMv2] "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
   Fenner, Handley, Holbrook, Kouvelas, December 2002,
   draft-ietf-pim-sm-v2-new-06.txt

8. Informational References

   [MDT-SAFI] "MDT SAFI", Nalawade and Sreekantiah, February 2004,
   draft-nalawade-idr-mdt-safi-00.txt

   [RFC2547bis] "BGP/MPLS IP VPNs", edited by Rosen and Rekhter,
   September 2003, draft-ietf-l3vpn-rfc2547bis-01.txt

   [VPN-MCAST] "Multicast in BGP/MPLS VPNs", Cai, Rosen, Wijnands,
   May 2004, draft-rosen-vpn-mcast-07.txt

9. Authors' Addresses

   IJsbrand Wijnands
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ice@cisco.com

   Arjen Boers
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: aboers@cisco.com

   Eric Rosen
   Cisco Systems, Inc.
   1414 Massachusetts Avenue
   Boxborough, MA, 01719
   E-mail: erosen@cisco.com
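
   As a non-normative illustration of the TLV layouts given in sections
   3.2 and 3.4, the following Python sketch packs a Vector Proxy TLV and
   a Vector Stack Proxy TLV and walks a sequence of TLVs.  The type
   values (0 and 2), the 1-byte Type and Length fields, and the Vector
   Stack length rule are taken from this document; the function names
   are hypothetical, and the surrounding Encoded source address (whose
   encoding type is TBD) is deliberately not modeled.

```python
import ipaddress
import struct
from typing import List, Tuple

VECTOR_PROXY_TYPE = 0         # section 3.2
VECTOR_STACK_PROXY_TYPE = 2   # section 3.4
VECTOR_SIZE = 4               # a Vector is an IPv4 address


def encode_vector_proxy(vector: str) -> bytes:
    """Type (1 byte), Length = 4 (1 byte), then the IPv4 Vector."""
    value = ipaddress.IPv4Address(vector).packed
    return struct.pack("!BB", VECTOR_PROXY_TYPE, len(value)) + value


def encode_vector_stack(vectors: List[str]) -> bytes:
    """Type, Length, Depth, Reserved, then Depth IPv4 Vectors."""
    depth = len(vectors)
    length = 2 + depth * VECTOR_SIZE   # Depth + Reserved + Vectors
    # Assumption of this sketch: refuse anything the 1-byte Length
    # field cannot express, and refuse an empty stack.
    if depth < 1 or length > 255:
        raise ValueError("stack depth not encodable in this sketch")
    header = struct.pack("!BBBB", VECTOR_STACK_PROXY_TYPE, length, depth, 0)
    body = b"".join(ipaddress.IPv4Address(v).packed for v in vectors)
    return header + body


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a run of TLVs into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, value))
        offset += 2 + length
    return tlvs


single = encode_vector_proxy("10.0.0.1")
stack = encode_vector_stack(["10.0.0.1", "10.0.0.2"])
parsed = decode_tlvs(single + stack)
```

   Carrying the stack in one TLV costs a single Type and Length pair
   regardless of depth, which is the saving argued for in section 2.4.2.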