Internet DRAFT - draft-drao-l3vpn-virtual-network-overlay-multicast

draft-drao-l3vpn-virtual-network-overlay-multicast







l3vpn                                                             D. Rao
Internet-Draft                                                   V. Jain
Intended status: Standards Track                           Cisco Systems
Expires: January 17, 2014                                  July 16, 2013


                L3VPN Virtual Network Overlay Multicast
         draft-drao-l3vpn-virtual-network-overlay-multicast-00

Abstract

   Virtual network overlays are extremely useful for supporting
   multicast applications in multi-tenant data center networks, by
   distributing the per-tenant multicast state and forwarding actions at
   the network edges with minimal control plane load and simpler
   forwarding in the core.  A virtual overlay network may use existing
   encapsulations such as MPLS-in-GRE or newer IP based encapsulations
   such as VXLAN and NVGRE.

   IP multicast sources and receivers are very commonly spread out
   across multiple subnets.  Sources and receivers may also be spread
   within and outside a single network domain such as a data center.
   Hence, a Layer-3 multicast paradigm is the most suitable and
   efficient approach for delivery of IP multicast traffic.

   BGP based MVPNs provide a good basis for providing a solution to
   support IP multicast across these overlay networks.  An appropriate
   subset of the MVPN control plane and procedures are sufficient to
   support the requirements, providing for a simpler model of operation.
   This document describes the use of BGP based MVPNs alongwith the new
   IP-based virtual network overlay encapsulations to provide a Layer-3
   virtualization solution for IP multicast traffic, and specifies
   mechanisms to use the new encapsulations while continuing to leverage
   the BGP MVPN control plane techniques and extensions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.







Rao & Jain              Expires January 17, 2014                [Page 1]

Internet-Draft                                                 July 2013


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Requirements Language . . . . . . . . . . . . . . . . . . . .   3
   3.  Solution Requirements . . . . . . . . . . . . . . . . . . . .   4
   4.  Network Topology  . . . . . . . . . . . . . . . . . . . . . .   4
   5.  Solution Scenarios  . . . . . . . . . . . . . . . . . . . . .   5
   6.  Fine-grained Transit Pruning  . . . . . . . . . . . . . . . .   6
   7.  Originating interest at a receiver edge device  . . . . . . .   6
   8.  Source mapping from a sender edge device  . . . . . . . . . .   7
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   8
   10. Security Considerations . . . . . . . . . . . . . . . . . . .   8
   11. Change Log  . . . . . . . . . . . . . . . . . . . . . . . . .   8
   12. References  . . . . . . . . . . . . . . . . . . . . . . . . .   8
     12.1.  Normative References . . . . . . . . . . . . . . . . . .   8
     12.2.  Informative References . . . . . . . . . . . . . . . . .   8
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   8

1.  Introduction










Rao & Jain              Expires January 17, 2014                [Page 2]

Internet-Draft                                                 July 2013


   Virtual network overlays are extremely useful for supporting
   multicast applications in multi-tenant data center networks, by
   distributing the per-tenant multicast state and forwarding actions at
   the network edges with minimal control plane load and simpler
   forwarding in the core.  A virtual overlay network may use existing
   encapsulations such as MPLS over GRE or newer IP based encapsulations
   such as VXLAN and NVGRE.

   IP multicast sources and receivers are very commonly spread out
   across multiple subnets.  Sources and receivers may also be spread
   within and outside a single network domain such as a data center.
   Hence, a Layer-3 multicast paradigm is the most suitable and
   efficient approach for delivery of multicast traffic.

   To send multicast data to multiple receivers across the overlay,
   packets are sent on a core tree (P-tunnel) that is typically realized
   using an IP multicast encapsulation such as the ones mentioned above.

   In order to reduce the number of multicast P-tunnels that need to be
   set up and maintained in the core network, aggregate core trees may
   be used, with multiple VPNs being supported over a given P-tunnel.
   The P-tunnel encapsulations such as MPLS-in-GRE, VXLAN and NVGRE
   support a VN-ID or VPN label in the encapsulation header which can be
   used to distinguish the VPN.

   BGP based MVPNs provide a good basis for providing a solution to
   support IP multicast across these overlay networks.  An appropriate
   subset of the MVPN control plane and procedures are sufficient to
   support the requirements, providing for a simpler model of operation.

   This document describes the use of BGP based MVPNs alongwith the IP-
   based virtual network overlay encapsulations to provide a Layer-3
   virtualization solution for IP multicast traffic, and specifies
   mechanisms to use the new encapsulations while continuing to leverage
   the BGP MVPN control plane techniques and extensions.

   This mechanism provides an efficient incremental solution to support
   forwarding for IP traffic, irrespective of whether it is destined
   within or across an IP subnet boundary.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in [RFC2119] only when they appear in all
   upper case.  They may also appear in lower or mixed case as English
   words, without any normative meaning.




Rao & Jain              Expires January 17, 2014                [Page 3]

Internet-Draft                                                 July 2013


3.  Solution Requirements

   1.   Support for large number of senders and receivers for a given
        group

   2.   Receivers and groups spread out across large number of PEs or
        edge nodes

   3.   Support for large number of C-multicast routes

   4.   Support for Multi-tenancy

   5.   Support for optimal multicast replication within and across
        subnets

   6.   Support for optimal pruning on transit nodes

   7.   Minimal control plane churn

   8.   Minimal state in core

   9.   Support for multiple overlay encapsulations

   10.  Support for redundancy and load-balancing

4.  Network Topology

   In an environment such as the data center where overlay networks are
   used, the IP multicast sources and receivers are hosts typically
   resident on servers attached to network access devices such as
   virtual software routers or physical access switches.  These hosts
   may belong to different tenants who require isolation and segregation
   in forwarding state and traffic.  At the same time, the physical
   transport or core network must be shared for traffic from multiple
   tenants.

   This requires the edge devices to support multiple VPNs facing the
   access interfaces, with a shared overlay encapsulation towards the
   core.

   Due to the high degree of traffic that flows among the hosts on
   various servers within the data center, a DC environment is likely to
   use uniform, densely meshed topologies, such as a spine-leaf
   topology, which provide high redundancy and bandwidth.  The servers
   may be dually attached to a pair of access switches for edge
   redundancy.





Rao & Jain              Expires January 17, 2014                [Page 4]

Internet-Draft                                                 July 2013


   In order to scale the core multicast, as well as to efficiently
   support the common case where a large number of senders and receivers
   are present in a VPN, shared trees are used for the overlay.  Bidir
   multicast is the preferred mode.  Also, to support a large number of
   VPNs efficiently, aggeegate trees are used with multiple VPNs being
   multiplexed over a core tree.  The various IP overlay encapsulations
   such as VXLAN, NVGRE or MPLS-in-GRE contain a VN-ID or VPN label in
   their header which is used as a distinguisher for a MVPN in the data
   plane.

5.  Solution Scenarios

   There are a couple of overall scenarios which are applicable for
   these virtual overlay networks.

   One mode of operation is where all overlay multicast traffic is
   encapsulated within a fixed number of core trees or P-tunnels that
   all edge nodes or PEs can be a part of.  Then both source and
   receiver edge devices can independently encapsulate and decapsulate
   overlay traffic, without requiring additional signaling among them.

   In this mode, source edge devices map a given C-flow from a locally
   attached source onto a specific core multicast tree or P-tunnel based
   on a local mapping decision.  Receiver edge devices join all the
   available P-tunnels, and hence are able to receive traffic from these
   source edge devices.  They then discard the traffic that they do not
   have local receivers for, and replicate the interesting flows towards
   local receivers.

   It is, however, desirable that all receiver edge devices do not
   receive all traffic and have to filter the unnecessary flows.  This
   is especially applicable in high-bandwidth traffic environments.  In
   order to support this ability, it is required to have signaling from
   a receiver edge device indicating its interest in specific C-groups.

   In addition, high-bandwidth sourced traffic flows may be sent on a
   specific P-tunnel which is determined by the source edge device, and
   requires interested receiver edge devices to join that core tree.  In
   such cases, it is also beneficial if the sourced multicast traffic is
   sent out into the overlay only if there are known to be receivers at
   other edge devices or external to the overlay.

   For all such cases, it is required to have signaling from both source
   and receiver edge nodes.  Signaling involves host group interest from
   receiver edge nodes as well as the C-flow to P-tunnel mapping
   signaling from source edge nodes.





Rao & Jain              Expires January 17, 2014                [Page 5]

Internet-Draft                                                 July 2013


6.  Fine-grained Transit Pruning

   Typically, when an overlay is used, the transit nodes in the physical
   topology only participate in the underlay control plane and forward
   all traffic based on the encpasulated packets.  They are unaware of
   the inner header or payload.  This allows them to be simpler and
   scale better.

   One consequence of using an overlay is that multicast traffic needs
   to make it to the receiver edge nodes before they can be pruned in
   case there are no downstream receivers.

   A widely deployed option to avoid redundant traffic from getting to
   all the receiver edge nodes is to use a larger number of core
   multicast trees, thereby reducing the ratio of overlay flows to each
   multicast tree, and allowing the receiver edge nodes to only join the
   multicast trees that the interesting flows map to.  This has the side
   effect of increasing the underlay multicast state in the core, and
   hence the load on the underlay multicast protocols such as PIM.  It
   also does not provide for complete pruning of multicast traffic.

   However, it is possible in certain topologies to support an
   alternative mechanism for fine-grained pruning of multi-destination
   traffic.

   In the uniform, meshed topologies that are used as mentioned earlier,
   certain transit nodes can use the receiver interest information sent
   by the receiver edge devices for the overlay, to filter traffic on
   outgoing links towards the receiver edge nodes on a per-VPN or more
   fine-grained basis.

   This allows the core multicast trees built by PIM to be smaller in
   number.  The transit nodes do need to run MP-BGP to obtain the
   overlay multicast group information sent by the various receiver
   nodes.  The transit nodes will typically obtain this information from
   the RRs.

   This assumes the core devices can look into the inner headers of
   packets to prune traffic based on either VN-ID or *G/S,Gs.  In order
   for the join routes to be sent to the transit nodes, the appropriate
   RTs will be provisioned on the transit nodes.

7.  Originating interest at a receiver edge device








Rao & Jain              Expires January 17, 2014                [Page 6]

Internet-Draft                                                 July 2013


   A receiver edge device will originate shared or source tree joins on
   the overlay for receivers that are locally attached or downstream to
   it.  This will be triggered by locally received IGMP or PIM joins.
   The receiver edge device will also typically act as a PIM first-hop
   or last-hop router with attached sources and receivers.

   These join routes need to be propagated towards potential sources as
   well as the RPs.  In a virtual network overlay topology, sources are
   attached to edge devices.  So the other edge devices must receive
   these join routes.

   For a situation where there are multiple sources which are spread out
   across a large number of edge devices, or for a case where the groups
   are Bidir groups, a shared tree join route may be propagated to all
   edge devices.

   For high bandwidth sources, it is desirable to direct the specific
   group or source joins to the appropriate source leafs.  More granular
   filtering is needed in this case - in addition to VRF, group based
   active source discovery and advertisement is used to control join
   propagation.

   A receiver leaf uses C-multicast route of type Shared Tree Join
   (C-RP, C-G) or Source Tree Join (C-S, C-G).  To be able to propagate
   the shared joins to all edge devices, the join routes may be
   originated with an RD that is specific to the originating device.
   This RD can be the same value as that used by unicast routes.  The
   routes are sent with a RT that all interested edge devices may use as
   an import RT for this VPN, if they need to receive the join routes.
   Route propagation is constrained based on policy (RT) along the path.

8.  Source mapping from a sender edge device

   An edge device that is attached to a source may signal the active
   sources information on the overlay, along with the core multicast
   group that the edge device decides to use.  Receiver edge nodes can
   use this information to join the core multicast trees.

   This mapping is advertised by using the S-PMSI A-D routes, with the
   PMSI Tunnel Attribute type indicating the appropriate overlay
   encapsulation type - VXLAN or NVGRE.  Aggregate trees will be used,
   with the VN-ID being signaled along with the route.









Rao & Jain              Expires January 17, 2014                [Page 7]

Internet-Draft                                                 July 2013


9.  IANA Considerations

   To be completed

10.  Security Considerations

   To be completed

11.  Change Log

12.  References

12.1.  Normative References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              March 2000.

   [RFC5512]  Mohapatra, P. and E. Rosen, "The BGP Encapsulation
              Subsequent Address Family Identifier (SAFI) and the BGP
              Tunnel Encapsulation Attribute", RFC 5512, April 2009.

12.2.  Informative References

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks over
              Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-02
              (work in progress), August 2012.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
              Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P.,
              and C. Tumuluri, "NVGRE: Network Virtualization using
              Generic Routing Encapsulation", draft-sridharan-
              virtualization-nvgre-02 (work in progress), February 2013.

Authors' Addresses






Rao & Jain              Expires January 17, 2014                [Page 8]

Internet-Draft                                                 July 2013


   Dhananjaya Rao
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95124  95134
   USA

   Email: dhrao@cisco.com


   Vipin Jain
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95124  95134
   USA

   Email: vipijain@cisco.com



































Rao & Jain              Expires January 17, 2014                [Page 9]