L2VPN Workgroup                                          A. Sajassi, Ed.
INTERNET-DRAFT                                                  S. Salam
Intended Status: Standards Track                               S. Thoria
                                                                   Cisco
                                                                J. Drake
                                                                 Juniper
                                                              J. Rabadan
                                                                   Nokia
Expires: January 2, 2019                                    July 2, 2018

                 Integrated Routing and Bridging in EVPN
             draft-ietf-bess-evpn-inter-subnet-forwarding-04

Abstract

EVPN provides an extensible and flexible multi-homing VPN solution over an MPLS/IP network for intra-subnet connectivity among Tenant Systems and End Devices that can be physical or virtual. However, there are scenarios that need dynamic and efficient inter-subnet connectivity among these Tenant Systems and End Devices while maintaining the multi-homing capabilities of EVPN. This document describes an Integrated Routing and Bridging (IRB) solution based on EVPN to address such requirements.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Sajassi et al.           Expires January 2, 2019                 [Page 1]

INTERNET DRAFT   Integrated Routing & Bridging in EVPN      July 2, 2018

Copyright and License Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1  Introduction
   2  EVPN PE Model for IRB Operation
   3  Symmetric and Asymmetric IRB
      3.1  IRB Interface and its MAC & IP addresses
      3.2  Symmetric IRB Procedures
         3.2.1  Control Plane - Ingress PE
         3.2.2  Control Plane - Egress PE
         3.2.3  Data Plane - Ingress PE
         3.2.4  Data Plane - Egress PE
      3.3  Asymmetric IRB Procedures
         3.3.1  Control Plane - Ingress PE
         3.3.2  Control Plane - Egress PE
         3.3.3  Data Plane - Ingress PE
         3.3.4  Data Plane - Egress PE
   4  BGP Encoding
      4.1  Router's MAC Extended Community
   5  Operational Models for Symmetric Inter-Subnet Forwarding
      5.1  IRB forwarding on NVEs for Tenant Systems
         5.1.1  Control Plane Operation
         5.1.2  Data Plane Operation - Inter Subnet
         5.1.3  TS Move Operation
      5.2  IRB forwarding on NVEs for Subnets behind Tenant Systems
         5.2.1  Control Plane Operation
         5.2.2  Data Plane Operation
   6  Inter-Subnet DCI Scenarios
      6.1  Switching among IP subnets in different DCs without GW
      6.2  Switching among IP subnets in different DCs with GW
   7  TS Mobility
      7.1  TS Mobility & Optimum Forwarding for TS Outbound Traffic
      7.2  TS Mobility & Optimum Forwarding for TS Inbound Traffic
         7.2.1  Mobility without Route Aggregation
   8  Acknowledgements
   9  Security Considerations
   10 IANA Considerations
   11 References
      11.1  Normative References
      11.2  Informative References
   12 Contributors
   Authors' Addresses

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

AC: Attachment Circuit.

ARP: Address Resolution Protocol.

BD: Broadcast Domain. As per [RFC7432], an EVI consists of a single BD or multiple BDs. In the case of the VLAN-bundle and VLAN-based service models (see [RFC7432]), a BD is equivalent to an EVI. In the case of the VLAN-aware bundle service model, an EVI contains multiple BDs.
Also, in this document, BD and subnet are equivalent terms.

BD Route Target: refers to the Broadcast Domain assigned Route Target [RFC4364]. In the case of the VLAN-aware bundle service model, all the BD instances in the MAC-VRF share the same Route Target.

BT: Bridge Table. The instantiation of a BD in a MAC-VRF, as per [RFC7432].

DGW: Data Center Gateway.

Ethernet A-D route: Ethernet Auto-Discovery (A-D) route, as per [RFC7432].

Ethernet NVO tunnel: refers to Network Virtualization Overlay tunnels with Ethernet payload. Examples of this type of tunnel are VXLAN and GENEVE.

EVI: EVPN Instance spanning the NVE/PE devices that are participating in that EVPN, as per [RFC7432].

EVPN: Ethernet Virtual Private Networks, as per [RFC7432].

GENEVE: Generic Network Virtualization Encapsulation, [GENEVE].

GRE: Generic Routing Encapsulation.

GW IP: Gateway IP Address.

IPL: IP Prefix Length.

IP NVO tunnel: refers to Network Virtualization Overlay tunnels with IP payload (no MAC header in the payload).

IP-VRF: A VPN Routing and Forwarding table for IP routes on an NVE/PE. The IP routes could be populated by EVPN and IP-VPN address families. An IP-VRF is also an instantiation of a layer 3 VPN in an NVE/PE.

IRB: Integrated Routing and Bridging interface. It connects an IP-VRF to a BD (or subnet).

MAC-VRF: A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on an NVE/PE, as per [RFC7432]. A MAC-VRF is also an instantiation of an EVI in an NVE/PE.

ML: MAC address length.

ND: Neighbor Discovery Protocol.

NVE: Network Virtualization Edge.

NVO: Network Virtualization Overlays.

RT-2: EVPN route type 2, i.e., MAC/IP Advertisement route, as defined in [RFC7432].

RT-5: EVPN route type 5, i.e., IP Prefix route, as defined in Section 3 of [EVPN-PREFIX].

SBD: Supplementary Broadcast Domain.
A BD that does not have any ACs, only IRB interfaces; it is used to provide connectivity among all the IP-VRFs of the tenant. The SBD is only required in IP-VRF-to-IP-VRF use cases (see Section 4.4 of [EVPN-PREFIX]).

SN: Subnet.

TS: Tenant System.

VA: Virtual Appliance.

VNI: Virtual Network Identifier. As in [RFC8365], the term is used as a representation of a 24-bit NVO instance identifier, with the understanding that VNI will refer to a VXLAN Network Identifier in VXLAN, or Virtual Network Identifier in GENEVE, etc., unless it is stated otherwise.

VTEP: VXLAN Tunnel End Point, as in [RFC7348].

VXLAN: Virtual Extensible LAN, as in [RFC7348].

This document also assumes familiarity with the terminology of [RFC7432], [RFC8365], and [RFC7365].

1 Introduction

EVPN provides an extensible and flexible multi-homing VPN solution over an MPLS/IP network for intra-subnet connectivity among Tenant Systems (TSs) and End Devices that can be physical or virtual, where an IP subnet is represented by an EVI for a VLAN-based service or by an <EVI, VLAN> for a VLAN-aware bundle service. However, there are scenarios that need dynamic and efficient inter-subnet connectivity among these Tenant Systems and End Devices while maintaining the multi-homing capabilities of EVPN. This document describes an Integrated Routing and Bridging (IRB) solution based on EVPN to address such requirements.

The inter-subnet communication is traditionally achieved at centralized L3 Gateway (L3GW) nodes where all the inter-subnet communication policies are enforced.
When two TSs belonging to two different subnets connected to the same PE node wanted to communicate with each other, their traffic needed to be backhauled from the PE node all the way to the centralized gateway nodes, where inter-subnet switching was performed, and then back to the PE node. For today's large multi-tenant data centers, this scheme is very inefficient and sometimes impractical.

In order to overcome the drawback of the centralized L3GW approach, IRB functionality is needed on the PE nodes (also referred to as EVPN NVEs) attached to TSs, so as to avoid inefficient forwarding of tenant traffic (i.e., to avoid backhauling and hair-pinning). A PE with IRB capability can not only locally bridge the tenant intra-subnet traffic but can also locally route the tenant inter-subnet traffic on a packet-by-packet basis, thus meeting the requirements for both intra- and inter-subnet forwarding and avoiding the non-optimal traffic forwarding associated with the centralized L3GW approach.

Some TSs run non-IP protocols in conjunction with their IP traffic. Therefore, it is important to handle both kinds of traffic optimally, e.g., to bridge non-IP and intra-subnet traffic and to route inter-subnet IP traffic. Consequently, the solution needs to meet the following requirements:

R1: The solution MUST allow both inter-subnet and intra-subnet traffic belonging to the same tenant to be locally routed and bridged, respectively. The solution MUST provide IP routing for inter-subnet traffic and Ethernet bridging for intra-subnet traffic.

R2: The solution MUST support bridging of non-IP traffic.

R3: The solution MUST allow inter-subnet switching to be disabled on a per-VLAN basis on PEs where the traffic needs to be backhauled to another node (e.g., for performing FW or DPI functionality).
2 EVPN PE Model for IRB Operation

Since this document discusses IRB operation in relation to EVPN MAC-VRF, IP-VRF, EVI, Broadcast Domain (BD), Bridge Table (BT), and IRB interfaces, it is important to understand the relationships among these components. The following PE model describes these components and illustrates the relationships among them.

   [Figure omitted: an IRB PE containing MAC-VRF1 (RD1/RT1) and IP-VRF1 (RD2/RT2). MAC-VRF1 holds Bridge Table BT1 (Eth-Tag x, attachment circuit AC1) and Bridge Table BT2 (Eth-Tag y, attachment circuit AC2). BT1 and BT2 connect to IP-VRF1 through interfaces IRB1 and IRB2, respectively. Two MPLS/NVO tunnels are shown, one with an Ethernet (Eth) payload and one with an IP payload.]

                      Figure 1: EVPN IRB PE Model

A tenant needing IRB services on a PE requires an IP Virtual Routing and Forwarding table (IP-VRF) along with one or more MAC Virtual Routing and Forwarding tables (MAC-VRFs). An IP-VRF, as defined in [RFC4364], is the instantiation of an IPVPN in a PE. A MAC-VRF, as defined in [RFC7432], is the instantiation of an EVI (EVPN Instance) in a PE. A MAC-VRF can consist of one or more Bridge Tables (BTs), where each BT corresponds to a VLAN (broadcast domain, BD). If the service interface for the EVPN PE is configured in VLAN-Based mode (i.e., Section 6.1 of [RFC7432]), then there is only a single BT per MAC-VRF (per EVI), i.e., there is only one tenant VLAN per EVI.
However, if the service interface for the EVPN PE is configured in VLAN-Aware Bundle mode (i.e., Section 6.3 of [RFC7432]), then there are several BTs per MAC-VRF (per EVI), i.e., there are several tenant VLANs per EVI.

Each BT is connected to an IP-VRF via an L3 interface called an IRB interface. Since a single tenant subnet is typically (and in this document) represented by a VLAN, and is thus supported by a single BT, a given tenant has as many BTs as it has subnets, and therefore as many IRB interfaces between the tenant's IP-VRF and the associated BTs, as shown in the PE model above.

An IP-VRF is identified by its corresponding route target and route distinguisher, and so is a MAC-VRF. If operating in EVPN VLAN-Based mode, a PE that receives an EVPN route with a MAC-VRF route target can identify the corresponding BT from that route target alone; however, if operating in EVPN VLAN-Aware Bundle mode, the receiving PE needs both the MAC-VRF route target and the VLAN ID to identify the corresponding BT.

3 Symmetric and Asymmetric IRB

This document defines and describes two types of IRB solutions, namely symmetric and asymmetric IRB. In symmetric IRB, as its name implies, the lookup operation is symmetric at the ingress and egress PEs: both perform lookups on the TS's MAC and IP addresses. The ingress PE performs a lookup on the destination TS's MAC address followed by its IP address, and the egress PE performs a lookup on the destination TS's IP address followed by its MAC address, as depicted in Figure 2.

   [Figure omitted: on the ingress PE (BT1, BT2), traffic from TS1 enters BT1 and is routed by the IP-VRF; it is carried to the IP-VRF on the egress PE (BT3, BT2), which routes it into BT2 toward TS2.]

                        Figure 2: Symmetric IRB

In symmetric IRB, as shown in Figure 2, inter-subnet forwarding between two PEs is done between their associated IP-VRFs. Therefore, the tunnel connecting these IP-VRFs can be either an IP-only tunnel (in the case of MPLS or GENEVE encapsulation) or an Ethernet NVO tunnel (in the case of VxLAN encapsulation). If it is an Ethernet NVO tunnel, the TS's IP packet is encapsulated in an Ethernet header consisting of the ingress and egress PEs' MAC addresses, i.e., there is no need for the ingress PE to use the destination TS's MAC address. Therefore, in symmetric IRB, there is no need for the ingress PE to hold the destination TS's IP and MAC address association in its ARP table. Each PE participating in symmetric IRB only maintains ARP entries for locally connected hosts and only maintains MAC-VRFs/BTs for locally configured subnets.

In asymmetric IRB, the lookup operation is asymmetric: the ingress PE performs three lookups, whereas the egress PE performs a single lookup. The ingress PE performs lookups on the destination TS's MAC address, followed by its IP address, followed by its MAC address again, whereas the egress PE performs just a single lookup on the destination TS's MAC address, as depicted in Figure 3 below.

   [Figure omitted: on the ingress PE (BT1, BT2), traffic from TS1 enters BT1, is routed by the IP-VRF into BT2, and is bridged from BT2 across the tunnel to BT2 on the egress PE (BT3, BT2), which delivers it to TS2 without involving the egress IP-VRF.]

                        Figure 3: Asymmetric IRB

In asymmetric IRB, as shown in Figure 3, inter-subnet forwarding between two PEs is done between their associated MAC-VRFs/BTs. Therefore, the MPLS or NVO tunnel used for inter-subnet forwarding MUST be of type Ethernet. Since only a MAC lookup is performed at the egress PE (i.e., no IP lookup), the TS's IP packet needs to be encapsulated with the destination TS's MAC address. In order for the ingress PE to perform such encapsulation, it needs to maintain the TS's IP and MAC address association in its ARP table. Furthermore, it needs to maintain the destination TS's MAC address in the corresponding BT, even if it has no locally attached TSs in that subnet. In other words, each PE participating in asymmetric IRB MUST maintain ARP entries for remote hosts (hosts connected to other PEs), as well as MAC-VRFs/BTs for subnets that are not locally present on that PE.

The following subsections define the control and data plane procedures for symmetric and asymmetric IRB on ingress and egress PEs. They use Figure 4, which shows a single IP-VRF and a number of BTs on each PE for a given tenant. The IP-VRF of the tenant (i.e., IP-VRF1) is connected to each BT via its associated IRB interface. Each BT on a PE is associated with a unique VLAN (i.e., with a BD). In VLAN-Based mode, each BT belongs to its own MAC-VRF; in VLAN-Aware Bundle mode, a number of BTs can be associated with a single MAC-VRF.
Whether the service interface on a PE is in VLAN-Based or VLAN-Aware Bundle mode does not impact the IRB operation and procedures. It only impacts the setting of the Ethernet Tag field in EVPN routes, as described in [RFC7432].

   [Figure omitted: PE1 and PE2, interconnected over an MPLS/VxLAN/NVGRE network, each hold IP-VRF1. On PE1, IP-VRF1 connects to BT1 (IRB interface MAC MACx, IP IPx, shown as Mx/IPx) and to BT2; TS1 (IP1/M1), TS5 (IP5/M5), and TS2 (IP2/M2) attach to PE1, with TS1 in BT1 and TS2 in BT2. On PE2, IP-VRF1 connects to BT3 (MACy), where TS3 (IP3/M3) attaches, and to BT1, where TS4 (IP4/M4) attaches.]

                       Figure 4: IRB forwarding

3.1 IRB Interface and its MAC & IP addresses

To support inter-subnet forwarding on a PE, the PE acts as an IP default gateway from the perspective of the attached Tenant Systems. Default gateway MAC and IP addresses are configured on each IRB interface associated with a subnet according to one of the following two options:

1. All the PEs for a given tenant subnet use the same anycast default gateway IP and MAC addresses. On each PE, these default gateway IP and MAC addresses correspond to the IRB interface connecting the BT associated with the tenant's subnet to the corresponding tenant's IP-VRF.

2. Each PE for a given tenant subnet uses the same anycast default gateway IP address but its own MAC address. These MAC addresses are aliased to the same anycast default gateway IP address through the use of the Default Gateway extended community as specified in [RFC7432], which is carried in the EVPN MAC/IP Advertisement routes. On each PE, this default gateway IP address, along with its associated MAC address, corresponds to the IRB interface connecting the BT associated with the tenant's subnet to the corresponding tenant's IP-VRF.
It is worth noting that if the applications running on the TSs employ or rely on any form of MAC security, then either the first model (i.e., using an anycast MAC address) should be used, to ensure that the applications receive traffic from the same IRB interface MAC address that they are sending to, or, if the second model is used, the IRB interface MAC address MUST be the one used in the initial ARP reply for that TS.

Although both of these options are equally applicable to symmetric and asymmetric IRB, option 1 is recommended because of the ease of provisioning the anycast MAC address, not only on the IRB interface associated with a given subnet across all the PEs corresponding to that EVI, but also on all IRB interfaces associated with all the tenant's subnets across all the PEs corresponding to all the EVIs for that tenant. Furthermore, it simplifies operation, as there is no need for the Default Gateway extended community advertisement and its associated MAC aliasing procedure.

When a TS sends an ARP request to the PE that it is attached to, the ARP request is sent for the IP address of the IRB interface associated with the TS's subnet. For example, in Figure 4, TS1 is configured with the anycast IPx address as its default gateway IP address; thus, when it sends an ARP request for IPx (the IP address of the IRB interface for BT1), PE1 sends an ARP reply with MACx, which is the MAC address of that IRB interface.

In addition to anycast addresses, IRB interfaces can be configured with non-anycast IP addresses for the purpose of OAM (such as traceroute/ping to these interfaces) for both symmetric and asymmetric IRB. These IP addresses need to be distributed as VPN routes when the PEs are operating in symmetric IRB mode.
However, they don't need to be distributed if the PEs are operating in asymmetric IRB mode and the IRB interfaces are configured with individual MACs.

3.2 Symmetric IRB Procedures

3.2.1 Control Plane - Ingress PE

When a PE (e.g., PE1 in Figure 4 above) learns the MAC and IP address of a TS (via an ARP request), it adds the MAC address to the corresponding MAC-VRF/BT of that tenant's subnet and adds the IP address to the IP-VRF for that tenant. Furthermore, it adds this TS's MAC and IP address association to its ARP table. It then builds an EVPN MAC/IP Advertisement route (type 2) as follows and advertises it to other PEs participating in that tenant's VPN.

- The Length field of the BGP EVPN NLRI for an EVPN MAC/IP Advertisement route MUST be either 40 (if an IPv4 address is carried) or 52 (if an IPv6 address is carried).

- The Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, IP Address, and MPLS Label1 fields MUST be used as defined in [RFC7432] and [RFC8365].

- The MPLS Label2 field is set to either an MPLS label or a VNI corresponding to the tenant's IP-VRF. In the case of an MPLS label, this field is encoded as 3 octets, where the high-order 20 bits contain the label value.

Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, and IP Address fields are part of the route key used by BGP to compare routes. The rest of the fields are not part of the route key.

This route is advertised along with the following two extended communities:

   1) Tunnel Type Extended Community
   2) Router's MAC Extended Community

For symmetric IRB mode, the Router's MAC EC is needed to carry the PE's overlay MAC address, which is used for IP-VRF-to-IP-VRF communication over an Ethernet NVO tunnel.
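The NLRI lengths required above follow from the RT-2 field sizes in [RFC7432]: RD (8 octets), Ethernet Segment Identifier (10), Ethernet Tag ID (4), MAC Address Length (1), MAC Address (6), IP Address Length (1), IP Address (4 or 16), and MPLS Label1 (3), plus MPLS Label2 (3) for symmetric IRB. The Python sketch below only illustrates that arithmetic and one way to fill the 3-octet Label2 field. The helper names are hypothetical, and the VNI encoding assumes, as in [RFC8365], that a VNI occupies the whole 24-bit field while an MPLS label occupies the high-order 20 bits.

```python
# Illustrative sketch only: helper names are hypothetical, not from any spec.
# Octet sizes of the MAC/IP Advertisement route (RT-2) fields per RFC 7432.
RT2_FIXED_FIELDS = {
    "route_distinguisher": 8,
    "esi": 10,
    "ethernet_tag_id": 4,
    "mac_address_length": 1,
    "mac_address": 6,
    "ip_address_length": 1,
    "mpls_label1": 3,
}
IP_ADDRESS_OCTETS = {4: 4, 6: 16}
LABEL_FIELD_OCTETS = 3


def rt2_nlri_length(ip_version: int, with_label2: bool) -> int:
    """Return the NLRI Length value for an RT-2 carrying one IP address."""
    length = sum(RT2_FIXED_FIELDS.values()) + IP_ADDRESS_OCTETS[ip_version]
    if with_label2:  # symmetric IRB adds MPLS Label2 for the IP-VRF
        length += LABEL_FIELD_OCTETS
    return length


def encode_mpls_label(label: int) -> bytes:
    """Place a 20-bit MPLS label in the high-order 20 bits of 3 octets."""
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(LABEL_FIELD_OCTETS, "big")


def encode_vni(vni: int) -> bytes:
    """Place a 24-bit VNI in the full 3-octet field (assumption per RFC 8365)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return vni.to_bytes(LABEL_FIELD_OCTETS, "big")


if __name__ == "__main__":
    for version in (4, 6):
        print(f"IPv{version}: with Label2 {rt2_nlri_length(version, True)}, "
              f"without Label2 {rt2_nlri_length(version, False)}")
```

Running it prints 40 and 37 for IPv4 and 52 and 49 for IPv6; the values without Label2 are the ones that apply when the MPLS Label2 field is omitted, as in asymmetric IRB.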
If an MPLS or IP-only NVO tunnel is used, then there is no need to send the Router's MAC Extended Community along with this route.

This route MUST be advertised with two route targets: one corresponding to the MAC-VRF of the tenant's subnet and another corresponding to the tenant's IP-VRF.

3.2.2 Control Plane - Egress PE

When a PE (e.g., PE2 in Figure 4 above) receives this EVPN MAC/IP Advertisement route, it performs the following:

- Using the MAC-VRF route target, it identifies the corresponding MAC-VRF. If the MAC-VRF exists (i.e., it is locally configured), then it imports the MAC address into it. Otherwise, it does not import the MAC address.

- Using the IP-VRF route target, it identifies the corresponding IP-VRF and imports the IP address into it.

The inclusion of the MPLS Label2 field in this route signals to the receiving PE that this route is for symmetric IRB mode and that MPLS Label2 needs to be installed in the forwarding path to identify the corresponding IP-VRF.

If the receiving PE receives this route with both the MAC-VRF and IP-VRF route targets, but the MAC/IP Advertisement route does not include the MPLS Label2 field, and the receiving PE does not support asymmetric IRB mode, then: if it has the corresponding MAC-VRF, it only imports the MAC address; otherwise, it MUST treat the route as withdrawn [RFC7606].

3.2.3 Data Plane - Ingress PE

When an Ethernet frame is received by an ingress PE (e.g., PE1 in Figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify the associated MAC-VRF/BT and performs a lookup on the destination MAC address. If the MAC address corresponds to its IRB interface MAC address, the ingress PE deduces that the packet must be inter-subnet routed. Hence, the ingress PE performs an IP lookup in the associated IP-VRF table.
The lookup identifies the BGP next hop of the egress PE along with the tunnel/encapsulation type and the associated MPLS label/VNI values. If the tunnel type is MPLS or IP-only NVO tunnel, then the TS's IP packet is sent over the tunnel without any Ethernet header. However, if the tunnel type is Ethernet NVO tunnel, then an Ethernet header needs to be added to the TS's IP packet. The source MAC address of this Ethernet header is set to the ingress PE's router MAC address, and the destination MAC address is set to the egress PE's router MAC address. The MPLS VPN label or VNI fields are set accordingly, and the packet is forwarded to the egress PE. In the case of NVO tunnel encapsulation, the outer source IP address is set to the ingress PE's BGP next-hop address and the outer destination IP address is set to the egress PE's BGP next-hop address.

3.2.4 Data Plane - Egress PE

When the tenant's MPLS- or NVO-encapsulated packet is received over an MPLS or NVO tunnel by the egress PE, the egress PE removes the tunnel encapsulation and uses the VPN MPLS label (for MPLS encapsulation) or VNI (for VxLAN encapsulation) to identify the IP-VRF in which the IP lookup needs to be performed. The lookup identifies a local adjacency to the IRB interface associated with the egress subnet's MAC-VRF/BT. The egress PE gets the destination TS's MAC address for that TS's IP address from its ARP table, encapsulates the packet with that destination MAC address and a source MAC address corresponding to that IRB interface, and sends the packet to its destination subnet MAC-VRF/BT.

The destination MAC address lookup in the MAC-VRF/BT results in a local adjacency (e.g., local interface) over which the Ethernet frame is sent.
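The ingress data-plane steps of Section 3.2.3 can be sketched in Python as follows. This is a simplified model under stated assumptions, not an implementation: the IP-VRF is a plain dictionary keyed by host address (a real PE performs a longest-prefix match), the tunnel type names "mpls", "ip-nvo", and "eth-nvo" are hypothetical labels, and packets are represented by a small record rather than real headers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IpVrfRoute:
    # Hypothetical result of the ingress IP-VRF lookup for a remote TS.
    next_hop: str                     # egress PE's BGP next-hop address
    tunnel: str                       # "mpls", "ip-nvo" or "eth-nvo" (assumed labels)
    label_or_vni: int                 # MPLS Label2 / VNI of the remote IP-VRF
    router_mac: Optional[str] = None  # learned from the Router's MAC EC


@dataclass
class Encapsulated:
    label_or_vni: int
    outer_src_ip: Optional[str]   # set only for NVO tunnels
    outer_dst_ip: Optional[str]
    inner_src_mac: Optional[str]  # set only for Ethernet NVO tunnels
    inner_dst_mac: Optional[str]
    ip_packet: bytes


def symmetric_ingress(frame_dst_mac: str, dst_ip: str, ip_packet: bytes,
                      irb_mac: str, ip_vrf: Dict[str, IpVrfRoute],
                      my_router_mac: str, my_next_hop: str) -> Optional[Encapsulated]:
    # MAC lookup: only frames addressed to the IRB interface MAC are routed;
    # None stands for "bridge within the BT" in this sketch.
    if frame_dst_mac != irb_mac:
        return None
    # IP lookup in the tenant IP-VRF (exact match keeps the sketch short).
    route = ip_vrf[dst_ip]
    # Ethernet NVO tunnels get an inner Ethernet header built from the two
    # PEs' router MACs; MPLS and IP-only NVO tunnels carry the bare IP packet.
    if route.tunnel == "eth-nvo":
        if route.router_mac is None:
            raise ValueError("no Router's MAC known for this next hop")
        inner_src, inner_dst = my_router_mac, route.router_mac
    else:
        inner_src = inner_dst = None
    # NVO tunnels use the PEs' BGP next-hop addresses as outer IP addresses.
    is_nvo = route.tunnel in ("ip-nvo", "eth-nvo")
    return Encapsulated(
        label_or_vni=route.label_or_vni,
        outer_src_ip=my_next_hop if is_nvo else None,
        outer_dst_ip=route.next_hop if is_nvo else None,
        inner_src_mac=inner_src,
        inner_dst_mac=inner_dst,
        ip_packet=ip_packet,
    )
```

Note that the destination TS's MAC address appears nowhere in this ingress path; in symmetric IRB it is resolved from the ARP table at the egress PE, as described in Section 3.2.4.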
3.3 Asymmetric IRB Procedures

3.3.1 Control Plane - Ingress PE

When a PE (e.g., PE1 in Figure 4 above) learns the MAC and IP address of a TS (via an ARP request), it populates its MAC-VRF/BT, IP-VRF, and ARP table just as in the case of symmetric IRB. It then builds an EVPN MAC/IP Advertisement route (type 2) as follows and advertises it to other PEs participating in that tenant's VPN.

- The Length field of the BGP EVPN NLRI for an EVPN MAC/IP Advertisement route MUST be either 37 (if an IPv4 address is carried) or 49 (if an IPv6 address is carried).

- The Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, IP Address, and MPLS Label1 fields MUST be used as defined in [RFC7432] and [RFC8365].

- The MPLS Label2 field MUST NOT be included in this route.

Just as in [RFC7432], the RD, Ethernet Tag ID, MAC Address Length, MAC Address, IP Address Length, and IP Address fields are part of the route key used by BGP to compare routes. The rest of the fields are not part of the route key.

This route is advertised along with the following extended community:

   1) Tunnel Type Extended Community

For asymmetric IRB mode, the Router's MAC EC is not needed, because forwarding is performed using the destination TS's MAC address, which is carried in this route advertisement.

This route MUST always be advertised with the MAC-VRF route target. It MAY also be advertised with a second route target corresponding to the IP-VRF. If only the MAC-VRF route target is used, then the receiving PE uses the MAC-VRF route target to identify the corresponding IP-VRF, i.e., many MAC-VRF route targets map to the same IP-VRF for a given tenant.

3.3.2 Control Plane - Egress PE

When a PE (e.g., PE2 in Figure 4 above) receives this EVPN MAC/IP Advertisement route, it performs the following:

- Using the MAC-VRF route target, it identifies the corresponding MAC-VRF and imports the MAC address into it. For asymmetric IRB mode, it is assumed that all PEs participating in a tenant's VPN are configured with all subnets and corresponding MAC-VRFs/BTs, even if there are no locally attached TSs for some of these subnets. This is because the ingress PE needs to forward based on the destination TS's MAC address and to perform the proper NVO tunnel encapsulation, both of which are properties of a lookup in the MAC-VRF/BT. An implementation may choose to consolidate the lookup in the ingress PE's IP-VRF with the lookup in the ingress PE's destination subnet MAC-VRF. Considerations for such consolidation of lookups are outside the scope of this document.

- Using the MAC-VRF route target, it identifies the corresponding ARP table for the tenant and adds an entry to the ARP table for the TS's MAC and IP address association. It should be noted that the tenant's ARP table at the receiving PE is identified by all the MAC-VRF route targets for that tenant. If the IP-VRF route target is included with this route advertisement, then it MAY be used for the identification of the tenant's ARP table.

For asymmetric IRB mode, the MPLS Label2 field SHOULD NOT be included in the route; however, if the receiving PE receives this route with the MPLS Label2 field, then it SHOULD ignore it.

3.3.3 Data Plane - Ingress PE

When an Ethernet frame is received by an ingress PE (e.g., PE1 in Figure 4 above), the PE uses the AC ID (e.g., VLAN ID) to identify the associated MAC-VRF/BT and performs a lookup on the destination MAC address. If the MAC address corresponds to its IRB interface MAC address, the ingress PE deduces that the packet must be inter-subnet routed.
Hence, the ingress PE performs an IP lookup in the associated IP-VRF table. The lookup identifies a local adjacency to the IRB interface associated with the egress subnet's MAC-VRF/BT. The ingress PE gets the destination TS's MAC address for that TS's IP address from its ARP table, encapsulates the packet with that destination MAC address and a source MAC address corresponding to that IRB interface, and sends the packet to its destination subnet MAC-VRF/BT. The destination MAC address lookup in the MAC-VRF/BT results in the BGP next-hop address of the egress PE. The ingress PE encapsulates the packet using the Ethernet NVO tunnel of choice (e.g., VxLAN or GENEVE) and sends the packet to the egress PE. Since the packet forwarding is between the ingress PE's MAC-VRF/BT and the egress PE's MAC-VRF/BT, the packet encapsulation procedures follow those of [RFC7432] for MPLS and [RFC8365] for VxLAN encapsulation.

3.3.4 Data Plane - Egress PE

When a tenant's Ethernet frame is received over an MPLS or NVO tunnel by the egress PE, the egress PE removes the tunnel encapsulation and uses the VPN MPLS label (for MPLS encapsulation) or VNI (for VxLAN encapsulation) to identify the MAC-VRF/BT in which the MAC lookup needs to be performed. The MAC lookup results in a local adjacency (e.g., local interface) over which the packet needs to be sent.

Note that the forwarding behavior on the egress PE is the same as the EVPN intra-subnet forwarding described in [RFC7432] for MPLS and [RFC8365] for VxLAN networks. In other words, all the packet processing associated with the inter-subnet forwarding semantics is confined to the ingress PE.

It should also be noted that [RFC7432] provides different levels of granularity for the EVPN label. Besides identifying the bridge domain table, it can be used to identify the egress interface or a destination MAC address on that interface.
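The three ingress lookups of asymmetric IRB described in Section 3.3.3 (MAC, then IP, then MAC again) can be sketched in Python as below. This is an illustrative model, not an implementation: the tables are plain dictionaries, the IP-VRF maps a host address straight to the name of the destination BT, and all names (BtEntry, BridgedFrame, asymmetric_ingress) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BtEntry:
    # Hypothetical MAC-VRF/BT entry for a remote TS, learned from an RT-2.
    next_hop: str       # egress PE's BGP next-hop address
    label_or_vni: int   # MPLS Label1 / VNI of the egress MAC-VRF/BT


@dataclass
class BridgedFrame:
    src_mac: str        # IRB interface MAC of the destination subnet
    dst_mac: str        # destination TS's MAC, resolved via the ARP table
    next_hop: str
    label_or_vni: int


def asymmetric_ingress(frame_dst_mac: str, dst_ip: str, ingress_irb_mac: str,
                       ip_vrf: Dict[str, str], irb_mac_of: Dict[str, str],
                       arp: Dict[str, str],
                       bts: Dict[str, Dict[str, BtEntry]]) -> Optional[BridgedFrame]:
    # Lookup 1 (MAC): only frames sent to the ingress IRB MAC are routed;
    # None stands for "bridge within the ingress BT" in this sketch.
    if frame_dst_mac != ingress_irb_mac:
        return None
    # Lookup 2 (IP): the IP-VRF points at the IRB interface of the destination
    # subnet's BT, which asymmetric IRB requires to be configured locally.
    dest_bt = ip_vrf[dst_ip]
    # The ARP table must also hold remote TSs to resolve the destination MAC.
    ts_mac = arp[dst_ip]
    # Lookup 3 (MAC): the destination BT yields the egress PE and label/VNI.
    entry = bts[dest_bt][ts_mac]
    return BridgedFrame(src_mac=irb_mac_of[dest_bt], dst_mac=ts_mac,
                        next_hop=entry.next_hop, label_or_vni=entry.label_or_vni)
```

In this model the egress PE would only need the final MAC lookup of Section 3.3.4; all routing state is consulted at the ingress PE.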
If the EVPN label is used for egress interface or individual MAC address identification, then no MAC lookup is needed in the egress PE for MPLS encapsulation, and the packet can be forwarded directly to the egress interface based on the EVPN label lookup alone.

4 BGP Encoding

This document defines one new BGP Extended Community for EVPN.

4.1 Router's MAC Extended Community

A new EVPN BGP Extended Community called Router's MAC is introduced here. This new extended community is a transitive extended community with the Type field of 0x06 (EVPN) and the Sub-Type of 0x03. It may be advertised along with the BGP Encapsulation Extended Community defined in Section 4.5 of [TUNNEL-ENCAP].

The Router's MAC Extended Community is encoded as an 8-octet value as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=0x03 |        Router's MAC           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Router's MAC Cont'd                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 5: Router's MAC Extended Community

This extended community is used to carry the PE's MAC address for symmetric IRB scenarios, and it is sent with RT-2.

5 Operational Models for Symmetric Inter-Subnet Forwarding

The following sections describe two main symmetric IRB forwarding scenarios within a DC (i.e., intra-DC), along with their corresponding procedures. In the following scenarios, without loss of generality, it is assumed that a given tenant is represented by a single IP-VPN instance. Therefore, on a given PE, a tenant is represented by a single IP-VRF table and one or more MAC-VRF tables.
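The 8-octet layout of the Router's MAC Extended Community (Section 4.1, Figure 5), which the symmetric IRB models in this section rely on, can be illustrated with the following Python sketch. It is a minimal example rather than part of a BGP implementation: the function names are hypothetical, and the MAC address is assumed to be supplied as a colon-separated hexadecimal string.

```python
import struct

EVPN_EC_TYPE = 0x06         # Type field for EVPN extended communities
ROUTERS_MAC_SUBTYPE = 0x03  # Sub-Type for the Router's MAC Extended Community


def encode_routers_mac(mac: str) -> bytes:
    """Encode a colon-separated MAC address as an 8-octet Router's MAC EC."""
    mac_bytes = bytes(int(part, 16) for part in mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError("a MAC address must have exactly 6 octets")
    # Octet 0 = Type, octet 1 = Sub-Type, octets 2-7 = the router's MAC.
    return struct.pack("!BB6s", EVPN_EC_TYPE, ROUTERS_MAC_SUBTYPE, mac_bytes)


def decode_routers_mac(ec: bytes) -> str:
    """Return the MAC address carried in a Router's MAC EC."""
    if len(ec) != 8:
        raise ValueError("an extended community is 8 octets long")
    ec_type, sub_type, mac_bytes = struct.unpack("!BB6s", ec)
    if (ec_type, sub_type) != (EVPN_EC_TYPE, ROUTERS_MAC_SUBTYPE):
        raise ValueError("not a Router's MAC Extended Community")
    return ":".join(f"{b:02x}" for b in mac_bytes)


ec = encode_routers_mac("00:00:5e:00:53:01")
print(ec.hex())                # 060300005e005301
print(decode_routers_mac(ec))  # 00:00:5e:00:53:01
```

Decoding checks both the Type and the Sub-Type, since other EVPN extended communities share Type 0x06.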
5.1 IRB forwarding on NVEs for Tenant Systems

This section covers the symmetric IRB procedures for the scenario where each Tenant System (TS) is attached to one or more NVEs, and its host IP and MAC addresses are learned by the attached NVEs and distributed to all other NVEs that are interested in participating in both intra-subnet and inter-subnet communications with that TS.

In this scenario, for a given tenant, an NVE typically has one MAC-VRF for each of the tenant's subnets (VLANs) configured on it, assuming VLAN-based service, which is typically the case for VxLAN and NVGRE encapsulations; each such MAC-VRF consists of a single bridge domain. In the case of MPLS encapsulation with VLAN-aware bundling, each MAC-VRF consists of multiple bridge domains (one bridge domain per VLAN). The MAC-VRFs on an NVE for a given tenant are associated with an IP-VRF corresponding to that tenant (or IP-VPN instance) via their IRB interfaces.

Each NVE MUST support QoS, Security, and OAM policies per IP-VRF to/from the core network. This is not to be confused with the QoS, Security, and OAM policies per Attachment Circuit (AC) to/from the Tenant Systems. How this requirement is met is an implementation choice and is outside the scope of this document.

Since VxLAN and NVGRE encapsulations require an inner Ethernet header (inner MAC SA/DA), and since the TS MAC address cannot be used for inter-subnet traffic, the ingress NVE's MAC address is used as the inner MAC SA. The NVE's MAC address is the device MAC address, and it is common across all MAC-VRFs and IP-VRFs. This MAC address is advertised using the new EVPN Router's MAC Extended Community (Section 4.1).

Figure 6 below illustrates this scenario, where a given tenant (e.g., an IP-VPN instance) has three subnets represented by MAC-VRF1, MAC-VRF2, and MAC-VRF3 across two NVEs.
There are five TS's associated with these three MAC-VRFs: TS1, TS4, and TS5 sit on the same subnet (i.e., the same MAC-VRF/VLAN), where TS1 and TS5 are associated with MAC-VRF1 on NVE1 and TS4 is associated with MAC-VRF1 on NVE2. TS2 is associated with MAC-VRF2 on NVE1, and TS3 is associated with MAC-VRF3 on NVE2. MAC-VRF1 and MAC-VRF2 on NVE1 are in turn associated with IP-VRF1 on NVE1, and MAC-VRF1 and MAC-VRF3 on NVE2 are associated with IP-VRF1 on NVE2.

When TS1, TS5, and TS4 exchange traffic with each other, only the L2 forwarding (bridging) part of the IRB solution is exercised, because all these TS's sit on the same subnet. However, when TS1 wants to exchange traffic with TS2 or TS3, which belong to different subnets, both the bridging and routing parts of the IRB solution are exercised. The following subsections describe the control plane and data plane operations for this IRB scenario in detail.

   [Figure 6: IRB forwarding on NVEs for Tenant Systems. NVE1 (router MAC MACx) attaches MAC-VRF1 (TS1, IP1/M1; TS5, IP5/M5) and MAC-VRF2 (TS2, IP2/M2) to IP-VRF1. NVE2 (router MAC MACy) attaches MAC-VRF3 (TS3, IP3/M3) and MAC-VRF1 (TS4, IP4/M4) to IP-VRF1. NVE1 and NVE2 are interconnected over an MPLS/VxLAN/NVGRE network.]

5.1.1 Control Plane Operation

Each NVE advertises a MAC/IP Advertisement route (i.e., Route Type 2) for each of its TS's with the following fields set:

- RD and ESI per [EVPN]
- Ethernet Tag = 0; assuming VLAN-based service
- MAC Address Length = 48
- MAC Address = Mi; where i = 1, 2, 3, 4, or 5 in the above example
- IP Address Length = 32 or 128
- IP Address = IPi; where i = 1, 2, 3, 4, or 5 in the above example
- Label-1 = MPLS Label or VNID corresponding to MAC-VRF
- Label-2 = MPLS Label or VNID corresponding to IP-VRF

Each NVE advertises an RT-2 route with two Route Targets (one corresponding to its MAC-VRF and the other corresponding to its IP-VRF). Furthermore, the RT-2 is advertised with two BGP Extended Communities. The first BGP Extended Community identifies the tunnel type per Section 4.5 of [TUNNEL-ENCAP], and the second BGP Extended Community includes the MAC address of the NVE (e.g., MACx for NVE1 or MACy for NVE2) as defined in Section 4.1. This second Extended Community (for the MAC address of the NVE) is only required when an Ethernet NVO tunnel type is used. If an IP NVO tunnel type is used, then there is no need to send this second Extended Community. It should be noted that the IP NVO tunnel type is only applicable to symmetric IRB procedures.

Upon receiving this advertisement, the receiving NVE performs the following:

- It uses the Route Targets corresponding to its MAC-VRF and IP-VRF to identify these tables and subsequently import this route into them.

- It imports the MAC address from the MAC/IP Advertisement route into the MAC-VRF with the BGP Next Hop address as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation) and Label-1 as the VNID for VxLAN encapsulation or the EVPN label for MPLS encapsulation.
- If the route carries the new Router's MAC Extended Community, and if the receiving NVE is using an Ethernet NVO tunnel, then the receiving NVE imports the IP address into the IP-VRF with the NVE's MAC address (from the new Router's MAC Extended Community) as the inner MAC DA, the BGP Next Hop address as the underlay tunnel destination address (VTEP DA for VxLAN encapsulation), and Label-2 as the IP-VPN VNID for VxLAN encapsulation.

- If the receiving NVE is going to use MPLS encapsulation, then the receiving NVE imports the IP address into the IP-VRF with the BGP Next Hop address as the underlay tunnel destination address and Label-2 as the IP-VPN label for MPLS encapsulation.

If the receiving NVE receives an RT-2 with only Label-1 and only a single Route Target corresponding to the IP-VRF, or an RT-2 with only a single Route Target corresponding to the MAC-VRF but with both Label-1 and Label-2, or an RT-2 with a MAC Address Length of zero, then it must not import it into either the IP-VRF or the MAC-VRF, and it must log an error.

5.1.2 Data Plane Operation - Inter Subnet

The following description of the data plane operation describes just the logical functions; the actual implementation may differ. Let's consider the data plane operation when TS1 in subnet-1 (MAC-VRF1) on NVE1 wants to send traffic to TS3 in subnet-3 (MAC-VRF3) on NVE2.

- NVE1 receives a packet with a MAC DA corresponding to the MAC-VRF1 IRB interface on NVE1 (the interface between MAC-VRF1 and IP-VRF1) and a VLAN tag corresponding to MAC-VRF1.

- Upon receiving the packet, NVE1 uses the VLAN tag to identify MAC-VRF1. It then looks up the MAC DA and forwards the frame to its IRB interface.

- The Ethernet header of the packet is stripped, and the packet is fed to the IP-VRF, where an IP lookup is performed on the destination address. This lookup yields an outgoing interface and the required encapsulation.
  If the encapsulation is for an Ethernet NVO tunnel, then it includes a MAC address to be used as the inner MAC DA, an IP address to be used as the VTEP DA, and a VPN-ID to be used as the VNID. The inner MAC SA and VTEP SA are set to the NVE's MAC and IP addresses, respectively. If it is an MPLS encapsulation, then the corresponding EVPN and LSP labels are added to the packet. The packet is then forwarded to the egress NVE.

- On the egress NVE, if the packet arrives on an Ethernet NVO tunnel (e.g., it is VxLAN encapsulated), then the VxLAN header is removed. Since the inner MAC DA is the egress NVE's MAC address, the egress NVE knows that it needs to perform an IP lookup. It uses the VNID to identify the IP-VRF table. If the packet is MPLS encapsulated, then the EVPN label lookup identifies the IP-VRF table. Next, an IP lookup is performed for the destination TS (TS3), which results in the access-facing IRB interface over which the packet is sent. Before sending the packet over this interface, the ARP table is consulted to get the destination TS's MAC address.

- The IP packet is encapsulated with an Ethernet header with the MAC SA set to the IRB interface MAC address and the MAC DA set to the destination TS (TS3) MAC address. The packet is sent to the corresponding MAC-VRF3 and, after a lookup of the MAC DA, is forwarded to the destination TS (TS3) over the corresponding interface.

In this symmetric IRB scenario, inter-subnet traffic between NVEs will always use the IP-VRF VNID/MPLS label. For instance, traffic from TS2 to TS4 will be encapsulated by NVE1 using NVE2's IP-VRF VNID/MPLS label, as long as TS4's host IP is present in NVE1's IP-VRF.

5.1.3 TS Move Operation

When a TS moves from one NVE to another, it is important that the MAC mobility procedures are properly executed and that the corresponding MAC-VRF and IP-VRF tables on all participating NVEs are updated.
[EVPN] describes the MAC mobility procedures for L2-only services for both single-homed and multi-homed TS's. This section describes the incremental procedures and BGP Extended Communities needed to handle MAC mobility for a mix of L2 and L3 connectivity (aka IRB). In order to place the emphasis on the differences between the L2-only and the L2-and-L3 use cases, the incremental procedure is described for a single-homed TS, with the expectation that the reader can easily extrapolate to a multi-homed TS based on the procedures described in Section 15 of [EVPN].

Let's consider TS1 in Figure 6 above, which moves from NVE1 to NVE2. In such a move, NVE2 discovers IP1/M1 of TS1, realizes that this is a MAC move, and advertises a MAC/IP route per Section 5.1.1 above with the MAC Mobility Extended Community. Since NVE2 learns TS1's MAC/IP addresses locally, it updates its MAC-VRF1 and IP-VRF1 for TS1 with its local interface.

If the local learning at NVE1 is performed using the control or management plane, then these interactions serve as the trigger for NVE1 to withdraw the MAC and IP addresses associated with TS1. However, if the local learning at NVE1 is performed using data plane learning, then the reception of the MAC/IP Advertisement route (for TS1) from NVE2 with the MAC Mobility Extended Community serves as the trigger for NVE1 to withdraw the MAC and IP addresses associated with TS1.

All other remote NVE devices, upon receiving the MAC/IP Advertisement route for TS1 from NVE2 with the MAC Mobility Extended Community, compare the sequence number in this advertisement with the one previously received. If the new sequence number is greater than the old one, then they update the MAC/IP addresses of TS1 in their corresponding MAC-VRFs and IP-VRFs to point to NVE2. Furthermore, upon receiving the MAC/IP withdrawal for TS1 from NVE1, these remote NVEs clean up their BGP tables.
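The remote-NVE behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, each table keeps only the currently selected location of a TS, a route without the MAC Mobility Extended Community is treated here as sequence number 0, and the tie-breaking rules that [EVPN] defines for equal sequence numbers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HostEntry:
    next_hop: str  # NVE currently advertising the TS (BGP next hop)
    seq: int       # MAC Mobility sequence number of the accepted route


@dataclass
class RemoteNve:
    """Hypothetical view of one tenant's MAC-VRF and IP-VRF on a remote NVE."""
    mac_vrf: Dict[str, HostEntry] = field(default_factory=dict)
    ip_vrf: Dict[str, str] = field(default_factory=dict)  # TS IP -> next hop

    def receive_mac_ip_route(self, mac: str, ip: str, next_hop: str,
                             seq: Optional[int] = None) -> bool:
        """Apply an RT-2; return True if the MAC-VRF and IP-VRF were updated."""
        new_seq = 0 if seq is None else seq  # assumption: no MAC Mobility EC -> 0
        current = self.mac_vrf.get(mac)
        if current is not None and new_seq <= current.seq:
            return False  # not newer than what is held; keep the old location
        self.mac_vrf[mac] = HostEntry(next_hop, new_seq)
        self.ip_vrf[ip] = next_hop
        return True

    def receive_withdraw(self, mac: str, ip: str, next_hop: str) -> None:
        """Clean up only if the entries still point at the withdrawing NVE."""
        entry = self.mac_vrf.get(mac)
        if entry is not None and entry.next_hop == next_hop:
            del self.mac_vrf[mac]
        if self.ip_vrf.get(ip) == next_hop:
            del self.ip_vrf[ip]


remote = RemoteNve()
remote.receive_mac_ip_route("M1", "IP1", "NVE1")         # TS1 learned on NVE1
remote.receive_mac_ip_route("M1", "IP1", "NVE2", seq=1)  # TS1 moved to NVE2
remote.receive_withdraw("M1", "IP1", "NVE1")             # NVE1's withdrawal
print(remote.mac_vrf["M1"].next_hop, remote.ip_vrf["IP1"])  # NVE2 NVE2
```

Checking the next hop on withdrawal keeps a late withdrawal from NVE1 from removing the entries that already point to NVE2.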
5.2 IRB forwarding on NVEs for Subnets behind Tenant Systems

This section covers the symmetric IRB procedures for the scenario where some Tenant Systems (TS's) support one or more subnets and these TS's are associated with one or more NVEs. Therefore, besides advertising the MAC/IP addresses of each TS, which can be multi-homed with All-Active redundancy mode, the associated NVE also needs to advertise the subnets statically configured on each TS.

The main difference between this solution and the previous one is the additional advertisement corresponding to each subnet. These subnet advertisements are accomplished using the EVPN IP Prefix route defined in [EVPN-PREFIX]. These subnet prefixes are advertised with the IP address of their associated TS (which is in the overlay address space) as their next hop. The receiving NVEs perform recursive route resolution to resolve each subnet prefix to the NVE to which its associated TS is attached, so that they know which NVE to forward packets to when they are destined for that subnet prefix.

The advantage of this recursive route resolution is that when a TS moves from one NVE to another, there is no need to re-advertise any of the subnet prefixes for that TS. All that is needed is to advertise the IP/MAC addresses associated with the TS itself and to exercise the MAC mobility procedures for that TS. The recursive route resolution automatically takes care of the updates for the subnet prefixes of that TS.

Figure 7 below illustrates this scenario, where a given tenant (e.g., an IP-VPN service) has three subnets represented by MAC-VRF1, MAC-VRF2, and MAC-VRF3 across two NVEs. There are four TS's associated with these three MAC-VRFs: TS1 is connected to MAC-VRF1 on NVE1, TS2 is connected to MAC-VRF2 on NVE1, TS3 is connected to MAC-VRF3 on NVE2, and TS4 is connected to MAC-VRF1 on NVE2.
TS1 has two subnet prefixes (SN1 and SN2), and TS3 has a single subnet prefix, SN3. The MAC-VRFs on each NVE are associated with their corresponding IP-VRF using their IRB interfaces. When TS4 and TS1 exchange intra-subnet traffic, only the L2 forwarding (bridging) part of the IRB solution is used (i.e., the traffic only goes through their MAC-VRFs); however, when TS3 wants to forward traffic to SN1 or SN2 sitting behind TS1 (inter-subnet traffic), both the bridging and routing parts of the IRB solution are exercised (i.e., the traffic goes through the corresponding MAC-VRFs and IP-VRFs). The following subsections describe the control plane and data plane operations for this IRB scenario in detail.

   [Figure 7: IRB forwarding on NVEs for Tenant Systems with configured subnets. NVE1 attaches MAC-VRF1 (TS1, IP1/M1, behind which sit SN1 and SN2) and MAC-VRF2 (TS2, IP2/M2) to its IP-VRF. NVE2 attaches MAC-VRF3 (TS3, IP3/M3, behind which sits SN3) and MAC-VRF1 (TS4, IP4/M4) to its IP-VRF. NVE1 and NVE2 are interconnected over an MPLS/VxLAN/NVGRE network.]

5.2.1 Control Plane Operation

Each NVE advertises a Route Type-5 (RT-5, the IP Prefix Route defined in [EVPN-PREFIX]) for each of its subnet prefixes, with the IP address of its TS as the next hop (Gateway Address field), as follows:

- RD associated with the IP-VRF
- ESI = 0
- Ethernet Tag = 0
- IP Prefix Length = 0 to 32 or 0 to 128
- IP Prefix = SNi
- Gateway Address = IPi; IP address of TS
- Label = 0

This RT-5 is advertised with one or more Route Targets that have been configured as "export route targets" of the IP-VRF from which the route is originated.
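The RT-5 fields listed above, together with the recursive resolution that the receiving NVE performs once the TS's RT-2 is also present (described in the remainder of this section), can be sketched in Python as follows. The type and function names are hypothetical, only the fields needed for resolution are modeled, and taking the VNID from the RT-2's Label-2 (because the RT-5 Label is 0) is an assumption of this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rt2:
    """Fields of a TS's RT-2 used for resolution (Section 5.1.1)."""
    ts_ip: str
    bgp_next_hop: str  # NVE address, used as the VTEP DA
    routers_mac: str   # from the Router's MAC EC, used as the inner MAC DA
    label2: int        # IP-VRF VNID


@dataclass(frozen=True)
class Rt5:
    """Fields of an RT-5 used for resolution (Section 5.2.1)."""
    prefix: str        # SNi
    gateway_ip: str    # IPi, the overlay address of the TS behind which SNi sits
    label: int = 0


def resolve(dest_ip: str, rt5s: List[Rt5],
            rt2s: Dict[str, Rt2]) -> Optional[Tuple[str, str, int]]:
    """Return (VTEP DA, inner MAC DA, VNID) for dest_ip, or None if unresolved."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [r for r in rt5s if addr in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    # Longest-prefix match among the RT-5 routes in this IP-VRF.
    best = max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
    # Recursive step: the gateway address is resolved through the TS's RT-2.
    host = rt2s.get(best.gateway_ip)
    if host is None:
        return None
    # Assumption: with RT-5 Label = 0, the VNID comes from the RT-2's Label-2.
    return host.bgp_next_hop, host.routers_mac, host.label2


rt2s = {"10.0.3.3": Rt2("10.0.3.3", "192.0.2.2", "00:00:5e:00:53:02", 5003)}
rt5s = [Rt5("172.16.3.0/24", "10.0.3.3")]  # SN3 behind TS3
print(resolve("172.16.3.10", rt5s, rt2s))
# ('192.0.2.2', '00:00:5e:00:53:02', 5003)
```

Because the RT-5 entry stores only the gateway address, replacing the TS's RT-2 after a move changes the resolved VTEP DA and inner MAC DA without any change to the prefix route, which is the property described in Section 5.2.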
Each NVE also advertises an RT-2 (MAC/IP Advertisement Route), along with its associated Route Targets and Extended Communities, for each of its TS's exactly as described in Section 5.1.1.

Upon receiving the RT-5 advertisement, the receiving NVE performs the following:

- It uses the Route Target to identify the corresponding IP-VRF.

- It imports the IP prefix, along with the IP address of the associated TS as its next hop, into the corresponding IP-VRF that is configured with an import RT that is one of the RTs carried by the RT-5 route.

When receiving the RT-2 advertisement, the receiving NVE imports the MAC/IP addresses of the TS into the corresponding MAC-VRF and IP-VRF per Section 5.1.1. When both routes exist, recursive route resolution is performed to resolve the IP prefix (received in the RT-5) to its corresponding NVE's IP address (i.e., its BGP next hop). The BGP next hop will be used as the underlay tunnel destination address (e.g., VTEP DA for VxLAN encapsulation), and the Router's MAC will be used as the inner MAC DA for VxLAN encapsulation.

5.2.2 Data Plane Operation

The following description of the data plane operation describes just the logical functions; the actual implementation may differ. Let's consider the data plane operation when a host on SN1 sitting behind TS1 wants to send traffic to a host on SN3 behind TS3.

- TS1 sends a packet with a MAC DA corresponding to the MAC-VRF1 IRB interface of NVE1 and a VLAN tag corresponding to MAC-VRF1.

- Upon receiving the packet, the ingress NVE1 uses the VLAN tag to identify MAC-VRF1. It then looks up the MAC DA and forwards the frame to its IRB interface, just as in Section 5.1.2.

- The Ethernet header of the packet is stripped, and the packet is fed to the IP-VRF, where an IP lookup is performed on the destination address.
  This lookup yields the fields needed for VxLAN encapsulation, with NVE2's MAC address as the inner MAC DA, NVE2's IP address as the VTEP DA, and the VNID. The inner MAC SA is set to NVE1's MAC address, and the VTEP SA is set to NVE1's IP address.

- The packet is then encapsulated with the proper header based on the above information and is forwarded to the egress NVE (NVE2).

- On the egress NVE (NVE2), assuming the packet is VxLAN encapsulated, the VxLAN and inner Ethernet headers are removed, and the resulting IP packet is fed to the IP-VRF associated with that VNID.

- Next, a lookup is performed based on the IP DA (which is in SN3) in the associated IP-VRF of NVE2. The IP lookup yields the access-facing IRB interface over which the packet needs to be sent. Before sending the packet over this interface, the ARP table is consulted to get the destination TS (TS3) MAC address.

- The IP packet is encapsulated with an Ethernet header with the MAC SA set to that of the access-facing IRB interface of the egress NVE (NVE2) and the MAC DA set to the destination TS (TS3) MAC address. The packet is sent to the corresponding MAC-VRF3 and, after a lookup of the MAC DA, is forwarded to the destination TS (TS3) over the corresponding interface.

6 Inter-Subnet DCI Scenarios

The inter-subnet DCI scenarios can be categorized into the following four categories. The first two scenarios are covered in this document. The last two scenarios, along with their corresponding solutions, are described in [EVPN-IPVPN-INTEROP].

1. Switching among IP subnets in different DCs using EVPN without GW
2. Switching among IP subnets in different DCs using EVPN with GW
3. Switching among IP subnets spread across IP-VPN and EVPN networks with GW
4. Switching among IP subnets spread across IP-VPN and EVPN networks without GW

In the above scenarios, the term "GW" refers to the case where a node situated at the WAN edge of the data center network behaves as a default gateway (GW) for all the destinations that are outside the data center. The absence of a GW refers to the scenario where NVEs within a data center maintain individual (host) routes for destinations that are outside of the data center.

In case (3), the WAN edge node also performs route aggregation for all the destinations within its own data center and acts as an interworking unit between EVPN and IP-VPN (it implements both EVPN and IP-VPN functionality).

   [Figure 8: Interoperability Use-Cases. Enterprise Site 1 (PE1, host H1), Enterprise Site 2 (PE2, host H2), and DCN 3 (NVE4, with TS6) attach to the MPLS/IP core, as do DCN 1 (NVE1, with TS1 and TS2) and DCN 2 (NVE2 and NVE3, with TS3, TS4, and TS5); a GW is shown at the WAN edge of the data centers.]

In what follows, we describe scenarios 1 and 2 in more detail.

6.1 Switching among IP subnets in different DCs without GW

This case is similar to that of Section 2.1 above, except that the TS's belong to different data centers that are interconnected over a WAN (e.g., an MPLS/IP PSN). The data centers in question here are seamlessly interconnected to the WAN, i.e., the WAN edge devices do not maintain any TS-specific addresses in the forwarding path; for example, there is no WAN edge GW between these DCs.

As an example, consider TS3 and TS6 of Figure 8 above. Assume that connectivity is required between these two TS's, where TS3 belongs to SN3 whereas TS6 belongs to SN6.
NVE2 has an EVI3 associated with SN3, and NVE4 has an EVI6 associated with SN6. Both SN3 and SN6 are part of the same IP-VRF.

When an EVPN MAC Advertisement route is received by an NVE, the IP address associated with the route is used to populate the IP-VRF table, whereas the MAC address associated with the route is used to populate both the MAC-VRF table and the adjacency associated with the IP route in the IP-VRF table (i.e., the ARP table).

When an Ethernet frame is received by an ingress NVE, it performs a lookup on the destination MAC address in the associated EVI. If the MAC address corresponds to its IRB Interface MAC address, the ingress NVE deduces that the packet MUST be inter-subnet routed. Hence, the ingress NVE performs an IP lookup in the associated IP-VRF table. The lookup identifies an adjacency that contains a MAC rewrite and, in turn, the next-hop (i.e., egress) Gateway to which the packet must be forwarded, along with the associated MPLS label stack. The MAC rewrite holds the MAC address associated with the destination host (as populated by the EVPN MAC route), instead of the MAC address of the next-hop Gateway. The ingress NVE then rewrites the destination MAC address in the packet with the address specified in the adjacency. It also rewrites the source MAC address with its IRB Interface MAC address for the destination subnet. The ingress NVE then forwards the frame to the next-hop (i.e., egress) Gateway after encapsulating it with the MPLS label stack. Note that this label stack includes the LSP label as well as an EVPN label. The EVPN label is advertised either by the ingress Gateway, if inter-AS option B is used, or by the egress NVE, if inter-AS option C is used.
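The EVPN label handling for inter-AS options B and C, described in this paragraph and the next, can be traced with the short Python sketch below. The label values, node names, and function name are hypothetical, and only the EVPN label is modeled; the LSP label, which every node switches normally, is omitted.

```python
from typing import Dict, List

# Hypothetical EVPN labels advertised for the destination by each node.
ADVERTISED = {"ingress_gw": 100, "egress_gw": 200, "egress_nve": 300}


def evpn_label_trace(option: str, adv: Dict[str, int]) -> List[int]:
    """EVPN label on each leg: NVE->ingress GW, ingress GW->egress GW, egress GW->NVE."""
    if option not in ("B", "C"):
        raise ValueError("only inter-AS options B and C are described")
    # The ingress NVE pushes the label advertised by the ingress GW (option B)
    # or by the egress NVE (option C).
    label = adv["ingress_gw"] if option == "B" else adv["egress_nve"]
    trace = [label]
    for gateway, downstream in (("ingress_gw", "egress_gw"),
                                ("egress_gw", "egress_nve")):
        if option == "B":
            # Option B: the gateway swaps the label it advertised for the
            # label advertised by the next node toward the egress NVE.
            if label != adv[gateway]:
                raise ValueError(f"unexpected EVPN label at {gateway}")
            label = adv[downstream]
        # Option C: the EVPN label is left untouched, so `label` is unchanged.
        trace.append(label)
    return trace


print(evpn_label_trace("B", ADVERTISED))  # [100, 200, 300]
print(evpn_label_trace("C", ADVERTISED))  # [300, 300, 300]
```

In both cases the egress NVE receives the label that it advertised itself, which is what allows it to identify the bridge-domain table.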
When the MPLS-encapsulated packet is received by the ingress Gateway, the processing again differs depending on whether inter-AS option B or option C is employed. In the former case, the ingress Gateway swaps the EVPN label in the packet with the EVPN label value received from the egress Gateway. In the latter case, the ingress Gateway does not modify the EVPN label and performs normal label switching on the LSP label. Similarly, on the egress Gateway, for option B, the egress Gateway swaps the EVPN label with the value advertised by the egress NVE, whereas for option C, the egress Gateway does not modify the EVPN label and performs normal label switching on the LSP label.

When the MPLS-encapsulated packet is received by the egress NVE, it uses the EVPN label to identify the bridge-domain table. It then performs a MAC lookup in that table, which yields the outbound interface to which the Ethernet frame must be forwarded. Figure 9 below depicts the packet flow.

   [Figure 9: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs without GW. Traffic from TS1 passes through NVE1 (MAC-VRF, then IP-VRF), ASBR1 ([LS]), ASBR2 ([LS]), and NVE2 (IP-VRF, then MAC-VRF) before reaching TS2.]

6.2 Switching among IP subnets in different DCs with GW

In this scenario, connectivity is required between TS's in different data centers, and those hosts belong to different IP subnets. What makes this case different from that of Section 6.1 is that at least one of the data centers has a gateway as the WAN edge switch. Because of that, the IP-VRFs of the NVEs within that data center need not maintain (host) routes to individual TS's outside of that data center.
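The effect of the WAN gateway on an NVE's IP-VRF can be illustrated with a longest-prefix-match lookup over a small table: host routes for TS's in the local data center, plus the default route advertised by the local gateway. The Python sketch below uses hypothetical addresses, next-hop names, and a hypothetical function name; it models only the route selection, not the label or MAC rewrites described in the following paragraphs.

```python
import ipaddress
from typing import Dict


def lookup(ip_vrf: Dict[str, str], dest: str) -> str:
    """Longest-prefix match over an IP-VRF that maps prefix -> next hop."""
    addr = ipaddress.ip_address(dest)
    best_len, best_next_hop = -1, None
    for prefix, next_hop in ip_vrf.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_next_hop = net.prefixlen, next_hop
    if best_next_hop is None:
        raise KeyError(f"no route to {dest}")
    return best_next_hop


# Hypothetical IP-VRF on NVE1: host routes for TS's in its own data center,
# plus the default route advertised by the local WAN gateway (GW1) in place
# of host routes for TS's in remote data centers.
nve1_ip_vrf = {
    "198.51.100.1/32": "local:TS1",
    "198.51.100.2/32": "local:TS2",
    "0.0.0.0/0": "GW1",
}
print(lookup(nve1_ip_vrf, "203.0.113.5"))   # TS in a remote DC -> GW1
print(lookup(nve1_ip_vrf, "198.51.100.2"))  # local host route -> local:TS2
```

Only the single default route is needed for all remote destinations, which is why NVEs behind a gateway do not hold MAC/IP entries for hosts in remote data centers.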
As an example, consider a tenant with TS1 and TS5 of Figure 8 above. Assume that connectivity is required between these two TS's, where TS1 belongs to SN1 whereas TS5 belongs to SN5. NVE3 has an EVI5 associated with SN5, and this EVI is represented by the MAC-VRF, which is connected to the IP-VRF via an IRB interface. NVE1 has an EVI1 associated with SN1, and this EVI is represented by the MAC-VRF, which is connected to the IP-VRF representing the same tenant. Due to the gateway at the edge of DCN 1, NVE1's IP-VRF does not need to have the address of TS5; instead, it has a default route in its IP-VRF with the next hop being the GW.

In this scenario, the NVEs within a given data center do not have entries for the MAC/IP addresses of hosts in remote data centers. Rather, the NVEs have a default IP route pointing to the WAN gateway for each VRF. This is accomplished by having the WAN gateway advertise, for a given EVPN that spans multiple DCs, a default VPN-IP route that is imported by the NVEs of that VPN within the gateway's own DC.

When an Ethernet frame is received by an ingress NVE, it performs a lookup on the destination MAC address in the associated MAC-VRF table. If the MAC address corresponds to the IRB Interface MAC address, the ingress NVE deduces that the packet MUST be inter-subnet routed. Hence, the ingress NVE performs an IP lookup in the associated IP-VRF table. The lookup, in this case, matches the default route, which points to the local WAN gateway (GW1). The ingress NVE (NVE1) then rewrites the destination MAC address in the packet with the MAC address of the core-facing IRB interface of GW1 (not shown in the figure) or with the router's MAC address of GW1.
It also rewrites the source MAC address with its own core-facing IRB interface's MAC address for the destination subnet (i.e., the subnet between NVE1 and GW1) or with its own router's MAC address. The ingress NVE then forwards the frame to GW1 after encapsulating it with the MPLS label stack. Note that this label stack includes the LSP label as well as the label for the default route that was advertised by the local WAN gateway.

When the MPLS-encapsulated packet is received by GW1, it uses the default route's MPLS label to identify the core-facing MAC-VRF. It does a MAC DA lookup and forwards the packet to the IP-VRF after stripping the Ethernet header. It then performs an IP lookup in that table. The lookup identifies an adjacency that contains a MAC rewrite and, in turn, the remote WAN gateway (GW2) to which the packet must be forwarded, along with the associated MPLS label stack. The MAC rewrite holds the MAC address associated with the ultimate destination host (as populated by the EVPN MAC route). GW1 then rewrites the destination MAC address in the packet with the address specified in the adjacency. It also rewrites the source MAC address with the MAC address of its core-facing IRB interface (not shown in the figure) or with its router's MAC address. GW1 then forwards the frame to GW2 after encapsulating it with the MPLS label stack. Note that this label stack includes the LSP label as well as an EVPN label that was advertised by GW2.

When the MPLS-encapsulated packet is received by GW2, it uses the EVPN label to identify the destination MAC-VRF. It then performs a MAC DA lookup and obtains the EVPN label advertised by NVE2 along with the adjacency information. It then encapsulates the packet with the corresponding label stack and forwards the packet to NVE2. It should be noted that no MAC header rewrite is performed on GW2.
This implies that both GW1 and GW2 need to keep the remote host MAC addresses, along with the corresponding EVPN labels, in their tables.

The egress NVE (NVE2), upon receiving the packet, performs a MAC lookup in the MAC-VRF (identified by the received EVPN label) to determine the outbound port on which to send the traffic. Figure 10 below depicts the forwarding model.

   [Figure 10: Inter-Subnet Forwarding Among EVPN NVEs in Different DCs with GW. Traffic from TS1 passes through NVE1 (MAC-VRF, then IP-VRF), GW1 (IP-VRF, then MAC-VRF), GW2 (MAC-VRF), and NVE2 (IP-VRF, then MAC-VRF) before reaching TS2.]

7 TS Mobility

7.1 TS Mobility & Optimum Forwarding for TS Outbound Traffic

Optimum forwarding for the TS outbound traffic, upon TS mobility, can be achieved using either the anycast default Gateway MAC and IP addresses or address aliasing, as discussed in [DC-MOBILITY].

7.2 TS Mobility & Optimum Forwarding for TS Inbound Traffic

For optimum forwarding of the TS inbound traffic, upon TS mobility, all the NVEs and/or IP-VPN PEs need to know the up-to-date location of the TS. Two scenarios must be considered, as discussed next.

In what follows, we use the following terminology:

- source NVE refers to the NVE behind which the TS resided prior to the TS mobility event.

- target NVE refers to the new NVE behind which the TS resides after the mobility event.

7.2.1 Mobility without Route Aggregation

In this scenario, when a target NVE detects that a MAC mobility event has occurred, it initiates the MAC mobility handshake in BGP as specified in Section 5.1.3.
The WAN Gateways, acting as ASBRs in this case, re-advertise the MAC route of the target NVE with the MAC Mobility extended community attribute unmodified. Because the WAN Gateway for a given data center re-advertises BGP routes received from the WAN into the data center, the source NVE will receive the MAC Advertisement route of the target NVE (with the next hop attribute adjusted depending on which inter-AS option is employed). The source NVE will then withdraw its original MAC Advertisement route as a result of evaluating the Sequence Number field of the MAC Mobility extended community in the received MAC Advertisement route. This is per the procedures already defined in [EVPN].

8 Acknowledgements

The authors would like to thank Sami Boutros and Jeffrey Zhang for their valuable comments.

9 Security Considerations

The security considerations discussed in [EVPN] apply to this document.

10 IANA Considerations

IANA has allocated Sub-Type 0x03 of the EVPN transitive extended community Type (0x06) for the EVPN Router's MAC Extended Community.

11 References

11.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[EVPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432, February 2015.

[TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation Attribute", draft-ietf-idr-tunnel-encaps-03, November 2016.

[EVPN-PREFIX] Rabadan et al., "IP Prefix Advertisement in EVPN", draft-ietf-bess-evpn-prefix-advertisement-03, September 2016.

11.2 Informative References

[RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", RFC 7606, August 2015.
[802.1Q] "IEEE Standard for Local and metropolitan area networks - Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.1Q(tm), 2014 Edition, November 2014.

[EVPN-IPVPN-INTEROP] Sajassi et al., "EVPN Seamless Interoperability with IP-VPN", draft-sajassi-l2vpn-evpn-ipvpn-interop-01, work in progress, October 2012.

[DC-MOBILITY] Aggarwal et al., "Data Center Mobility based on BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center-mobility-05.txt, work in progress, June 2013.

12 Contributors

In addition to the authors listed on the front page, the following co-authors have also contributed to this document:

Florin Balus, Cisco
Yakov Rekhter, Juniper
Wim Henderickx, Nokia
Linda Dunbar, Huawei
Dennis Cai, Alibaba

Authors' Addresses

Ali Sajassi (Editor)
Cisco
Email: sajassi@cisco.com

Samer Salam
Cisco
Email: sslam@cisco.com

Samir Thoria
Cisco
Email: sthoria@cisco.com

John E. Drake
Juniper Networks
Email: jdrake@juniper.net

Lucy Yong
Huawei Technologies
Email: lucy.yong@huawei.com

Jorge Rabadan
Nokia
Email: jorge.rabadan@nokia.com