L2VPN Workgroup                                               J. Rabadan
Internet Draft                                             W. Henderickx
                                                         S. Palislamovic
Intended status: Standards Track                          Alcatel-Lucent

                                                                F. Balus
                                                          Nuage Networks

                                                                A. Isaac
                                                               Bloomberg

Expires: January 16, 2014                                  July 15, 2013


                    IP Prefix Advertisement in E-VPN
            draft-rabadan-l2vpn-evpn-prefix-advertisement-00


Abstract

   E-VPN provides a flexible control plane that allows intra-subnet
   connectivity in an IP/MPLS and/or an NVO-based network. In Data
   Centers, there is also a need for a dynamic and efficient
   inter-subnet connectivity across Tenant Systems and End Devices that
   can be physical or virtual and may not support their own routing
   protocols. This document defines a new E-VPN route type for the
   advertisement of IP Prefixes and explains how E-VPN should be used
   to provide inter-subnet connectivity with the flexibility required
   by the Data Center applications.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

Rabadan et al.          Expires January 16, 2014                [Page 1]

Internet-Draft         E-VPN Prefix Advertisement          July 15, 2013

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction and problem statement
      1.1 Inter-subnet connectivity requirements in Data Centers
      1.2 The requirement for advertising IP prefixes in E-VPN
      1.3 The requirement for a new E-VPN route type
   2. The BGP E-VPN IP Prefix route
      2.1. IP Prefix Route encoding
      2.2. BGP remote-next-hop attribute
   3. Procedures associated to the advertisement of IP Prefixes
      3.1. Usage of the MAC advertisement and IP Prefix advertisement
           routes
      3.2. Inter-subnet connectivity for TS
      3.3. Inter-subnet connectivity for redundant TS (floating IP)
      3.4. Inter-subnet connectivity for IRB interfaces
         3.4.1. Inter-subnet connectivity for unnumbered IRB
                interfaces
   4. Conclusions
   5. Conventions used in this document
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments
   10. Authors' Addresses

1. Introduction and problem statement

   Inter-subnet connectivity is required within the Data Center;
   therefore, IP Prefixes must be advertised in the control plane. This
   section explains why IP-VPN [RFC4364] procedures cannot be used for
   such advertisements and why the existing E-VPN MAC route type does
   not meet the Data Center requirements for the advertisement of IP
   Prefixes; hence a new E-VPN route type is proposed.

   Section 1.1 describes the inter-subnet connectivity requirements in
   Data Centers. Sections 1.2 and 1.3 explain why neither IP-VPN nor
   the existing E-VPN route types meet the requirements for IP Prefix
   advertisements. Once the need for a new E-VPN route type is
   justified, sections 2 and 3 describe this route type and how it is
   used in some specific use cases.

1.1 Inter-subnet connectivity requirements in Data Centers

   [E-VPN] is used as the control plane for a Network Virtualization
   Overlay (NVO3) solution in Data Centers (DC), where Network
   Virtualization Edge (NVE) devices can be located in Hypervisors or
   TORs, as described in [E-VPN-OVERLAYS].

   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by MAC and IP addresses, and connected to
   an E-VPN instance, the following considerations apply:

   o  The Tenant Systems may be Virtual Machines (VMs) that generate
      traffic from their own MAC and IP.

   o  The Tenant Systems may be Virtual Appliance entities (VAs) that
      forward traffic to/from IP addresses of different End Devices
      sitting behind them.

   o  These VAs can be firewalls, load balancers, NAT devices, other
      appliances or virtual gateways with virtual routing instances.
   o  These VAs do not have their own routing protocols and hence rely
      on the E-VPN NVEs to advertise the routes on their behalf.

   o  In all these cases, the VA will forward traffic to the Data
      Center using its own source MAC, but the source IP will be the
      one associated to the End Device sitting behind it, or a
      translated IP address (part of a public NAT pool) if the VA is
      performing NAT.

   o  Note that the same IP address could exist behind two of these TS.
      One example of this would be certain appliance resiliency
      mechanisms, where a virtual IP or floating IP can be owned by one
      of the two VAs running the resiliency protocol (the master VA).
      VRRP is one particular example of this. Another example is
      multi-homed subnets, i.e. the same subnet is connected to two
      VAs.

   The following figure illustrates some of the examples described
   above.

                      NVE1
                   +--------+
          TS1(VM)--|(EVI-10)|---------+
            IP1/M1 +--------+         |              DGW1
                                 +---------+    +-------------+
                                 |         |----|(EVI-10)     |
    SN1---+           NVE2       |         |    |   IRB1\     |
          |        +--------+    |         |    |        (VRF)|---+
    SN2---TS2(VA)--|(EVI-10)|----|         |    +-------------+  _|_
          | IP2/M2 +--------+    | VXLAN/  |                    (   )
    IP4---+  <-+                 |  nvGRE  |         DGW2      ( WAN )
               |                 |         |    +-------------+ (___)
             vIP23 (floating)    |         |----|(EVI-10)     |   |
               |                 +---------+    |   IRB2\     |   |
    SN1---+  <-+      NVE3         |  |         |        (VRF)|---+
          | IP3/M3 +--------+      |  |         +-------------+
    SN3---TS3(VA)--|(EVI-10)|------+  |
          |        +--------+         |
    IP5---+                           |
                                      |
                      NVE4            |
             +---------------------+  |
    IP6------|(EVI-1)              |  |
             |      \    IRB3      |  |
             |       (VRF)-(EVI-10)|--+
             |      /              |
         |---|(EVI-2)              |
      SN4|   +---------------------+

                  Figure 1 DC inter-subnet use-cases

   Where:

   NVE1, NVE2, NVE3, NVE4, DGW1 and DGW2 share the same E-VPN for a
   particular tenant. EVI-10 is the corresponding E-VPN instance on
   each element, and all the hosts connected to that instance belong to
   the same IP subnet. The hosts connected to E-VPN 10 are listed
   below:

   o  TS1 is a VM that generates/receives traffic from/to IP1, where
      IP1 belongs to the E-VPN 10 subnet.

   o  TS2 and TS3 are Virtual Appliances (VA) that generate/receive
      traffic from/to the subnets and hosts sitting behind them (SN1,
      SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3) belong
      to the E-VPN subnet and they can also generate/receive traffic.
      When these VAs receive packets destined to their own MAC
      addresses (M2 and M3) they will route the packets to the proper
      subnet or host. These VAs do not support routing protocols to
      advertise the subnets connected to them and can move to a
      different server and NVE when the Cloud Management System decides
      to do so. These VAs may also support redundancy mechanisms for
      some subnets, similar to VRRP, where a floating IP is owned by
      the master VA and only the master VA forwards traffic to a given
      subnet. E.g.: vIP23 in figure 1 is a floating IP that can be
      owned by TS2 or TS3 depending on which one is the master. Only
      the master will forward traffic to SN1.

   o  Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
      have their own IP addresses that belong to the E-VPN 10 subnet
      too. These IRB interfaces connect the E-VPN 10 subnet to Virtual
      Routing and Forwarding (VRF) instances that can route the traffic
      to other connected subnets for the same tenant (within the DC or
      at the other end of the WAN). On some occasions, the IRB
      interfaces do not terminate IP traffic themselves and therefore
      they do not need any IP address configured. In such cases, we
      will refer to these special IRB interfaces as "unnumbered" IRB
      interfaces.

   All the above DC use cases use individual IP hosts and subnets for
   intra/inter connectivity.
   Therefore, their IP addresses MUST be advertised:

   a) From the NVEs (since VAs and VMs do not run routing protocols)
      and

   b) Associated to a next-hop that can be a VA IP address, a floating
      IP address, an IRB IP address or a MAC address.

1.2 The requirement for advertising IP prefixes in E-VPN

   In all the inter-subnet connectivity cases discussed in section 1.1
   there is a need to advertise IP prefixes in the control plane that
   cannot be satisfied by using [RFC4364] due to the following
   requirements, specific to NVO-based Data Centers:

   o  The data plane in NVO-based Data Centers is not based on IP over
      a GRE or MPLS tunnel as required by [RFC4364], but Ethernet over
      an IP tunnel, such as VXLAN or NVGRE.

   o  The IP prefixes in the DC must be advertised with a flexibility
      that does not exist in IP-VPNs. For instance:

      a) The advertised next-hop for a given IP prefix can be an IRB IP
         address (see section 3.4), a floating IP address (see section
         3.3) or even a MAC address (see section 3.4.1). In the future,
         the ESI could also be defined as a next-hop for the advertised
         prefixes.

      b) As stated by [E-VPN-OVERLAYS], VXLAN or NVGRE virtual
         identifiers can have a global or a local scope. The
         implementation MUST support the flexibility to advertise IP
         Prefixes associated to a global identifier (32-bit value
         encoded in the E-VPN Ethernet Tag ID) or a locally significant
         identifier (20-bit value encoded in the MPLS label field). At
         the moment, [RFC4364] can only advertise Prefixes associated
         to a locally significant identifier (MPLS label).

   o  IP prefixes must be advertised by NVE devices that have no VRF
      instances defined and no capability to process IP-VPN prefixes.
      These NVE devices just support E-VPN and advertise IP Prefixes on
      behalf of some connected Tenant Systems. In other words: any
      attempt to solve this problem by simply using [RFC4364] routes
      requires every E-VPN deployment to be accompanied by a concurrent
      IP-VPN topology, which is not possible in most cases.

   o  Finally, Data Center providers want to use a single BGP Address
      Family / Subsequent Address Family (AFI/SAFI) for the
      advertisement of addresses within the Data Center, i.e. BGP E-VPN
      only, as opposed to using E-VPN and IP-VPN in a concurrent
      topology. This minimizes the control plane overhead in TORs and
      Hypervisors and simplifies the operations.

   E-VPN is extended - as described in this document - to advertise IP
   prefixes with the flexibility required by the current and future
   Data Center applications.

1.3 The requirement for a new E-VPN route type

   [E-VPN] defines a MAC route (or route type 2) where a MAC address
   can be advertised together with an IP address length (IPL) and IP
   address (IP). While a variable IPL might be used to indicate the
   presence of an IP prefix in a route type 2, there are several
   specific use cases in which using this route type to deliver IP
   Prefixes is not suitable.

   One example of such use cases is the "floating IP" example described
   in section 1.1. In this example we need to decouple the
   advertisement of the prefixes from the advertisement of the floating
   IP (vIP23 in figure 1) and the MAC associated to it; otherwise the
   solution becomes highly inefficient and does not scale.

   E.g.: if we are advertising 1k prefixes from M2 (using route type 2)
   and the floating IP owner changes from M2 to M3, we would need to
   withdraw 1k routes from M2 and re-advertise 1k routes from M3.
   However, if we use a separate route type, we can advertise the 1k
   routes associated to the floating IP address (vIP23) and only one
   route type 2 for advertising the ownership of the floating IP, i.e.
   vIP23 and M2 in the route type 2. When the floating IP owner changes
   from M2 to M3, a single route type 2 withdraw/update is required to
   indicate the change. The remote DGW will not change any of the 1k
   prefixes associated to vIP23, but will only update the ARP
   resolution entry for vIP23 (now pointing at M3).

   Any other attempt to improve the efficiency of the solution when
   using non-MAC-decoupled Prefix advertisements will result in
   dependencies on the Cloud Management System (if ESIs are to be used)
   and changes in the current E-VPN semantics. The DC applications
   require mechanisms to provide IP Prefix resiliency independent of
   the E-VPN procedures.

   Other reasons to decouple the IP Prefix advertisement from the MAC
   route are listed below:

   o  Clean identification, operation and troubleshooting of IP
      Prefixes, not subject to interpretation and independent of the
      IPL and the IP value. E.g.: An IP address for ARP resolution must
      always be clearly distinguished from a /32 IP Prefix, and a
      default IP route 0.0.0.0/0 must always be easily and clearly
      distinguished from the absence of IP information.

   o  MAC address information must not be compared by BGP when
      selecting between two IP Prefix routes. If IP Prefixes are to be
      advertised using MAC routes, the MAC information is always
      present and part of the route key.

   o  IP Prefix routes must not be subject to MAC route procedures such
      as MAC Mobility or aliasing. Prefixes advertised from two
      different ESIs do not mean mobility; MACs advertised from two
      different ESIs do mean mobility. Similarly, load balancing for IP
      prefixes is achieved through IP mechanisms such as ECMP, and not
      through MAC route mechanisms such as aliasing.

   o  NVEs that do not require processing IP Prefixes must have an easy
      way to identify an update with an IP Prefix and ignore it, rather
      than processing the MAC route only to find out later that it
      carries a Prefix that must be ignored.
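
   As an informal illustration of the floating IP scaling argument in
   this section (not part of the specification), the Python sketch
   below counts BGP route operations on a change of floating IP owner.
   The function names and the cost model of one operation per withdrawn
   or advertised route are assumptions made only for this example.

```python
# Informal model of the section 1.3 scaling argument. The function names
# and the "one route = one BGP operation" cost model are assumptions made
# for illustration; they are not defined by this document.


def coupled_churn(num_prefixes: int) -> int:
    """Prefixes carried in route type 2, keyed on the owner's MAC.

    A change of floating IP owner (M2 -> M3) withdraws every prefix
    advertised with M2 and re-advertises every prefix with M3.
    """
    withdrawals = num_prefixes
    advertisements = num_prefixes
    return withdrawals + advertisements


def decoupled_churn(num_prefixes: int) -> int:
    """Prefixes carried in a separate route type with next-hop vIP23.

    Only the single route type 2 binding vIP23 to a MAC is updated; the
    prefix routes are untouched, however many of them exist.
    """
    return 1


if __name__ == "__main__":
    n = 1000  # the "1k prefixes" of the example in the text
    print("coupled route operations:  ", coupled_churn(n))
    print("decoupled route operations:", decoupled_churn(n))
```

   With the 1k prefixes of the example, the coupled model costs 2000
   route operations per failover, while the decoupled model costs one
   regardless of the number of prefixes.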

   The following sections describe how E-VPN is extended with a new
   route type for the advertisement of prefixes and how this route is
   used to address the current and future inter-subnet connectivity
   requirements existing in the Data Center.

2. The BGP E-VPN IP Prefix route

   The current BGP E-VPN NLRI as defined in [E-VPN] is shown below:

    +-----------------------------------+
    |    Route Type (1 octet)           |
    +-----------------------------------+
    |     Length (1 octet)              |
    +-----------------------------------+
    | Route Type specific (variable)    |
    +-----------------------------------+

   Where the route type field can contain one of the following specific
   values:

   + 1 - Ethernet Auto-Discovery (A-D) route

   + 2 - MAC advertisement route

   + 3 - Inclusive Multicast Route

   + 4 - Ethernet Segment Route

   This document defines an additional route type that will be used for
   the advertisement of IP Prefixes:

   + 5 - IP Prefix Route

   The support for this new route type is OPTIONAL.

   By using a separate route type for IP prefix advertisements, there
   is a clean separation of functions between route types, i.e. route
   type 2 or MAC Advertisement route will be used for MAC and ARP
   resolution advertisement, whereas route type 5 or IP Prefix route
   will be used for the advertisement of prefixes. Since this new route
   type is OPTIONAL, an implementation not supporting it will easily
   ignore the route, based on the route type value.

   The detailed encoding of this route and associated procedures are
   described in the following sections.

2.1. IP Prefix Route encoding

   An IP Prefix advertisement route type specific E-VPN NLRI consists
   of the following fields:

    +---------------------------------------+
    |      RD   (8 octets)                  |
    +---------------------------------------+
    |Ethernet Segment Identifier (10 octets)|
    +---------------------------------------+
    |  Ethernet Tag ID (4 octets)           |
    +---------------------------------------+
    |  IP Address Length (1 octet)          |
    +---------------------------------------+
    |  IP Address (4 or 16 octets)          |
    +---------------------------------------+
    |  MPLS Label (3 octets)                |
    +---------------------------------------+

   Where:

   o  RD, Ethernet Tag ID and MPLS Label fields will be used as defined
      in [E-VPN] and [E-VPN-OVERLAYS].

   o  The Ethernet Segment Identifier will be zero for IP prefix
      advertisements in this version of the document, and be re-used in
      the future for other purposes.

   o  The IP address length can be set to a value between 0 and 32
      (bits) for IPv4 and between 0 and 128 for IPv6.

   o  The IP address will be a 32 or 128-bit field (IPv4 or IPv6).

   o  The total route length will indicate the type of prefix (IPv4 or
      IPv6).

   The Eth-Tag ID, IP address length and IP address will be part of the
   route key used by BGP to compare routes. The rest of the fields will
   be out of the route key.

2.2. BGP remote-next-hop attribute

   The BGP remote-next-hop attribute [BGP-REMOTE-NH] will be sent along
   with the IP Prefix advertisement to indicate the next-hop behind
   which the advertised prefix is located. The following table shows
   the different types of next-hops defined in this document and their
   corresponding encoding in the BGP remote-next-hop attribute.

      +--------------------+----------------------------------+
      | Prefix next-hop    | Field in the remote-nh attribute |
      +--------------------+----------------------------------+
      | MAC address        | sub-TLV (for VXLAN or NVGRE)     |
      | IRB IP address     | tunnel address (IPv4 or IPv6)    |
      | Floating IP address| tunnel address (IPv4 or IPv6)    |
      +--------------------+----------------------------------+

3. Procedures associated to the advertisement of IP Prefixes

   This section describes the separate function of each E-VPN
   advertisement route: route type 2 for MAC/IP advertisements and
   route type 5 for IP Prefixes. After defining the role of each route
   type and the benefits of using a separate route for IP Prefixes, the
   procedures associated to the advertisement of prefixes will be
   explained in three different use cases.

3.1. Usage of the MAC advertisement and IP Prefix advertisement routes

   [E-VPN] describes the content of the BGP E-VPN route type 2 specific
   NLRI, i.e. MAC Advertisement Route, where the IP address length
   (IPL) and IP address (IP) of a specific advertised MAC are encoded.
   The subject of the MAC advertisement route is the MAC address (M)
   and MAC address length (ML) encoded in the route. The MAC mobility
   and other complex procedures are defined around that MAC address.
   The IP address information carries the host IP address required for
   the ARP resolution of the MAC.

   The BGP E-VPN route type 5 defined in this document, i.e. IP Prefix
   Advertisement route, decouples the advertisement of IP prefixes from
   the advertisement of any MAC address related to it. This brings some
   major benefits to NVO-based networks where inter-subnet forwarding
   is required. Some of those benefits are:

   a) Upon receiving a route type 2 or type 5, an egress NVE can easily
      distinguish MACs and IPs for ARP resolution from IP Prefixes.
      E.g. an IP prefix with IPL=32 being advertised from two different
      ingress NVEs (as route type 5) can be identified as such and be
      imported in the designated routing context as two ECMP routes, as
      opposed to two ARP entries competing for the same IP.

   b) Similarly, upon receiving a route, an egress NVE not supporting
      processing IP Prefixes can easily ignore the update, based on the
      route type.

   c) A MAC route includes the ML, M, IPL and IP in the route key that
      is used by BGP to compare routes. Advertised IP Prefixes are
      imported into the designated routing context, where there is no
      MAC information associated to IP routes. In the example
      illustrated in figure 1, subnet SN1 should be advertised by NVE2
      and NVE3 and interpreted by DGW1 as the same route coming from
      two different next-hops, regardless of the MAC address associated
      to TS2 or TS3. This is easily accomplished in the route type 5 by
      including only the IP information in the route key.

   d) By decoupling the MAC from the IP Prefix advertisement
      procedures, we can leave the IP prefix advertisements out of the
      MAC mobility procedures defined in [E-VPN] for MACs. In addition,
      this allows us to have an indirection mechanism for IP prefixes
      advertised from a MAC/IP that can move between hypervisors. E.g.
      if there are 1,000 prefixes sitting behind TS2 (figure 1), NVE2
      will advertise all those prefixes in type 5 routes associated to
      the next-hop IP2. Should TS2 move to a different NVE, a single
      MAC advertisement route withdraw for the M2/IP2 route from NVE2
      will invalidate the 1,000 prefixes, as opposed to having to wait
      for each individual prefix to be withdrawn. This may be easily
      accomplished by using a different IP Prefix route type that is
      not tied to a MAC address.

3.2. Inter-subnet connectivity for TS

   The following figure illustrates an example of inter-subnet
   forwarding for subnets sitting behind Virtual Appliances (on TS2 and
   TS3).

    SN1---+           NVE2                           DGW1
          |        +--------+    +---------+    +-------------+
    SN2---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
          | IP2/M2 +--------+    |         |    |   IRB1\     |
    IP4---+                      |         |    |        (VRF)|---+
                                 |         |    +-------------+  _|_
                                 | VXLAN/  |                    (   )
                                 |  nvGRE  |         DGW2      ( WAN )
    SN1---+           NVE3       |         |    +-------------+ (___)
          | IP3/M3 +--------+    |         |----|(EVI-10)     |   |
    SN3---TS3(VA)--|(EVI-10)|----|         |    |   IRB2\     |   |
          |        +--------+    +---------+    |        (VRF)|---+
    IP5---+                                     +-------------+

               Figure 2 Inter-subnet forwarding for TS

   An example of inter-subnet forwarding between subnet SN1/24 and a
   subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
   DGW2 are running BGP E-VPN. TS2 and TS3 do not support routing
   protocols, only a static route to forward the traffic to the WAN.

   (1) NVE2 advertises the following BGP routes on behalf of TS2:

       o  Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
          IP=IP2

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IP2

   (2) NVE3 advertises the following BGP routes on behalf of TS3:

       o  Route type 2 (MAC route) containing: ML=48, M=M3, IPL=32,
          IP=IP3

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IP3

   (3) DGW1 and DGW2 import both received routes based on the RT:

       o  Based on the EVI-10 route-target in DGW1 and DGW2, the MAC
          route is imported and M2 is added to the EVI-10 MAC FIB along
          with its corresponding tunnel information. For the VXLAN use
          case, the VTEP will be derived from the MAC route BGP
          next-hop and VNI from the Ethernet Tag or MPLS fields (see
          [E-VPN-OVERLAYS]). IP2 - M2 is added to the ARP table.

       o  Based on the EVI-10 route-target in DGW1 and DGW2, the IP
          Prefix route is also imported and SN1/24 is added to the
          designated routing context with next-hop IP2 pointing at the
          local EVI-10. Should ECMP be enabled in the routing context,
          SN1/24 would also be added to the routing table with next-hop
          IP3.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o  A destination IP lookup is performed on the DGW1 VRF routing
          table and next-hop=IP2 is found. The tunnel information to
          encapsulate the packet will be derived from the route-type 2
          (MAC route) received for M2/IP2.

       o  IP2 is resolved to M2 in the ARP table, and M2 is resolved to
          the tunnel information given by the MAC FIB (remote VTEP and
          VNI for the VXLAN case).

       o  The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC

          . Destination inner MAC = M2

          . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
            and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

       o  Based on the tunnel information (VNI for the VXLAN case), the
          EVI-10 context is identified for a MAC lookup.

       o  Encapsulation is stripped off and, based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

   (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
       be applied to the MAC route IP2/M2, as defined in [E-VPN]. Route
       type 5 prefixes are not subject to MAC mobility procedures,
       hence no changes in the DGW VRF routing table will occur for TS2
       mobility, i.e. all the prefixes will still be pointing at IP2 as
       next-hop. There is an indirection for e.g. SN1/24, which still
       points at next-hop IP2 in the routing table, but IP2 will be
       simply resolved to a different tunnel, based on the outcome of
       the MAC mobility procedures for the MAC route IP2/M2.
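
   The lookup chain and the mobility indirection described in the steps
   above can be sketched informally in Python as follows. This is a
   minimal model under stated assumptions, not an implementation: the
   table names, the helper functions resolve() and move_mac(), and
   every address are hypothetical placeholders (documentation ranges)
   standing in for SN1, IP2, M2 and the NVE VTEPs.

```python
from ipaddress import ip_address, ip_network

# Hypothetical DGW1 state. Every value is an illustrative placeholder
# standing in for the symbolic names used in the text.
IP2 = "198.51.100.2"          # next-hop IP of TS2
M2 = "00:00:5e:00:53:02"      # MAC of TS2
NVE2_VTEP = "203.0.113.2"
NVE3_VTEP = "203.0.113.3"

vrf_routes = {ip_network("192.0.2.0/24"): IP2}   # SN1/24 -> IP2 (route type 5)
arp_table = {IP2: M2}                            # IP2 -> M2    (route type 2)
mac_fib = {M2: {"vtep": NVE2_VTEP, "vni": 10}}   # M2 -> tunnel (route type 2)


def resolve(dst: str):
    """Return the encapsulation data for dst, or None if no route matches."""
    addr = ip_address(dst)
    matches = [net for net in vrf_routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix match
    next_hop = vrf_routes[best]      # VRF lookup: prefix -> next-hop IP
    dst_mac = arp_table[next_hop]    # ARP table: next-hop IP -> MAC
    tunnel = mac_fib[dst_mac]        # MAC FIB: MAC -> tunnel information
    return {"next_hop": next_hop, "dst_mac": dst_mac, **tunnel}


def move_mac(mac: str, new_vtep: str) -> None:
    """Model the outcome of MAC mobility: only the MAC FIB entry changes."""
    mac_fib[mac]["vtep"] = new_vtep


if __name__ == "__main__":
    print(resolve("192.0.2.77"))   # tunnel towards NVE2
    move_mac(M2, NVE3_VTEP)        # TS2 moves from NVE2 to NVE3
    print(resolve("192.0.2.77"))   # same next-hop IP2, tunnel towards NVE3
```

   In this sketch the VRF routing table is never modified by the move;
   only the MAC FIB entry is updated, which is the indirection that
   step (6) relies on.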

   Note that in the opposite direction, TS2 will send traffic based on
   its static-route next-hop information (IRB1 and/or IRB2), and
   regular E-VPN procedures will be applied.

3.3. Inter-subnet connectivity for redundant TS (floating IP)

   Sometimes Tenant Systems (TS) work in active/standby mode where an
   upstream floating IP - owned by the active TS - is used as the
   next-hop to get to some subnets behind. This redundancy mode,
   already introduced in sections 1.1 and 1.3, is illustrated in
   Figure 3.

                      NVE2                           DGW1
                   +--------+    +---------+    +-------------+
      +---TS2(VA)--|(EVI-10)|----|         |----|(EVI-10)     |
      |     IP2/M2 +--------+    |         |    |   IRB1\     |
      |      <-+                 |         |    |        (VRF)|---+
      |        |                 |         |    +-------------+  _|_
     SN1  vIP23 (floating)       | VXLAN/  |                    (   )
      |        |                 |  nvGRE  |         DGW2      ( WAN )
      |      <-+      NVE3       |         |    +-------------+ (___)
      |     IP3/M3 +--------+    |         |----|(EVI-10)     |   |
      +---TS3(VA)--|(EVI-10)|----|         |    |   IRB2\     |   |
                   +--------+    +---------+    |        (VRF)|---+
                                                +-------------+

           Figure 3 Inter-subnet forwarding for redundant TS

   In this example, assuming TS2 is the active TS and owns IP23 (the
   floating IP shown as vIP23 in Figure 3):

   (1) NVE2 advertises the following BGP routes for TS2:

       o  Route type 2 (MAC route) containing: ML=48, M=M2, IPL=32,
          IP=IP23

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IP23

   (2) NVE3 advertises the following BGP routes for TS3:

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IP23

   (3) DGW1 and DGW2 import both received routes based on the RT:

       o  M2 is added to the EVI-10 MAC FIB along with its
          corresponding tunnel information. For the VXLAN use case, the
          VTEP will be derived from the MAC route BGP next-hop and VNI
          from the Ethernet Tag or MPLS fields (see [E-VPN-OVERLAYS]).
          IP23 - M2 is added to the ARP table.

       o  SN1/24 is added to the designated routing context in DGW1 and
          DGW2 with next-hop IP23 pointing at the local EVI-10.

   (4) When DGW1 receives a packet from the WAN with destination IPx,
       where IPx belongs to SN1/24:

       o  A destination IP lookup is performed on the DGW1 VRF routing
          table and next-hop=IP23 is found. The tunnel information to
          encapsulate the packet will be derived from the route-type 2
          (MAC route) received for M2/IP23.

       o  IP23 is resolved to M2 in the ARP table, and M2 is resolved
          to the tunnel information given by the MAC FIB (remote VTEP
          and VNI for the VXLAN case).

       o  The IP packet destined to IPx is encapsulated with:

          . Source inner MAC = IRB1 MAC

          . Destination inner MAC = M2

          . Tunnel information provided by the MAC FIB (VNI, VTEP IPs
            and MACs for the VXLAN case)

   (5) When the packet arrives at NVE2:

       o  Based on the tunnel information (VNI for the VXLAN case), the
          EVI-10 context is identified for a MAC lookup.

       o  Encapsulation is stripped off and, based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

   (6) When the redundancy protocol running between TS2 and TS3
       appoints TS3 as the new active TS for SN1, TS3 will now own the
       floating IP23 and will signal this new ownership (GARP message
       or similar). Upon receiving the new owner's notification, NVE3
       will issue a route type 2 for M3-IP23. DGW1 and DGW2 will update
       their ARP tables with the new MAC resolving the floating IP. No
       changes are carried out in the VRF routing table.

   In the DGW1/2 BGP RIB, there will be two route type 5 routes for SN1
   (from NVE2 and NVE3) but only the one with the same BGP next-hop as
   the IP23 route type 2 BGP next-hop will be valid.

3.4. Inter-subnet connectivity for IRB interfaces

   In some other cases, the NVEs and DGWs will have just IRB interfaces
   as hosts in the E-VPN instance. Figure 4 illustrates an example.

                   NVE1
          +---------------------+                    DGW1
    IP1---|(EVI-1)              |               +-------------+
          |      \    IRB3      |  +---------+  |(EVI-10)     |
          |       (VRF)-(EVI-10)|--|         |--|   IRB1\     |
          |      /              |  |         |  |        (VRF)|---+
        |-|(EVI-2)              |  |         |  +-------------+  _|_
     SN1| +---------------------+  |         |                  (   )
        | +---------------------+  | VXLAN/  |       DGW2      ( WAN )
        |-|(EVI-2)              |  |  nvGRE  |  +-------------+ (___)
          |      \    IRB4      |  |         |  |(EVI-10)     |   |
          |       (VRF)-(EVI-10)|--|         |--|   IRB2\     |   |
          |      /              |  +---------+  |        (VRF)|---+
    SN2---|(EVI-3)              |               +-------------+
          +---------------------+
                   NVE2

          Figure 4 Inter-subnet forwarding for IRB interfaces

   In this case:

   (1) NVE1 advertises the following BGP routes for SN1 resolution:

       o  Route type 2 (MAC route) containing: ML=48, M=IRB3-MAC,
          IPL=32, IP=IRB3-IP

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IRB3-IP

   (2) NVE2 advertises the following BGP routes for SN1 resolution:

       o  Route type 2 (MAC route) containing: ML=48, M=IRB4-MAC,
          IPL=32, IP=IRB4-IP

       o  Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
          remote-nh tunnel address=IRB4-IP

   (3) DGW1 and DGW2 import both received routes based on the RT:

       o  IRB3-MAC and IRB4-MAC are added to the EVI-10 MAC FIB along
          with their corresponding tunnel information. For the VXLAN
          use case, the VTEP will be derived from the MAC route BGP
          next-hop and VNI from the Ethernet Tag or MPLS fields (see
          [E-VPN-OVERLAYS]). IRB3-MAC - IRB3-IP and IRB4-MAC - IRB4-IP
          are added to the ARP table.

       o  SN1/24 is added to the designated routing context in DGW1 and
          DGW2 with next-hop IRB3-IP (and/or IRB4-IP) pointing at the
          local EVI-10.

   Similar forwarding procedures as the ones described in the previous
   use-cases are followed.

3.4.1. Inter-subnet connectivity for unnumbered IRB interfaces

   In the previous example, the E-VPN instance can connect IRB
   interfaces and any other Tenant Systems connected to it. E-VPN
   provides connectivity for:

   a) Traffic destined to the IRB IP interfaces as well as

   b) Traffic destined to IP subnets sitting behind the IRB interfaces,
      e.g. SN1 or SN2.

   In order to provide connectivity for (a) we need MAC routes
   (route-type 2) distributing IRB MACs and IPs. Connectivity type (b)
   is accomplished by the exchange of IP Prefix routes (route-type 5)
   for IPs and subnets sitting behind IRBs. As discussed in this
   document, prefixes are advertised along with their corresponding
   remote next-hop tunnel address, and those tunnel addresses are used
   to link prefixes to MAC/IPs advertised in MAC routes (type 2).

   In some cases, connectivity type (a) (see above) is not required and
   the E-VPN instance is connecting only IRB interfaces, which are
   never the final destination of any packet. This use case is depicted
   in the diagram below and we refer to it as the "unnumbered IRB
   interface" use-case:

                 NVE1
            +------------+
    IP1-----|(EVI-1)     |                    DGW1
            |      \     |    +---------+    +-----+
            |       (VRF)|----|         |----|(VRF)|----+
            |      /     |    |         |    +-----+    |
        |---|(EVI-2)     |    |         |              _|_
        |   +------------+    |         |             (   )
     SN1|                     | VXLAN/  |            ( WAN )
        |        NVE2         |  nvGRE  |             (___)
        |   +------------+    |         |               |
        |---|(EVI-2)     |    |         |     DGW2      |
            |      \     |    |         |    +-----+    |
            |       (VRF)|----|         |----|(VRF)|----+
            |      /     |    +---------+    +-----+
    SN2-----|(EVI-3)     |
            +------------+

     Figure 5 Inter-subnet forwarding for unnumbered IRB interfaces

   In this case, we need to provide connectivity from/to IP hosts in
   SN1, SN2, IP1 and hosts sitting at the other end of the WAN. The
   E-VPN in the core just connects all the IRBs in NVE1, NVE2, DGW1 and
   DGW2 but there will not be any IP host in this core E-VPN that is
   the final destination of any IP packet.
   Therefore, there is no need to define IRB IP addresses (IRBs are not
   represented in the diagram). This is the reason why we refer to this
   solution as the "unnumbered Ethernet IRB" solution.

   In this case, the proposal is to use E-VPN type 5 routes and the BGP
   Remote-Next-Hop attribute, where the following information is
   carried:

   o Route type 5 Eth-Tag ID can contain the core instance VNI (if the
     VNI is global; otherwise, for locally significant VNIs, an MPLS
     label field may be added with a 20-bit VNI encoded in the label
     space, as per [E-VPN-OVERLAYS]).

   o Route type 5 IP address length and IP address, as explained in the
     previous section.

   o Remote next-hop Tunnel Type: TBD for VXLAN and TBD for NVGRE (to
     be assigned by IANA).

   o Remote next-hop Tunnel Address is populated with zeros, meaning
     that the prefix next-hop is an "unnumbered IRB".

   o Remote next-hop sub-TLV (for VXLAN/NVGRE) in the Tunnel Parameters
     field: contains the next-hop MAC address associated to the
     unnumbered IRB interface. This MAC address identifies the NVE/DGW
     and can be re-used for all the VRFs in the node.

   Example of the advertisement of the IPv4 prefix SN1/24 from NVE1:

   (1) NVE1 advertises the following BGP route for SN1:

      o Route type 5 (IP Prefix route) containing: Eth-Tag=VNI=10
        (assuming global VNI), IPL=24, IP=SN1. In addition, a Remote-NH
        attribute is sent, where Tunnel-type=VXLAN or NVGRE and a
        sub-TLV contains MAC address=NVE1 MAC.

      o As discussed, no MAC route is advertised for this core E-VPN.

   (2) DGW1 imports the received route from NVE1 and SN1/24 is added to
   the designated routing context. The next-hop for SN1/24 will be
   given by the route type 5 BGP next-hop (NVE1), which is resolved to
   a tunnel. For instance, if the tunnel is VXLAN based, the BGP
   next-hop will be resolved to a VXLAN tunnel where: destination-VTEP=
   NVE1 IP, VNI=10, inner destination MAC=NVE1 MAC (derived from the
   remote-nh attribute).

   (3) When DGW1 receives a packet from the WAN with destination IPx,
   where IPx belongs to SN1/24:

      o A destination IP lookup is performed on the DGW1 VRF routing
        table and next-hop="NVE1 IP" is found. The tunnel information
        to encapsulate the packet will be derived from the route-type 5
        received for SN1.

      o The IP packet destined to IPx is encapsulated with: Source
        inner MAC = DGW1 MAC, Destination inner MAC = NVE1 MAC, Source
        outer IP (source VTEP) = DGW1 IP, Destination outer IP
        (destination VTEP) = NVE1 IP

   (4) When the packet arrives at NVE1:

      o Based on the tunnel information (VNI for the VXLAN case), the
        routing context is identified for an IP lookup.

      o An IP lookup is performed in the routing context, where SN1
        turns out to be a local subnet associated to EVI-2. A
        subsequent lookup in the ARP table and the EVI-2 MAC FIB will
        return the forwarding information for the packet in EVI-2.

4. Conclusions

   A new E-VPN route type 5 for the advertisement of IP Prefixes is
   proposed in this document. This new route type has a role distinct
   from that of route type 2, i.e. the MAC advertisement route, and
   addresses all the inter-subnet connectivity scenarios which are
   required in the Data Center. As discussed throughout the document,
   IP-VPN cannot be used in an NVO-based DC to advertise IP Prefixes
   and the existing E-VPN route type 2 does not meet the requirements
   for all the DC use cases; therefore, a new E-VPN route type is
   required.

   This new E-VPN route type 5 decouples the IP Prefix advertisements
   from the MAC route advertisements in E-VPN, hence:

   a) Allows the clean and clear announcement of IPv4 or IPv6 prefixes
      in an NLRI with no MAC addresses in the route key, so that only
      IP information is used in BGP route comparisons.
   b) Since the route type is different from the MAC advertisement
      route, the advertisement of prefixes will be excluded from all
      the procedures defined for the advertisement of VM MACs, e.g.
      MAC Mobility or aliasing. As a result, the current E-VPN
      procedures do not need to be modified.

   c) Allows a flexible implementation where the prefix can be linked
      to different types of next-hops: MAC address, IP address, IRB IP
      address, ESI, etc. These MAC or IP addresses do not need to
      reside in the advertising NVE.

   d) An E-VPN implementation not requiring IP Prefixes can simply
      discard them by looking at the route type value.

5. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

6. Security Considerations

7. IANA Considerations

8. References

8.1. Normative References

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

8.2. Informative References

   [E-VPN] Sajassi et al., "BGP MPLS Based Ethernet VPN",
             draft-ietf-l2vpn-evpn-03.txt, work in progress, February,
             2013

   [E-VPN-OVERLAYS] Sajassi-Drake et al., "A Network Virtualization
             Overlay Solution using E-VPN",
             draft-sd-l2vpn-evpn-overlay-01.txt, work in progress,
             February, 2013

   [BGP-REMOTE-NH] Van de Velde et al., "BGP Remote-Next-Hop",
             draft-vandevelde-idr-remote-next-hop-03.txt, work in
             progress, October, 2012

9. Acknowledgments

   The authors would like to thank Mukul Katiyar and Senthil Sathappan
   for their valuable feedback and contributions.

10. Authors' Addresses

   Jorge Rabadan
   Alcatel-Lucent
   777 E. Middlefield Road
   Mountain View, CA 94043 USA
   Email: jorge.rabadan@alcatel-lucent.com

   Wim Henderickx
   Alcatel-Lucent
   Email: wim.henderickx@alcatel-lucent.com

   Florin Balus
   Nuage Networks
   Email: florin@nuagenetworks.net

   Aldrin Isaac
   Bloomberg
   Email: aisaac71@bloomberg.net

   Senad Palislamovic
   Alcatel-Lucent
   Email: senad.palislamovic@alcatel-lucent.com
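
   The unnumbered IRB example of Section 3.4.1 (steps 2 and 3, as seen
   from DGW1) can be illustrated with the minimal Python sketch below.
   It is not part of the proposal. The names PrefixRoute, Encap and
   Vrf, the IP addresses and the MAC values are hypothetical and chosen
   only for illustration; the sketch assumes a global VNI carried in
   the Eth-Tag, IPv4 only, and models only the all-zero Tunnel Address
   (unnumbered IRB) case.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional

# An all-zero Remote-NH Tunnel Address marks the next hop as an unnumbered IRB.
UNNUMBERED = ipaddress.IPv4Address("0.0.0.0")


@dataclass(frozen=True)
class PrefixRoute:
    """Hypothetical view of a received route type 5 and its Remote-NH data."""
    prefix: ipaddress.IPv4Network          # IP address plus prefix length (IPL)
    eth_tag: int                           # carries the VNI (assumed global)
    bgp_next_hop: ipaddress.IPv4Address    # advertising NVE, used as remote VTEP
    tunnel_type: str                       # "VXLAN" or "NVGRE"; code points TBD
    tunnel_address: ipaddress.IPv4Address  # all zeros for an unnumbered IRB
    next_hop_mac: str                      # MAC from the Remote-NH sub-TLV


@dataclass(frozen=True)
class Encap:
    """Header fields the ingress node writes when tunnelling a routed packet."""
    outer_src_ip: ipaddress.IPv4Address
    outer_dst_ip: ipaddress.IPv4Address
    vni: int
    inner_src_mac: str
    inner_dst_mac: str


class Vrf:
    """Toy routing context for a DGW or NVE (illustrative only)."""

    def __init__(self, local_vtep: str, local_mac: str) -> None:
        self.local_vtep = ipaddress.IPv4Address(local_vtep)
        self.local_mac = local_mac
        self.routes: Dict[ipaddress.IPv4Network, PrefixRoute] = {}

    def import_route(self, route: PrefixRoute) -> None:
        # Step (2): install the prefix; only the unnumbered case is modelled.
        if route.tunnel_address != UNNUMBERED:
            raise ValueError("only the unnumbered IRB case is modelled here")
        self.routes[route.prefix] = route

    def lookup(self, dst_ip: str) -> Optional[PrefixRoute]:
        # Step (3): longest-prefix match on the destination IP address.
        addr = ipaddress.IPv4Address(dst_ip)
        matches = [r for r in self.routes.values() if addr in r.prefix]
        return max(matches, key=lambda r: r.prefix.prefixlen, default=None)

    def encapsulate(self, dst_ip: str) -> Optional[Encap]:
        # Step (3): derive the tunnel fields from the matching route type 5.
        route = self.lookup(dst_ip)
        if route is None:
            return None
        return Encap(
            outer_src_ip=self.local_vtep,
            outer_dst_ip=route.bgp_next_hop,
            vni=route.eth_tag,
            inner_src_mac=self.local_mac,
            inner_dst_mac=route.next_hop_mac,
        )


# Example values from documentation ranges: SN1 = 198.51.100.0/24,
# NVE1 = 192.0.2.1, DGW1 = 192.0.2.11 (all assumed for illustration).
sn1_route = PrefixRoute(
    prefix=ipaddress.IPv4Network("198.51.100.0/24"),
    eth_tag=10,
    bgp_next_hop=ipaddress.IPv4Address("192.0.2.1"),
    tunnel_type="VXLAN",
    tunnel_address=UNNUMBERED,
    next_hop_mac="00:00:5e:00:53:01",
)
dgw1 = Vrf(local_vtep="192.0.2.11", local_mac="00:00:5e:00:53:11")
dgw1.import_route(sn1_route)
print(dgw1.encapsulate("198.51.100.25"))
```

   In this sketch the outer destination IP comes from the BGP next-hop
   and the inner destination MAC from the Remote-NH sub-TLV, so no MAC
   route and no ARP entry for the core E-VPN are consulted, which
   mirrors the reason given in Section 3.4.1 for not advertising route
   type 2 in this use-case.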