BESS WG                                                          Y. Wang
Internet-Draft                                                   R. Chen
Intended status: Standards Track                         ZTE Corporation
Expires: 3 March 2022                                     30 August 2021


                          Light Weighted EVPN
            draft-wang-bess-evpn-cmac-overload-reduction-07

Abstract

   SRv6 EVPN [I-D.ietf-bess-srv6-services] is not sufficient for some
   light-weighted use cases.  When PBB EVPN [RFC7623] over SRv6 is used
   to support these light-weighted EVPN services, it is difficult to
   use the SID list to carry a function that operates on C-MACs.

   [RFC8986] defines the End.DX2 function, which can be used in EVPN
   VPLS.  When it is used in EVPN VPLS, the data-plane learning defined
   for the End.DT2U function can also be applied to the End.DX2
   function.  On the basis of this extended End.DX2 function, SRv6 EVPN
   can be improved to meet all the requirements of [RFC7623] and to
   bring some other benefits.  Such an SRv6 EVPN is called
   light-weighted SRv6 EVPN, and it is simpler than PBB EVPN over SRv6.

   It is easy for the light-weighted SRv6 EVPN to carry a SID that
   operates on customer Ethernet packets, because there is no other
   Ethernet header between the SID list and the customer Ethernet
   header.  These SIDs may be user-defined functions for the customer
   Ethernet headers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 3 March 2022.




Wang & Chen               Expires 3 March 2022                  [Page 1]

Internet-Draft                  EVPN-lite                    August 2021


Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
     1.2.  Overview
     1.3.  Terminology
   2.  Requirements
     2.1.  No C-MAC Awareness in the Backbone
     2.2.  Flexible Multi-homing Remains Supported
     2.3.  C-MAC Address Learning and Confinement
     2.4.  No C-MAC Flushing for All-Active ESes
     2.5.  Independent C-MAC Flushing for Single-Active ESes
     2.6.  Independent Convergency per <ESI, EVI>
     2.7.  ESI Route Aggregation in Backbone
     2.8.  Unequal load-balance
     2.9.  AC-aware Service Interface
     2.10. Ingress Filtering for Unicast Flows of E-Tree Services
     2.11. AC-Influenced DF Election
     2.12. Synchronous MAC Entries in All-Active Mode
   3.  Light-Weighted SRv6 EVPN Overview
     3.1.  Use Case
     3.2.  Solution Overview
       3.2.1.  Aggregatable End.DX2 SID
       3.2.2.  The Advertisement of ESI-Prefixes
     3.3.  Packet Walkthrough
       3.3.1.  PE1 forward ARP Request to PE2/PE3
       3.3.2.  PE2/PE3's Dataplane MAC Learning
       3.3.3.  PE2 Discard ARP Request to H1
       3.3.4.  PE3 Forward ARP Reply to PE1/PE2
       3.3.5.  PE1 Forward ARP Reply to H1
   4.  Decapsulation Optimizations
     4.1.  Decapsulation Aggregation
     4.2.  End.DX2AGG Function and Arg.ACI
   5.  Advanced Considerations
     5.1.  ESI SID Aggregation
     5.2.  ESI/AC SID Advertisement Optimization
       5.2.1.  Advertise ESI-Locators in Underlay Network
       5.2.2.  Using EAD/EVI Route to Advertise AC SIDs
       5.2.3.  Using EAD/ES Route to Advertise ESI SIDs
       5.2.4.  The Reduction of EAD/EVI Routes
         5.2.4.1.  AC-DF per EVI Mode for Light-Weighted EVPNs
         5.2.4.2.  On Receiving Reverse EAD/EVI Routes
     5.3.  Unequal LB Advertisement
     5.4.  AC-aware Bundling Service Interface
     5.5.  C-MAC Flush Notification Procedure
     5.6.  E-Tree Support Considerations
     5.7.  MAC-Synchronization in All-Active Mode
     5.8.  EVPN IRB Support Considerations
       5.8.1.  EVPN IRB Extended Mobility
       5.8.2.  Anycast IRB interfaces
         5.8.2.1.  Constructing GW-list
         5.8.2.2.  Flooding over GW-list
   6.  Comparison with Other Solutions
     6.1.  Detailed Comparisons with PBB EVPN over SRv6
     6.2.  Detailed Comparisons with Anycast Node SID
   7.  Security Considerations
   8.  IANA Considerations
     8.1.  End.DX2AGG SID
   9.  Acknowledgements
   10. Normative References
   11. Informative References
   Appendix A.  Explanation for Physical Links of the Use-cases
     A.1.  Failure Detections for P1.2 (or P2.1)
     A.2.  Protection Approaches for N1 (or N2)
       A.2.1.  CCC-Approaches
         A.2.1.1.  CCC Active-Active Protection
         A.2.1.2.  CCC Active-Standby Protection
       A.2.2.  VSI-Approaches
   Authors' Addresses

1.  Introduction

1.1.  Background

   When there are too many customer MACs (C-MACs), the RRs and/or ASBRs
   will be overloaded by the RT-2 routes advertised for these MACs
   according to [RFC7432].  This issue can be solved by light-weighted
   EVPNs.  PBB EVPN [RFC7623] is an MPLS-based light-weighted EVPN
   solution.  In an SRv6 network, however, PBB EVPN over SRv6 is not a
   good choice for a light-weighted EVPN solution.

   This document proposes some new extensions to
   [I-D.ietf-bess-srv6-services] to achieve all-active mode ES
   redundancy on TPEs and, at the same time, to reduce the C-MAC load
   on RRs and ASBRs.  With these extensions, the new solution works
   better than PBB EVPN, especially when no MPLS data plane is
   deployed.

   Furthermore, it naturally brings the benefits of high scalability,
   faster network convergence, and reduced operational complexity.
   Because of these advantages, we call it light-weighted EVPN.

1.2.  Overview

   In [RFC7432], C-MACs are advertised via RT-2 routes.  This behavior
   is inherited by [I-D.ietf-bess-srv6-services].  However, in order to
   solve the C-MAC overload problem for RRs and ASBRs, we have to
   return to PBB-like data-plane C-MAC learning procedures.

   Section 2 discusses all the requirements for a light-weighted EVPN
   solution that pushes no C-MAC entries into the backbone network.
   Note that some of these requirements are not well supported by PBB
   EVPN.

   In this document, the light-weighted EVPN solutions are also called
   EVPN-lite for short.

   Note that the EVI here corresponds to the I-Component of [RFC7623],
   not the B-Component.  In fact, there will be no typical B-components
   in EVPN-lite SRv6 solutions.

1.3.  Terminology

   Most of the terminology used in this document comes from [RFC7432]
   and [I-D.ietf-bess-srv6-services] except for the following:

   *  Light-weighted EVPN: The EVPN solution with high scalability and
      reduced operational complexity.

   *  EVPN-lite: The Light-weighted EVPN is also called EVPN-lite for
      short.

   *  C-MAC: Customer MAC.  It is the same as the C-MAC of PBB EVPN.

   *  ISID: a broadcast domain identifier in PBB I-Component.

   *  LDV: Local Discriminating Value.  It is similar to the Local
      Discriminating Value of a type 3 ESI.

   *  GDV: Global Discriminating Value.  An identifier with global
      uniqueness.

   *  EGD: EVI-GDV, an EVI's Global Discriminator.  It is the GDV of an
      EVPN MAC-VRF and is used to identify that MAC-VRF in the data
      plane.  EGD is therefore also the abbreviation of EVPN-GDV.  For
      example, the EGD of [RFC7348] is a global VNI.  In this draft,
      the EGD of an EVI is that MAC-VRF's globally unique VPN ID.

   *  AC SID: The End.DX2 SID of a specified AC.  Different ACs on the
      same ES have different AC SIDs.

   *  ESI SID: An SRv6 SID whose function type is End.DX2AGG.  Note that
      when the ESI is in all-active mode, the ESI SID is the same on
      all PEs of that ES, according to Section 3.2.  In that case, the
      ESI SID can also be called an ES anycast SID.  Different ACs on
      the same ES have the same ESI SID with different Arg.ACI values.

   *  ESI IP: The End.DX2AGG SID of a specified ESI, but with an empty
      Arg.ACI.

   *  ESI Prefix: The IPv6 Prefix that covers all AC-SIDs of the
      specified ESI.

   *  Ingress AC SID: The End.DX2 SID of the ingress AC on the ingress
      PE.  Note that the Ingress AC SID is typically carried as the
      SRv6 source IP in the data plane.

   *  EAD/ES: Ethernet A-D per ES route, or RT-1 per ES route.

   *  EAD/EVI: Ethernet A-D per EVI route, or RT-1 per EVI route.

   *  Arg.ACI: The argument part of a SID of the End.DX2AGG function.
      It is called Arg.ACI because the value of that argument is an
      AC-ID.

   *  RT-2: MAC/IP Advertisement route.

   *  MAC Entry: An entry in the EVPN MAC table in data-plane.

   *  GRT: Global Routing Table.

2.  Requirements

   Light-weighted SRv6 EVPNs should satisfy the following
   requirements:

2.1.  No C-MAC Awareness in the Backbone

   In typical operation, an EVPN PE sends a BGP MAC Advertisement route
   per C-MAC address.  In certain applications, this poses scalability
   challenges, as is the case in data center interconnect (DCI)
   scenarios where the number of virtual machines (VMs), and hence the
   number of C-MAC addresses, can be in the millions.  This is called
   C-MAC overload of the DC backbone.  In such scenarios, it is
   required to reduce the number of BGP MAC Advertisement routes by
   relying on an 'EVPN-lite' scheme, as is provided by ESI and its
   equivalents (e.g., Pseudo B-MAC, ESI IP).

2.2.  Flexible Multi-homing Remains Supported

   Flexible multi-homing means that different ES instances can have
   different adjacent PEs.  In this document, the set of all PEs
   adjacent to an ES is called that ES's location-set.  Flexible
   multi-homing therefore means that different ESes can have different
   location-sets.

   For example, ES101's location-set is {PE1}, ES102's location-set is
   {PE2, PE3}, ES103's location-set is {PE1, PE3}, and ES104's
   location-set is {PE2, PE4}.
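
   The example location-sets can be written down as a small data
   structure.  The following Python sketch is illustrative only; the
   variable and function names are hypothetical and are not defined by
   this document.

```python
# Location-sets from the example above: ES name -> set of adjacent PEs.
# All names here are illustrative assumptions.
location_sets = {
    "ES101": {"PE1"},
    "ES102": {"PE2", "PE3"},
    "ES103": {"PE1", "PE3"},
    "ES104": {"PE2", "PE4"},
}


def is_multihomed(es: str) -> bool:
    """An ES is multi-homed when more than one PE is adjacent to it."""
    return len(location_sets[es]) > 1


def shared_pes(es_a: str, es_b: str) -> set:
    """PEs that are adjacent to both Ethernet Segments."""
    return location_sets[es_a] & location_sets[es_b]
```

   Flexible multi-homing means that no two entries of such a table are
   required to have the same value.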

2.3.  C-MAC Address Learning and Confinement

   In EVPN, all the PE nodes participating in the same EVPN instance are
   exposed to all the C-MAC addresses learnt by any one of these PE
   nodes because a C-MAC learnt by one of the PE nodes is advertised in
   BGP to other PE nodes in that EVPN instance.  This is the case even
   if some of the PE nodes for that EVPN instance are not involved in
   forwarding traffic to, or from, these C-MAC addresses.  Even if an
   implementation does not install hardware forwarding entries for C-MAC
   addresses that are not part of active traffic flows on that PE, the
   device memory is still consumed by keeping record of the C-MAC
   addresses in the routing information base (RIB) table.  In network
   applications with millions of C-MAC addresses, this introduces a non-
   trivial waste of PE resources.  As such, it is required to confine
   the scope of visibility of C-MAC addresses to only those PE nodes
   that are actively involved in forwarding traffic to, or from, these
   addresses.

2.4.  No C-MAC Flushing for All-Active ESes

   Just as in [RFC7623], it is required to avoid C-MAC address flushing
   upon link, port, or node failure for remote All-Active multihomed
   segments.

   Note that when an ES fails on one PE, it may still work well on
   another PE, so the C-MACs should not be flushed.

2.5.  Independent C-MAC Flushing for Single-Active ESes

   Just as in [RFC7623], upon a link or port failure of a single-active
   ES, the C-MACs of other single-active ESes on the same PE will not
   be flushed.

2.6.  Independent Convergency per <ESI, EVI>

   When the physical port of an All-Active ES works well, but a single
   Ethernet Tag ID (ETI) of that ES fails on PE1 (illustrated by the
   'X' flag in Figure 1), the traffic to that ETI of that ES will be
   rerouted to another adjacent PE of the same ES, but the traffic to
   other ETIs of the same ES will not be affected.

   If PE1 had the last active link for that <ESI, ETI> before the
   failure, a C-MAC flush should be triggered on the remote PEs.  If
   that ESI is single-active, a C-MAC flush should be triggered on the
   remote PEs too.
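
   The flush rule above can be summarized as a small decision function.
   This is only a sketch of the rule as stated in this section; the
   function name, the mode labels, and the way the number of surviving
   PEs is obtained are assumptions made for illustration.

```python
def remote_flush_needed(mode: str, active_pes_after_failure: int) -> bool:
    """Decide whether remote PEs should flush C-MACs for a failed <ESI, ETI>.

    mode: "single-active" or "all-active" (hypothetical labels).
    active_pes_after_failure: number of PEs on which that <ESI, ETI>
        is still up after the failure.
    """
    if mode == "single-active":
        # A single-active ES always triggers a flush on the remote PEs.
        return True
    # All-active: flush only if the failed link was the last active one.
    return active_pes_after_failure == 0
```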

   Note that when an AC (ES link) fails but the PE node still works
   well, there should be no steady bypass traffic either.  The steady
   bypass problem is discussed in [I-D.wang-bess-evpn-egress-protection].

2.7.  ESI Route Aggregation in Backbone

   In SRv6 EVPN, different sub-interfaces of the same ESI can have
   different AC-SIDs in order to achieve Independent Convergency per
   <ESI, EVI>.  But only their common prefix (the ESI-Prefix) should
   be advertised in the underlay network.

   Note that only the common prefix needs to be advertised in the
   overlay network as long as none of these sub-interfaces has failed.

   Note that different ESIs may use the same SRv6 locator.  In such a
   case, these ESI SIDs are aggregated into that anycast SRv6 locator
   when they are advertised in the underlay network.

2.8.  Unequal load-balance

   The light-weighted EVPNs should support the unequal load balancing
   defined in [I-D.ietf-bess-evpn-unequal-lb].

2.9.  AC-aware Service Interface

   In the AC-aware bundling service interface
   [I-D.sajassi-bess-evpn-ac-aware-bundling], an ES may have two of
   its VLANs attached to the same broadcast domain.  These two VLANs
   may be assigned to the same sub-interface or to different
   sub-interfaces.

2.10.  Ingress Filtering for Unicast Flows of E-Tree Services

   The filtering needed by an E-Tree service for known unicast traffic
   should be performed at the ingress PE, thus providing very efficient
   filtering and avoiding sending known unicast traffic over the PSN to
   be filtered at the egress PE, as is done in traditional E-Tree
   solutions (i.e., E-Tree for VPLS [RFC7796]).

2.11.  AC-Influenced DF Election

   When the EAD/EVI route is not advertised before the corresponding
   ESI sub-interface fails, the AC-influenced DF Election procedures
   should elect the right DF before and after that failure.

   Note that according to [RFC8584], the AC-influenced DF Election will
   be incorrect when no EAD/EVI route is advertised, even if no ESI
   sub-interface has failed at all.

   The AC-influenced DF Election should support "service carving" as
   Section 8.5 of [RFC7432] does.
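
   For reference, the default service carving of Section 8.5 of
   [RFC7432] orders the candidate PEs by IP address and elects the PE
   whose ordinal equals (V mod N) for VLAN V and N candidates.  The
   Python sketch below combines that rule with an AC-influenced
   candidate list.  It is a minimal illustration: the function name and
   the "ac_is_up" input are hypothetical, and how a PE learns the AC
   state of its peers without EAD/EVI routes is not modelled here.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id, ac_is_up):
    """Return the DF for vlan_id, or None if no PE is eligible.

    pe_addresses: IP addresses of the PEs attached to the ES.
    ac_is_up: dict mapping a PE address to True when its AC for this
        VLAN is up (an assumed input, not a defined data structure).
    """
    # AC-influenced step: keep only the PEs whose AC is up, in
    # increasing numeric order of their IP addresses.
    eligible = sorted(
        (pe for pe in pe_addresses if ac_is_up.get(pe, False)),
        key=lambda pe: int(ipaddress.ip_address(pe)),
    )
    if not eligible:
        return None
    # Service carving: the PE with ordinal (V mod N) is the DF.
    return eligible[vlan_id % len(eligible)]
```

   With two eligible PEs, even VLAN IDs map to the PE with the lower
   address and odd VLAN IDs to the other one; when one AC is down, the
   remaining PE is elected for every VLAN.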

2.12.  Synchronous MAC Entries in All-Active Mode

   When a C-MAC Mx is learnt on an attachment circuit AC1 of an
   all-active Ethernet Segment ES21 on PE1, Mx should not be in the
   unknown unicast state on PE2, which is also adjacent to ES21.  The
   outgoing interface of PE2's MAC entry for Mx should be AC2, which
   is an AC of ES21 and has the same VLAN as AC1.

3.  Light-Weighted SRv6 EVPN Overview

3.1.  Use Case

   The physical links of these use cases are described in Appendix A.
   Here we describe the ACs and broadcast domains.  Note that
   VN-10/VN-20/VUN1 in Figure 1 correspond to VPNx/VPNy/NIz in
   Figure 5.

   The Ethernet segment ES21 has ESI ESI21.  ES21 is attached to
   MAC-VRF VN-10 via attachment circuit AC1 on PE1 and via attachment
   circuit AC2 on PE2.  We assign the End.DX2 SID DX2_AC1 to AC1 and
   the End.DX2 SID DX2_AC2 to AC2.  The Ethernet segment ES3 has ESI
   ESI3.  ES3 is attached to MAC-VRF VN-10 via attachment circuit AC3
   on PE3.  We assign the End.DX2 SID DX2_AC3 to AC3.

   Note that network instance VUN1 is the (virtual) underlay network of
   VN-10 and VN-20.  Because VN-10 and VN-20 are SRv6 EVPN MAC-VRFs,
   their underlay network will be the SRv6 network of the GRT.

                                  +------------------------+
                            PE1   |                        |
                          +-------+-------+ ------------>  |
                    V1P1  |  ___(VN-10)   | VUN1: DX2AGG:: |
         EVC1(V1)   V2P1  | /DX2_AC1  \   | through IGP or |
     SN1--O=------=========<       (VUN1) | RT-1 per ES    |  PE3
            \    /  ESI21 | \___      /   |           +----+-----+
             |  |     +   | X   (VN-20)   |           |          |
             X  X     |   +----------+----+           |          |
             |P3|     |              |                |   (VN-10)--+H3
             |  |     |              | | RT2(VN:C-MAC)|    /     |
          V1 |  | V2  | DX2AGG::/96  | | EVI-RT       | (VUN1)   |
             |  |     |              | V ES-Import RT |    \     |
             |  |     |              |                |   (VN-20)--+H4
             |  |     |   +----------+----+           |          |
              \/      +   | X___(VN-10)   |           |          |
              /\    ESI21 | /DX2_AC2  \   |           +----+-----+
     SN2--O===--===========<       (VUN1) |                |
         EVC2(V2)   V1P2  | \___      /   |                |
                    V2P2  |     (VN-20)   |                |
                          +-------+-------+                |
                            PE2   |                        |
                                  +------------------------+

                      Figure 1: EVPN-lite SRv6 Use Case

   We use IMET routes to build a broadcast-list, which is used to
   forward BUM traffic.  Data-plane MAC learning on BUM traffic
   produces the first batch of C-MAC entries.  Subsequent C-MAC entries
   can be learnt from unicast and/or BUM traffic.  Unlike ordinary
   EVPN, we do not use MAC/IP routes to advertise C-MAC entries, so
   that the RRs and/or SPEs are not overloaded by these C-MACs.

3.2.  Solution Overview

3.2.1.  Aggregatable End.DX2 SID

   When an Ethernet Segment ES21 is attached to an EVI, the attachment
   circuit AC1 for that <ESI,EVI> is assigned an End.DX2 SID.
   Different ACs of the same ESI are assigned different End.DX2 SIDs,
   which are called AC SIDs in this document.  These different End.DX2
   SIDs must be aggregatable into the same prefix, and this prefix is
   called the ESI prefix in light-weighted SRv6 EVPNs.  The format of
   aggregatable End.DX2 SIDs is illustrated in the following figure:

       |<---   ESI-Prefix(128-N bits)   ---->|<----     N bits     --->|
       +------------+------------+-----------+-------------------------+
       |    Block   |   Node     | ESI.LDV   |          AC-ID          |
       +------------+------------+-----------+-------------------------+
       |<------ Locator -------->|<------------- Function ------------>|

               Figure 2: End.DX2 SID Format for Aggregation

   Note that the ESI.LDV field is the Local Discriminator Value (LDV)
   of the ESI (especially the type 3/4/5 ESI).  The AC-ID field is the
   identifier of the AC's EVI.  The ESI.LDV field and the AC-ID field
   are integrated into the End.DX2 SID's Function part.

   Note that in "AC-aware bundling service interface" the AC-ID field
   MUST be the same as the Attachment Circuit ID of
   [I-D.sajassi-bess-evpn-ac-aware-bundling].  But in other service
   interfaces the AC-ID field can also be the EGD of that AC's MAC-VRF.
   Note that the EGD has a global meaning like a global VNI or a PBB
   I-SID, while the ordinary AC-ID part for an aggregatable End.DX2 SID
   typically is only a VLAN-ID on that ES.

   Note that the ESI IP of an AC is that AC's End.DX2 SID but with a
   zero AC-ID.  The AC SIDs have non-zero AC-IDs, but the ESI-IPs
   always have zero AC-IDs, because an ESI-IP identifies an ESI, not
   an AC.

   Note that if ESI21 is in single-active mode, DX2_AC1 is different
   from DX2_AC2.  If ESI21 is in all-active mode, DX2_AC1 is the same
   as DX2_AC2, and we can call both of them DX2_SID21.
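
   The relationship between an AC SID, its ESI prefix, and its ESI IP
   can be illustrated with a short Python sketch.  The field widths
   below are assumptions chosen only for the example (this document
   does not fix them); they add up to 128 bits and yield a /96 ESI
   prefix.  The function names and sample values are hypothetical.

```python
import ipaddress

# Assumed field widths (not fixed by the text); they sum to 128 bits
# and leave N = 32 bits for the AC-ID, i.e. a /96 ESI prefix.
BLOCK_BITS, NODE_BITS, LDV_BITS, ACID_BITS = 32, 16, 48, 32


def make_ac_sid(block: int, node: int, esi_ldv: int,
                ac_id: int) -> ipaddress.IPv6Address:
    """Pack Block | Node | ESI.LDV | AC-ID into one 128-bit End.DX2 SID."""
    for value, bits in ((block, BLOCK_BITS), (node, NODE_BITS),
                        (esi_ldv, LDV_BITS), (ac_id, ACID_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit its assumed width")
    sid = block
    sid = (sid << NODE_BITS) | node
    sid = (sid << LDV_BITS) | esi_ldv
    sid = (sid << ACID_BITS) | ac_id
    return ipaddress.IPv6Address(sid)


def esi_prefix(ac_sid: ipaddress.IPv6Address) -> ipaddress.IPv6Network:
    """ESI prefix: the AC SID with its AC-ID bits masked out."""
    masked = (int(ac_sid) >> ACID_BITS) << ACID_BITS
    return ipaddress.IPv6Network((masked, 128 - ACID_BITS))


def esi_ip(ac_sid: ipaddress.IPv6Address) -> ipaddress.IPv6Address:
    """ESI IP: the same SID but with a zero AC-ID."""
    return esi_prefix(ac_sid).network_address


# Two ACs (AC-IDs 10 and 20) of the same ES share one ESI prefix.
sid_a = make_ac_sid(0x20010DB8, 0x0001, 0x21, 10)
sid_b = make_ac_sid(0x20010DB8, 0x0001, 0x21, 20)
```

   In all-active mode, PE1 and PE2 would run the same computation with
   the same (anycast) Block and Node values, which is why DX2_AC1 and
   DX2_AC2 coincide.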

3.2.2.  The Advertisement of ESI-Prefixes

   The ESI-Prefixes of DX2_AC1 and DX2_AC2 are defined as in Figure 2
   and are called ESI_Prefix1 and ESI_Prefix2 respectively.  We can
   use an IGP to advertise these ESI-Prefixes to PE3 in the SRv6
   underlay.  So we don't have to use EAD/ES or EAD/EVI routes for
   SRv6 EVPN in this section.

   Note that the SRv6 SID in IMET route is an End.DT2M SID but with a
   zero argument length.

   Note that if ESI21 is in single-active mode, ESI_Prefix1 is
   different from ESI_Prefix2, but if ESI21 is in all-active mode,
   ESI_Prefix1 is the same as ESI_Prefix2.

   Note that when the PE1 node fails and the ESI is all-active, the PLR
   node will perform underlay anycast FRR switching for DX2_SID21
   (= DX2_AC2 = DX2_AC1).  This results in fast network convergence.

   Note that when the PE-CE link of ESI21 fails, the IGP route of
   ESI_Prefix1 will be withdrawn, so there will be no steady bypassing
   for that ES.  A temporary bypass can still be performed to further
   improve convergence.

   When two ESes are attached to the same redundancy group of PEs, they
   can share the same anycast SRv6 Locator.  In such a case, only the
   common SRv6 Locator is advertised by the underlay network.  But
   they should have different ESI-Prefixes, because activating ESI-SID
   aggregation is not recommended, in order to avoid the steady bypass
   problems described in Section 5.1.

   The detailed comparisons between light-weighted SRv6 EVPN and PBB
   EVPN over SRv6 are described in Section 6.

3.3.  Packet Walkthrough

3.3.1.  PE1 forward ARP Request to PE2/PE3

   *  When H1 (of SN1) sends an ARP request for H3, PE1 will receive
      the ARP Request BUM1 from AC1 of ESI21.  PE1 will forward the ARP
      Request following the broadcast-list of AC1's MAC-VRF VN-10.  The
      broadcast-list is constructed from the IMET routes from PE2 and
      PE3.  The End.DX2 SID of AC1 is named DX2_AC1.

      PE1 will forward the ARP Request to PE2 and PE3.  The inner SMAC
      of the ARP request is M1, which is H1's MAC address.

   *  In this step, PE1 will forward the ARP Request BUM1 to PE2/PE3
      with the following SRv6 encapsulation: its underlay source IP
      (SIP) is the End.DX2 SID (DX2_AC1) on PE1 for the ingress AC; its
      underlay destination IP (DIP) is the End.DT2M SID (whose argument
      length is zero) on PE2/PE3.

      Note that for single-homed ingress ACs the underlay SIP will be
      the End.DT2U SID, because they do not need any dedicated End.DX2
      SIDs.  Multi-homed ingress ACs with single-active behavior may
      not be assigned a dedicated ESI-Prefix either.  In such
      situations, the underlay SIP can also be the End.DT2U SID, and
      the AC SIDs of all single-active ESIs for the same EVI are
      aggregated into the same End.DT2U SID.
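
   The choice of outer addresses for a BUM packet described above can
   be sketched as follows.  This is an illustration only: the class,
   the function, and the symbolic SID names are hypothetical, and real
   SIDs would be IPv6 addresses rather than labels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IngressAC:
    name: str
    # Dedicated End.DX2 SID of this AC, or None when the AC has none
    # (single-homed AC, or single-active AC without its own ESI-Prefix).
    ac_sid: Optional[str]


def outer_addresses(ac: IngressAC, local_dt2u_sid: str,
                    remote_dt2m_sids: List[str]) -> List[Tuple[str, str]]:
    """Return one (source, destination) pair per remote PE."""
    # Source: the ingress AC SID if one exists, else the End.DT2U SID.
    src = ac.ac_sid if ac.ac_sid is not None else local_dt2u_sid
    # Destination: the End.DT2M SID learnt from each remote IMET route.
    return [(src, dst) for dst in remote_dt2m_sids]
```

   For AC1 on PE1, this yields the pairs (DX2_AC1, End.DT2M SID of PE2)
   and (DX2_AC1, End.DT2M SID of PE3).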

3.3.2.  PE2/PE3's Dataplane MAC Learning

   *  When PE2 and PE3 receive the ARP Request packet BUM1, they do
      data-plane MAC learning independently.  They will learn that M1
      is behind DX2_AC1.

      Note that when PE2 learns that M1 is behind DX2_AC1, it will also
      assume that M1 is behind the local AC (AC2) whose End.DX2 SID
      (DX2_AC2) is the same as DX2_AC1.  The local AC may have higher
      priority than the remote one.

      After the data-plane MAC learning, the ARP request packet BUM1 is
      broadcast to the local ACs, behind one of which is H3.
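
   The learning behavior above can be modelled with a small MAC table.
   The sketch below is one possible reading under stated assumptions:
   the class name, the table layout, and the rule that a local AC
   always wins over the remote SID are illustrative choices, not
   definitions from this document.

```python
class MacVrf:
    """Toy MAC-VRF that learns C-MACs from the outer source SID."""

    def __init__(self, local_ac_by_sid):
        # Maps an AC SID to the local AC that owns the same SID, if any.
        self.local_ac_by_sid = dict(local_ac_by_sid)
        self.mac_table = {}

    def learn(self, src_mac, outer_src_sid):
        """Record that src_mac is behind the SID in the outer source IP."""
        self.mac_table[src_mac] = {
            "remote_sid": outer_src_sid,
            # Set when a local AC has the same SID (all-active ES).
            "local_ac": self.local_ac_by_sid.get(outer_src_sid),
        }

    def next_hop(self, dst_mac):
        """Return ("flood" | "local" | "srv6", target) for dst_mac."""
        entry = self.mac_table.get(dst_mac)
        if entry is None:
            return ("flood", None)
        if entry["local_ac"] is not None:
            # Assumed policy: prefer the local AC over the remote SID.
            return ("local", entry["local_ac"])
        return ("srv6", entry["remote_sid"])
```

   On PE2, learning M1 against DX2_SID21 resolves to the local AC2,
   whereas on PE3 the same packet produces an entry that points to the
   remote SID.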

3.3.3.  PE2 Discard ARP Request to H1

   *  On receiving BUM1 from PE1, PE2 uses the ingress ESI information
      (DX2_AC1) in BUM1 to determine its ingress ESI-Prefix.  When
      ESI21 is in all-active mode and PE2 is about to forward the ARP
      request to H1, PE2 will find that the AC SID (DX2_AC2) of the
      outgoing AC (AC2) has the same ESI-Prefix, so PE2 discards the
      packet to keep the ES loop-free.

      Note that before that ARP Request packet is discarded, its source-
      MAC can be learnt, especially in "AC-aware bundling service
      interface".  The MAC entry is learnt against DX2_AC1, but it will
      consider the local sub-interface (of the same AC SID) on that ES
      as its outgoing interface, in order to avoid unknown-unicast
      flooding.

      When ESI21 is in single-active mode, the outgoing AC may be in
      blocking state; otherwise, its corresponding sub-interface on H1
      will take charge of dropping the packet instead.  So although the
      AC-SID (DX2_AC2) of the outgoing AC is not the same as DX2_AC1,
      no loop will arise in the Ethernet Segment.

   *  In this step, PE2 can compare the ingress AC-SID of BUM1 and the
      AC-SID of the outgoing AC directly; no SID-to-ESI lookup is
      needed.
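
   The loop-prevention check of this step can be sketched as a prefix
   comparison.  The /96 ESI-Prefix length and the sample SIDs are
   assumptions made for the example, and the function names are
   hypothetical.

```python
import ipaddress


def same_es(ingress_ac_sid: str, egress_ac_sid: str,
            esi_prefix_len: int = 96) -> bool:
    """True when both AC SIDs fall under the same ESI-Prefix."""
    # Derive the ingress ESI-Prefix from the outer source IP.
    ingress_prefix = ipaddress.IPv6Network(
        (ingress_ac_sid, esi_prefix_len), strict=False)
    return ipaddress.IPv6Address(egress_ac_sid) in ingress_prefix


def should_forward_bum(ingress_ac_sid: str, egress_ac_sid: str) -> bool:
    """Forward a BUM packet to an AC unless it came from the same ES."""
    return not same_es(ingress_ac_sid, egress_ac_sid)
```

   The comparison works on the SIDs themselves, which is why no
   SID-to-ESI lookup is required.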





3.3.4.  PE3 Forward ARP Reply to PE1/PE2

   *  When H3 replies to H1 for the ARP request BUM1, PE3 will forward
      the ARP reply U1 according to the MAC entry M1 learnt previously.

      PE3 will forward the ARP reply U1 to PE1 or PE2 according to the
      IGP route for DX2_AC1's SRv6 locator.

      When ESI21 is in all-active mode, DX2_AC1 will be the same as
      DX2_AC2; in that case, we call both of them DX2_SID21 instead.
      The traffic to M1 will be load-balanced between PE1 and PE2,
      because DX2_SID21's locator is advertised by both PE1 and PE2 in
      the underlay IGP.

   *  In this step, PE3 will forward the ARP reply U1 to PE1 with the
      following SRv6 encapsulation: its underlay source IP is the
      End.DX2 SID on PE3 for AC3; its underlay destination IP is the
      End.DX2 SID (DX2_AC1) on PE1 for AC1, according to the MAC entry
      M1.

      Note that if the DIP were just the anycast node SID of PE1 and
      PE2, then when the PE-CE link of ESI21 fails, the traffic would
      be steadily bypassed until that link recovers.  That is why MAC
      entries should be learnt against AC-SIDs.

3.3.5.  PE1 Forward ARP Reply to H1

   *  When PE1 receives the ARP reply packet U1 from PE3, PE1 first
      matches the packet to its MAC-VRF VN-10 by U1's destination
      End.DX2 SID.  PE1 will not discard it, because the egress AC's
      AC-SID is not the same as the ingress AC-SID (which is
      represented by U1's source IP).

   *  In this step, when PE1 receives the SRv6-encapsulated ARP reply
      packet U1 from PE3, PE1 first matches the packet to the End.DX2
      SID of AC1 by the DIP, and then matches the packet to AC1's
      MAC-VRF VN-10.

4.  Decapsulation Optimizations

4.1.  Decapsulation Aggregation

   We want to decapsulate the packets destined for different ESIs of
   the same EVI using the same forwarding entry.  To achieve this, we
   can use the EGD of an AC's EVI as the AC-ID of that AC's AC SID.

   These AC SIDs are aggregatable End.DX2 SIDs, so we can consider the
   ESI prefix aggregated from these End.DX2 SIDs as a SID of a new SRv6
   function called End.DX2AGG.  The format of the End.DX2AGG SID is
   illustrated in the following figure:

       |<------ Locator -------->|<- FUNC -->|<------ Arg.ACI -------->|
       +------------+------------+-----------+-----------------------+-+
       |    Block   |   Node     | ESI.LDV   |        EGD            |L|
       +------------+------------+-----------+-----------------------+-+

                      Figure 3: End.DX2AGG SID Format

   Note that whether these SIDs are treated as many End.DX2 SIDs or as
   a single End.DX2AGG SID with different arguments is a local matter
   for each PE node.  Other PEs of the same EVI won't be aware of the
   difference between these two implementations.

   A SID with the End.DX2AGG function is called an "ESI SID" in this
   document.  The ESI's ESI-Prefix is the locator and function part of
   its corresponding ESI SID.  The argument part of the ESI SID is the
   AC-ID for the corresponding AC's End.DX2 SID.  The AC-ID plus the
   ESI.LDV works like the function part of an End.DX2 SID.  The argument
   part of an ESI SID is called Arg.ACI in this document.

   Note that the Arg.ACI comprises the EGD (EVPN Global Discriminator)
   and the L bit.  The EGD identifies the EVI of that AC.  When that AC
   is a leaf AC, the L bit is 1; otherwise the L bit is 0.
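
   The layout of Figure 3 can be sketched in Python as follows.  This is
   only an illustration: the field widths (32/16/16/63/1 bits), the
   function names and the sample values are assumptions made for the
   example, since this document does not fix the size of any field.

```python
import ipaddress

# Assumed field widths in bits (they sum to 128); the draft does not fix them.
BLOCK_BITS, NODE_BITS, LDV_BITS, EGD_BITS, L_BITS = 32, 16, 16, 63, 1
FIELD_WIDTHS = (BLOCK_BITS, NODE_BITS, LDV_BITS, EGD_BITS, L_BITS)


def build_esi_sid(block, node, esi_ldv, egd, leaf):
    """Pack Block | Node | ESI.LDV | EGD | L into one 128-bit SID."""
    value = 0
    parts = (block, node, esi_ldv, egd, int(leaf))
    for part, width in zip(parts, FIELD_WIDTHS):
        if part < 0 or part >> width:
            raise ValueError("field does not fit in its assumed width")
        value = (value << width) | part
    return ipaddress.IPv6Address(value)


def parse_arg_aci(sid):
    """Return (EGD, is_leaf) from the Arg.ACI part of the SID."""
    arg_aci = int(sid) & ((1 << (EGD_BITS + L_BITS)) - 1)
    return arg_aci >> L_BITS, bool(arg_aci & 1)


def esi_prefix(sid):
    """Return the ESI-Prefix (locator plus function part) covering the SID."""
    prefix_len = BLOCK_BITS + NODE_BITS + LDV_BITS
    return ipaddress.IPv6Network((int(sid), prefix_len), strict=False)


# Hypothetical values: a leaf AC of the EVI whose EGD is 10.
leaf_sid = build_esi_sid(0x20010DB8, 0x1, 0x21, egd=10, leaf=True)
print(leaf_sid, parse_arg_aci(leaf_sid), esi_prefix(leaf_sid))
```

   With these assumed widths, all AC SIDs of one ESI share a single /64
   ESI-Prefix and differ only in their Arg.ACI.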

   Note that when the AC-ID is the EGD, PE2 can decapsulate the packet
   following either the End.DX2 function or the End.DX2AGG function.
   This is just a local matter, although the End.DX2AGG function can
   reduce the number of decapsulation forwarding entries.  But when the
   AC-ID is that AC's VLAN-IDs, PE2 has to decapsulate the packet
   following the End.DX2 function.

4.2.  End.DX2AGG Function and Arg.ACI

   The "Endpoint with decapsulation and Aggregated L2 table forwarding"
   behavior (End.DX2AGG for short) is a variant of the End.DX2 behavior.

   Two of the applications of the End.DX2AGG behavior are the EVPN VPLS
   [RFC7432] and the EVPN ETREE [RFC8317] use-cases.

   Any SID instance of this behavior is associated with an ESI E.  The
   behavior also takes an argument: "Arg.ACI".  This argument provides a
   local mapping to an EVI V.  The outgoing interface corresponding to
   <ESI E, EVI V> is OIF, and the EVI V's bridge table is L2 Table T.



   The End.DX2AGG SID MUST be the last segment in an SR Policy.

   When N receives a packet whose IPv6 DA is S and S is a local
   End.DX2AGG SID, the processing is identical to the End.DT2U behavior
   except for the Upper-layer header processing which is as follows:

    S01. If (Upper-Layer Header type == 143(Ethernet) ) {
    S02.    Remove the outer IPv6 Header with all its extension headers.
    S03.    Determine the L2 Table T using Arg.ACI.
    S04.    Learn the exposed MAC Source Address in L2 Table T.
    S05.    Find the OIF and forward the Ethernet frame to the OIF.
    S06. } Else {
    S07.    Process as per Section 4.1.1 of [RFC8986].
    S08. }

   Note that when the EVI V is an E-LAN service, the OIF can be found
   using the MAC entries in L2 Table T.
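
   The pseudocode above can be sketched in Python as follows.  This is a
   non-normative illustration: the EsiSidState structure, the function
   signature and the choice to learn the source MAC against the outer
   source SID are assumptions made for the example.  The OIF is modelled
   as a fixed per-<ESI E, EVI V> mapping rather than a MAC lookup.

```python
from dataclasses import dataclass, field

ETHERNET_NEXT_HEADER = 143  # Upper-layer header type tested in S01


@dataclass
class EsiSidState:
    """Hypothetical local state behind one End.DX2AGG SID (one ESI E)."""
    l2_tables: dict = field(default_factory=dict)  # EGD -> {MAC: source SID}
    oifs: dict = field(default_factory=dict)       # EGD -> OIF of <ESI E, EVI V>


def end_dx2agg(state, next_header, source_sid, egd, frame):
    """Return (OIF, frame) to forward, or None when S07 applies."""
    if next_header != ETHERNET_NEXT_HEADER:
        return None                          # S07: left to RFC 8986 processing
    # S02: the caller passes only the inner Ethernet frame, so the outer
    # IPv6 header and its extension headers are already gone here.
    table = state.l2_tables.setdefault(egd, {})  # S03: Arg.ACI selects T
    source_mac = frame[6:12]                 # bytes 6..11 of an Ethernet frame
    table[source_mac] = source_sid           # S04: learn against the source SID
    return state.oifs[egd], frame            # S05: OIF of <ESI E, EVI V>
```

   A caller would first parse the EGD out of the destination SID's
   Arg.ACI and then pass it to end_dx2agg together with the exposed
   Ethernet frame.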

5.  Advanced Considerations

5.1.  ESI SID Aggregation

   There is an obvious difference between "Route Aggregation" and "SID
   Aggregation" for an ESI.  With "ESI Route Aggregation", different
   End.DX2AGG SIDs are advertised by underlay protocols under a common
   SRv6 locator, but different ESIs still have different End.DX2AGG
   SIDs.  With "ESI SID Aggregation", different ESIs use the same SRv6
   SID.

   Note that "ESI Route Aggregation" is recommended whenever it is
   possible, but "ESI SID Aggregation" can only be used under certain
   constraints.

   When two ESes are attached to the same redundancy group of PEs, they
   can share the same SRv6 SID.  But this brings some issues too.  One
   issue is that they may be attached to different groups of PEs in the
   future.  Another issue is that when only one of the ESes fails on a
   PE, the common SRv6 SID can't be withdrawn by that PE, so the steady
   bypass of that ES arises immediately after its failure on that PE.
   If these issues are not important in a given scenario, ESI SID
   Aggregation may be activated as an option.

   Note that when ESI SID Aggregation is activated, the local-bias ES
   split-horizon procedures or its variations should be used.

   Note that ESI SID Aggregation works well with single-active ESIs (see
   Section 3.3); its steady bypass problem arises with all-active ESIs
   only.



   Note that the sub-interfaces of the same ESI may be assigned
   different End.DX2 SIDs, and these End.DX2 SIDs can be aggregated into
   a common prefix that is assigned to that ESI.  In such a case, only
   the common prefix should be advertised before any of the
   sub-interfaces fails.  This is not considered "ESI SID Aggregation";
   it is "ESI Route Aggregation".

5.2.  ESI/AC SID Advertisement Optimization

5.2.1.  Advertise ESI-Locators in Underlay Network

   An End.DX2AGG SID can be advertised as an IP prefix in the underlay
   IGP.  Although each ESI SID is the aggregation of many AC SIDs, there
   may still be too many ESI SIDs for the underlay network, and the core
   routers, which are service-agnostic, would have to install these ESI
   prefixes.

   In order to solve these problems, only the anycast SRv6 locators (say
   ESI-Locators) of such ESI prefixes should be advertised in the
   underlay network.

   Note that in such a case the ESI/AC SIDs typically don't have to be
   advertised by EVPN routes in the overlay network, unless some special
   features (e.g., unequal load-balancing) need to be provided as well.

5.2.2.  Using EAD/EVI Route to Advertise AC SIDs

   When the EAD/EVI routes here are used to advertise AC SIDs, the
   End.DX2 SIDs are advertised in their SRv6 L2 Service TLVs, not in
   their next hops.  Their next hops will be the node SID of the
   advertising PE.

   In such a case, the EAD/EVI routes will be installed as overlay
   routes, and the AC SIDs learnt in the MAC entries are treated as the
   overlay indexes for recursion.

   In all-active mode, when an AC of an <ESI, EVI> fails on one PE, all
   other PEs of that <ESI, EVI> should use EAD/EVI routes to advertise
   their AC SIDs.

5.2.3.  Using EAD/ES Route to Advertise ESI SIDs

   In Section 6.1.1 of [I-D.ietf-bess-srv6-services], the SRv6 L2
   Service TLVs of EAD/ES routes carry only the Arg.FE2 information.
   Here, the SRv6 L2 Service TLVs of EAD/ES routes carry the ESI SIDs.







   EAD/ES routes will be advertised/imported for EVIs, but they should
   be installed into the Global Routing Table (GRT), because EVPN-lite
   SRv6 has no dedicated B-component like that in PBB VPLS and PBB
   EVPN.  The GRT plays the B-component role in EVPN-lite SRv6.

   Note that the EAD/ES routes won't be installed as overlay routes like
   the EAD/EVI routes, because we want to reduce forwarding table
   consumption.

   Although ESI SIDs are installed into the GRT, only PE nodes are aware
   of them.  To reduce FIB consumption, the transit nodes in the
   underlay network won't be aware of ESI SIDs (though they may be aware
   of the locators of these SIDs).

   Note that when the EAD/ES route here is used to advertise an ESI SID,
   the End.DX2AGG SID is advertised in its SRv6 L2 Service TLV, not in
   its nexthop.  Its nexthop will be the node SID of the advertising PE.

   Note that in such a case, the SRv6 source IP in the dataplane should
   be set to the entire AC SID of the ingress AC, not just the ESI IP
   whose AC-ID part is zero.

5.2.4.  The Reduction of EAD/EVI Routes

   In order to solve the problem described in Section 2.6, we may have
   to advertise AC SIDs in the overlay network.  But the number of AC
   SIDs may be hundreds of times larger than the number of ESI SIDs, so
   light-weighted SRv6 EVPNs need to reduce the advertisement of AC
   SIDs.

   The AC SID of a specified <ESI,EVI> will not be advertised by its
   PEs, until these PEs know that the <ESI,EVI> fails on at least one of
   them.

   Note that the entire AC SID for that <ESI,EVI> can be used as the
   source IP of the SRv6 encapsulation before that AC SID is advertised
   via EVPN routes.  This works because when a MAC is learnt against
   that AC SID, packets for that MAC can still be forwarded according to
   the ESI-Prefix or ESI-Locator of the corresponding ESI SID, thanks to
   the longest-match procedure of IP lookup.
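
   The longest-match argument can be illustrated with a short Python
   sketch.  The prefixes, prefix lengths and route labels below are
   hypothetical values chosen for the example; only the lookup rule
   itself (the most specific covering prefix wins) is relied upon.

```python
import ipaddress


def longest_match(routes, destination):
    """Return the value of the most specific prefix covering destination."""
    address = ipaddress.IPv6Address(destination)
    best = None
    for prefix, value in routes.items():
        network = ipaddress.IPv6Network(prefix)
        if address in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, value)
    return None if best is None else best[1]


# Hypothetical routes on the ingress PE before any AC SID is advertised.
routes = {
    "2001:db8:1::/48": "via ESI-Locator",
    "2001:db8:1:21::/64": "via ESI-Prefix",
}
ac_sid = "2001:db8:1:21::15"   # full AC SID that a MAC entry points to
print(longest_match(routes, ac_sid))   # via ESI-Prefix
```

   If only the ESI-Locator were present, the same lookup would return
   the /48 route, so the packet would still reach a PE of that ESI.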

5.2.4.1.  AC-DF per EVI Mode for Light-Weighted EVPNs

   When the EAD/EVI routes are not advertised, the AC-influenced
   DF-Election per [RFC8584] can't work, so the AC-DF per EVI procedures
   are required.  The AC-DF per EVI procedures include two steps.  The
   first step is the AC-DF per EVI capability negotiation procedure, and
   the second step is the AC-DF per EVI DF-election procedure.



   The Capability negotiation procedures and the DF-Election procedures
   follow [I-D.wang-bess-evpn-ac-df-per-evi].

5.2.4.2.  On Receiving Reverse EAD/EVI Routes

   In all-active mode, when a PE X receives a reverse EAD/EVI route
   ([I-D.wang-bess-evpn-ac-df-per-evi]), that PE X can use a normal
   EAD/EVI route to advertise its local AC SID of that <ESI,EVI>.

   Note that no EAD/EVI route has to be advertised before receiving the
   corresponding reverse EAD/EVI routes.  This can greatly reduce the
   number of EAD/EVI routes.

5.3.  Unequal LB Advertisement

   When the ESI SIDs are advertised by EVPN routes for the overlay
   network according to Section 5.2.3, we can advertise the EVPN Link
   Bandwidth extended community (see [I-D.ietf-bess-evpn-unequal-lb])
   along with the ESI SIDs using EAD/ES routes.

   Note that this extra information (which is advertised along with the
   EVPN routes) is known to the PEs only.  The underlay network doesn't
   have to be aware of it.

   Note that when the EVPN Link Bandwidth extended community is
   advertised along with the ESI SID, the nexthop of the EAD/ES route
   should not be set to the anycast ECMP Node SID of the advertising PE
   (egress PE).  On receiving such an EAD/ES route, the ingress PE may
   push this EAD/ES route's nexthop onto the End.DX2AGG/End.DX2 SID when
   constructing the SID stack, if unequal LB is required.

   Note that the association between an ESI SID and its corresponding
   Node SID is also advertised by EAD/ES routes.  In such a case, when
   the ESI SIDs are used as destination IP addresses, they should be
   hidden in the SRH behind the node SID of the corresponding egress PE
   router.  This encapsulation relies on the EAD/ES routes of the
   overlay network, so the ESI SIDs must be advertised in the overlay
   network in such a case.

   Although these ESI SIDs (which are used as destination IP addresses
   to PE X) won't be exposed until data packets reach the egress PE X,
   their ESI-Locator should also be advertised in the underlay network,
   because their corresponding AC SIDs will be encapsulated as source
   IPs for other data packets whose ingress PE is PE X, and these
   source IPs may be checked by underlay URPF (Unicast Reverse Path
   Forwarding) procedures.





5.4.  AC-aware Bundling Service Interface

   In the AC-aware bundling service interface, the Attachment Circuit ID
   extended community ([I-D.sajassi-bess-evpn-ac-aware-bundling]) or the
   ACI-specific SOI extended community
   ([I-D.wang-bess-evpn-ether-tag-id-usage]) should be used in ARP/ND
   synchronization.

   Note that each VLAN of the same AC of the same MAC-VRF will have the
   same End.DX2 SID.

   Note that in the "AC-aware bundling service interface", the AC-ID
   inside DX2_AC1 helps the MAC entry to be installed for the correct
   outgoing interface.  Such a MAC entry is called the synced MAC entry.

   Note that MAC entries learnt against a DX2-SID should have lower
   preference than those received over an RT-2 route when they are
   installed into the MAC table.
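
   The preference rule above can be sketched in Python as follows.  The
   Origin values, the MacEntry structure and the install function are
   hypothetical names introduced for this illustration; the only rule
   taken from this document is that an RT-2 entry is preferred over a
   data-plane entry learnt against a DX2-SID.

```python
from dataclasses import dataclass
from enum import IntEnum


class Origin(IntEnum):
    DATA_PLANE = 1   # learnt against a DX2-SID from received traffic
    RT2 = 2          # received over an RT-2 route; preferred


@dataclass
class MacEntry:
    target: str      # AC SID or local AC that the MAC points to
    origin: Origin


def install(mac_table, mac, target, origin):
    """Install the entry unless a more preferred one is already present."""
    current = mac_table.get(mac)
    if current is not None and current.origin > origin:
        return False                 # keep the existing RT-2 entry
    mac_table[mac] = MacEntry(target, origin)
    return True
```

   In this sketch a later data-plane learning event never overwrites an
   entry installed from an RT-2 route, while an RT-2 route always
   replaces a data-plane entry for the same MAC.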

5.5.  C-MAC Flush Notification Procedure

   The withdrawal of an ESI/AC SID advertisement (as an overlay route)
   can be used as a flush notification for the C-MACs that were learnt
   against that ESI/AC SID, provided that it is the only advertisement
   of that ESI/AC SID at that time.

   Note that in single-active mode, the ESI-Prefixes of DX2_AC1 and
   DX2_AC2 are different, so each withdrawal of DX2_AC1 or DX2_AC2 will
   be for the single advertisement of that SID.
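
   A minimal Python sketch of this flush rule follows.  The class and
   method names are hypothetical, and the bookkeeping (one set of
   advertising PEs and one set of C-MACs per SID) is an assumption made
   for the illustration.

```python
from collections import defaultdict


class OverlaySidState:
    """Hypothetical per-PE view of overlay SID advertisements and C-MACs."""

    def __init__(self):
        self.advertisers = defaultdict(set)   # SID -> PEs advertising it
        self.cmacs = defaultdict(set)         # SID -> C-MACs learnt against it

    def advertise(self, sid, pe):
        self.advertisers[sid].add(pe)

    def learn(self, sid, cmac):
        self.cmacs[sid].add(cmac)

    def withdraw(self, sid, pe):
        """Process a withdrawal and return the set of C-MACs flushed by it."""
        if pe not in self.advertisers.get(sid, set()):
            return set()                      # nothing was advertised by pe
        self.advertisers[sid].discard(pe)
        if self.advertisers[sid]:
            return set()                      # another advertisement remains
        del self.advertisers[sid]
        return self.cmacs.pop(sid, set())     # last one gone: flush C-MACs
```

   In all-active mode the same SID may be advertised by two PEs, so the
   first withdrawal flushes nothing; in single-active mode each SID has
   one advertiser and its withdrawal flushes immediately.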

   When "AC-DF per EVI" (Section 5.2.4.1) is used, the reverse EAD/EVI
   routes can be used to trigger C-MAC flush for specified AC SIDs.  In
   such a case, these reverse EAD/EVI routes should not use the EVI-RT
   format to carry their EVI's route-target, because the EVI-RT format
   is not visible to the RT constraint mechanism.

5.6.  E-Tree Support Considerations

   The E-Tree support extensions are similar to Section 5 of [RFC8317],
   except for the following notable differences: the leaf B-MACs are
   replaced by leaf ESI-SIDs, and the root B-MACs are replaced by root
   ESI-SIDs.  The PBB encapsulation is replaced by SRv6 encapsulation,
   and the B-component is replaced by the underlay GRT.  The B-MAC
   Advertisement Route is replaced by the EAD/EVI route or the EAD/ES
   route.






   As illustrated in Figure 3, the root AC-SID and leaf AC-SID of the
   same AC can be considered as the same ESI-SID with different Arg.ACI.
   The EGD parts of their Arg.ACIs are the same; only the L bits of
   their Arg.ACIs differ.  The L bit of the leaf AC-SID is set to 1.
   The L bit of the root AC-SID is set to 0.

   On the ingress PE, when the L bit of the destination SID for the DMAC
   of a data packet is 1, and that data packet's ingress AC is a leaf
   AC, that data packet should be dropped.
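
   The ingress filtering rule above can be sketched in Python as
   follows.  The function name is hypothetical, and the sketch assumes
   that the L bit is the least significant bit of the SID, as drawn in
   Figure 3; the SID values in any call are supplied by the caller.

```python
import ipaddress


def ingress_etree_drop(destination_sid, ingress_ac_is_leaf):
    """Return True for leaf-to-leaf traffic, which the ingress PE drops."""
    l_bit = int(ipaddress.IPv6Address(destination_sid)) & 1  # assumed LSB
    return bool(ingress_ac_is_leaf) and l_bit == 1
```

   Traffic from a root AC, or traffic whose destination SID has the L
   bit cleared, is not affected by this check.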

5.7.  MAC-Synchronization in All-Active Mode

   When a host H1 of subnet SN1 sends an ARP Request REQ_P1, REQ_P1
   will be forwarded by EVC1 to either PE1 or PE2, but not to both.
   Likewise, when H3 sends an ARP Reply REP_P2 to H1, PE3 may
   load-balance REP_P2 to either PE1 or PE2, but not to both.

   When REQ_P1 is load-balanced (see Appendix A.2.1.1) by EVC1 to PE1
   rather than PE2, but PE3 load-balances REP_P2 to PE2, the MAC entry
   of H1 will not have been prepared on PE2 for REP_P2.  So the
   forwarding of REP_P2 will follow the unknown-unicast procedures.

   PE1 MUST use RT-2 route RT2S (RT-2 for Synchronization only) to
   advertise the MAC/IP entry of H1 to other PEs (e.g.  PE2) on ES21.
   These RT-2 routes should be advertised along with an EVI-RT
   ([I-D.ietf-bess-evpn-igmp-mld-proxy]) and an ES-Import RT.

   When PE2 receives RT2S, the MAC entry Mx should be installed with AC2
   as its actual outgoing interface.  When PE3 receives RT2S, RT2S MUST
   NOT be imported into VN-10, because the ES-Import RT of RT2S cannot
   be resolved to a local ES of PE3.

   As a result, the synchronized MAC entries will not be imported by
   their external remote PEs; they are imported only by their internal
   remote PEs.
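
   The import decision for RT2S can be sketched in Python as follows.
   The function names and the ES-Import RT strings are hypothetical;
   the sketch assumes that a PE imports RT2S only when the ES-Import RT
   carried by the route matches one of its locally attached ESes, which
   is what limits the import to the internal remote PEs.

```python
def should_import_rt2s(route_es_import_rt, local_es_import_rts):
    """Return True when this PE is attached to the ES that RT2S belongs to."""
    return route_es_import_rt in local_es_import_rts


def install_synced_mac(mac_table, mac, local_ac):
    """Install the synchronized entry with the local AC as outgoing interface."""
    mac_table[mac] = local_ac


# Hypothetical ES-Import RT of ES21 and the local ESes of PE2 and PE3.
RT_ES21 = "es-import:ES21"
pe2_local = {RT_ES21}
pe3_local = {"es-import:ES3"}

pe2_macs = {}
if should_import_rt2s(RT_ES21, pe2_local):
    install_synced_mac(pe2_macs, "Mx", "AC2")
print(pe2_macs, should_import_rt2s(RT_ES21, pe3_local))
```

   Here PE2 installs Mx towards AC2, while PE3 leaves RT2S out of its
   MAC-VRF.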

   The IP address field of NLRI of RT2S can be set to H1's IP address,
   which is obtained through ARP snooping.  This IP address can be used
   to trigger ARP probing when PE1 fails.

   When C-MAC Mx is aged out by PE1, the RT2S MUST be withdrawn, and
   PE2's MAC entry of Mx will thus be deleted.  In such a case, ARP
   probing for Mx should not be triggered, so that no MAC entry is held
   for Mx while Mx stays silent for a long time.







   Note that in other light-weighted EVPNs, the VUN1 may be a
   backbone-VPLS (B-VPLS).  In such a case, the IP address field of the
   NLRI can be used to distinguish the RT-2 routes of C-MACs from the
   RT-2 routes of B-MACs.

5.8.  EVPN IRB Support Considerations

   The dataplane in this draft is no more complex than that of typical
   SRv6 EVPN, so it will work as efficiently as expected in the SRv6
   EVPN IRB use case.

5.8.1.  EVPN IRB Extended Mobility

   In the EVPN IRB use case, [I-D.ietf-bess-evpn-irb-extended-mobility]
   defines some optional extensions to support specific IRB use cases
   in which the <MAC,IP> bindings change across VM moves.  These
   extensions can't be applied to light-weighted EVPNs, just as they
   can't be applied to PBB EVPNs.

5.8.2.  Anycast IRB interfaces

   When an EVPN IRB interface (on PE1) pings a host H1, the
   corresponding ICMP Echo Request will be delivered to host H1, whether
   or not host H1 is PE1's local host.  But if that IRB interface is an
   anycast IRB interface, and host H1 is a local host of PE2 (not of
   PE1), the Echo Reply for that Echo Request will naturally be
   delivered only to the nearest anycast IRB interface, which is on PE2
   (not on PE1).

5.8.2.1.  Constructing GW-list

   The MAC/IP of an anycast IRB interface should be advertised along
   with a Default Gateway Extended Community.

   The PEs on which the anycast IRB interface of a subnet resides form
   the "GW-list" of that subnet.  The "GW-list" of a BD can be
   constructed from such MAC/IP routes (with the Default Gateway
   extended community of the corresponding subnet).

5.8.2.2.  Flooding over GW-list

   Echo Replies received by any of the anycast IRB interfaces MUST be
   flooded over the GW-list of that BD, so that the PE which originated
   the previous Echo Request can receive the synced Echo Replies.

   Note that the Echo Replies between two hosts of that BD will not be
   flooded, because they will not be received by any of the anycast IRB
   interfaces.
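
   The GW-list construction and the flooding rule can be sketched in
   Python as follows.  The route representation (a dictionary with "pe",
   "bd" and "default_gateway" keys) and the function names are
   hypothetical and serve only to illustrate the two procedures.

```python
def build_gw_list(mac_ip_routes, bd):
    """Collect the PEs that advertised a default-gateway MAC/IP route for bd."""
    return sorted({route["pe"] for route in mac_ip_routes
                   if route["bd"] == bd and route["default_gateway"]})


def echo_reply_flood_targets(gw_list, local_pe, received_by_anycast_irb):
    """Return the PEs to which the local PE floods a received Echo Reply."""
    if not received_by_anycast_irb:
        return []            # host-to-host replies are bridged, not flooded
    return [pe for pe in gw_list if pe != local_pe]


# Hypothetical MAC/IP routes seen for bridge domain "BD-10".
routes = [
    {"pe": "PE1", "bd": "BD-10", "default_gateway": True},
    {"pe": "PE2", "bd": "BD-10", "default_gateway": True},
    {"pe": "PE3", "bd": "BD-10", "default_gateway": False},
]
gw_list = build_gw_list(routes, "BD-10")
print(echo_reply_flood_targets(gw_list, "PE2", True))   # ['PE1']
```

   In this example PE2 receives the Echo Reply on its anycast IRB
   interface and floods it to PE1, which originated the Echo Request.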




6.  Comparison with Other Solutions

6.1.  Detailed Comparisons with PBB EVPN over SRv6

   The "PBB EVPN over SRv6 underlay" solution becomes complex if we ask
   too much of it.  Some examples follow:

   *  The upper-layer header for SRv6 is the PBB header for B-MACs, not
      the Ethernet header for C-MACs, so the SID list (SR-Path or
      network programming instructions) in the SRH can't be constructed
      for the sake of the I-Component.  For example, when an SRv6 SID
      for MAC-guarding (just an example) is present in the SRH for PBB
      EVPN over SRv6, it would mean B-MAC guarding, not C-MAC guarding.

   *  The B-MACs for the all-active ESIs can't be aggregated, but the
      SRv6 SIDs for ESIs can be aggregated.  The underlay can advertise
      the ESI-Locators only, so the burden on the underlay network may
      not increase much.  Even when the underlay routes are aggregated,
      the C-MACs can still be learnt against /128 source IPs.  This is
      an advantage of light-weighted SRv6 EVPN that can't be gained
      from a PBB header.

   *  The B-MACs are for overlay protection (the real overlay is the
      I-VPLS, but the B-VPLS is also an overlay network from the
      viewpoint of the SRv6 network).  But the SRv6 SIDs for ESIs are
      for underlay protection, which works like egress protection.
      They are two different types of protection solutions.

   *  Light-weighted SRv6 EVPN can support AC-influenced DF Election,
      but PBB EVPN over SRv6 can't.

   *  Although PBB EVPN can be transplanted into SRv6 networks along
      with the PBB header (say PBB EVPN over SRv6), it seems more
      complicated.  Take the EVPN IRB use cases for example: they
      require seven stages of header processing,
      (SRv6/B-MAC/C-MAC)(Inner-IP)(C-MAC/B-MAC/SRv6), during the overlay
      L3 forwarding, which may be very hard for some ASICs to
      implement.  Simplifying the processing to
      (SRv6/C-MAC)(Inner-IP)(C-MAC/SRv6) is a step forward, not
      backward.  We can achieve this goal easily inside the EVPN
      framework, provided that data-plane learning can still be
      considered an option after PBB EVPN.







   Fortunately, SRv6 is still too young for PBB EVPN to have been
   transplanted onto it, so SRv6 nodes lose nothing by giving up a PBB
   header that they have never used.  Note that the SRv6 functions for
   L2VPNs (End.DT2U and End.DT2M) have long had source-IP-based
   data-plane learning.

   Although the extensions in [I-D.ietf-bess-evpn-irb-extended-mobility]
   can't be applied to PBB EVPNs or light-weighted EVPNs, this will not
   prevent PBB EVPNs and light-weighted EVPNs from supporting typical
   IRB use cases.  Note that these extensions are optional.

6.2.  Detailed Comparisons with Anycast Node SID

   Note that SRv6 Anycast Node SID is the ultimate aggregation of ESI
   SIDs.  Such ESI SID aggregation will have some problems as described
   in Section 5.1.

7.  Security Considerations

   Security considerations will be added in future versions.

8.  IANA Considerations

8.1.  End.DX2AGG SID

   IANA is requested to allocate a new code point for the new SRv6
   Endpoint Behavior defined in this document.

                  +------+-------------+---------------+
                  | Type | Description | Reference     |
                  +------+-------------+---------------+
                  | TBD1 | End.DX2AGG  | This Document |
                  +------+-------------+---------------+


                            Figure 4: End.DX2AGG

9.  Acknowledgements

   The authors would like to thank the following for their comments and
   review of this document:

   Ye Shu.

10.  Normative References






   [I-D.ietf-bess-evpn-igmp-mld-proxy]
              Sajassi, A., Thoria, S., Mishra, M. P., Drake, J., and W.
              Lin, "IGMP and MLD Proxy for EVPN", Work in Progress,
              Internet-Draft, draft-ietf-bess-evpn-igmp-mld-proxy-12, 23
              August 2021, <https://datatracker.ietf.org/doc/html/draft-
              ietf-bess-evpn-igmp-mld-proxy-12>.

   [I-D.ietf-bess-evpn-unequal-lb]
              Malhotra, N., Sajassi, A., Rabadan, J., Drake, J.,
              Lingala, A., and S. Thoria, "Weighted Multi-Path
              Procedures for EVPN Multi-Homing", Work in Progress,
              Internet-Draft, draft-ietf-bess-evpn-unequal-lb-14, 14 May
              2021, <https://datatracker.ietf.org/doc/html/draft-ietf-
              bess-evpn-unequal-lb-14>.

   [I-D.ietf-bess-srv6-services]
              Dawra, G., Filsfils, C., Talaulikar, K., Raszuk, R.,
              Decraene, B., Zhuang, S., and J. Rabadan, "SRv6 BGP based
              Overlay Services", Work in Progress, Internet-Draft,
              draft-ietf-bess-srv6-services-07, 11 April 2021,
              <https://datatracker.ietf.org/doc/html/draft-ietf-bess-
              srv6-services-07>.

   [I-D.sajassi-bess-evpn-ac-aware-bundling]
              Sajassi, A., Brissette, P., Mishra, M., Thoria, S.,
              Rabadan, J., and J. Drake, "AC-Aware Bundling Service
              Interface in EVPN", Work in Progress, Internet-Draft,
              draft-sajassi-bess-evpn-ac-aware-bundling-04, 11 July
              2021, <https://datatracker.ietf.org/doc/html/draft-
              sajassi-bess-evpn-ac-aware-bundling-04>.

   [I-D.wang-bess-evpn-ac-df-per-evi]
              Wang, Y., "AC-Influenced DF Election per EVI", Work in
              Progress, Internet-Draft, draft-wang-bess-evpn-ac-df-per-
              evi-00, 7 May 2021,
              <https://datatracker.ietf.org/doc/html/draft-wang-bess-
              evpn-ac-df-per-evi-00>.

   [I-D.wang-bess-evpn-ether-tag-id-usage]
              Wang, Y., "Ethernet Tag ID Usage Update for Ethernet A-D
              per EVI Route", Work in Progress, Internet-Draft, draft-
              wang-bess-evpn-ether-tag-id-usage-03, 26 August 2021,
              <https://datatracker.ietf.org/doc/html/draft-wang-bess-
              evpn-ether-tag-id-usage-03>.







   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015, <https://www.rfc-editor.org/info/rfc7432>.

   [RFC7623]  Sajassi, A., Ed., Salam, S., Bitar, N., Isaac, A., and W.
              Henderickx, "Provider Backbone Bridging Combined with
              Ethernet VPN (PBB-EVPN)", RFC 7623, DOI 10.17487/RFC7623,
              September 2015, <https://www.rfc-editor.org/info/rfc7623>.

   [RFC8317]  Sajassi, A., Ed., Salam, S., Drake, J., Uttaro, J.,
              Boutros, S., and J. Rabadan, "Ethernet-Tree (E-Tree)
              Support in Ethernet VPN (EVPN) and Provider Backbone
              Bridging EVPN (PBB-EVPN)", RFC 8317, DOI 10.17487/RFC8317,
              January 2018, <https://www.rfc-editor.org/info/rfc8317>.

   [RFC8584]  Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
              J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet
              VPN Designated Forwarder Election Extensibility",
              RFC 8584, DOI 10.17487/RFC8584, April 2019,
              <https://www.rfc-editor.org/info/rfc8584>.

   [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
              D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
              (SRv6) Network Programming", RFC 8986,
              DOI 10.17487/RFC8986, February 2021,
              <https://www.rfc-editor.org/info/rfc8986>.

11.  Informative References

   [I-D.ietf-bess-evpn-irb-extended-mobility]
              Malhotra, N., Sajassi, A., Pattekar, A., Rabadan, J.,
              Lingala, A., and J. Drake, "Extended Mobility Procedures
              for EVPN-IRB", Work in Progress, Internet-Draft, draft-
              ietf-bess-evpn-irb-extended-mobility-05, 15 March 2021,
              <https://datatracker.ietf.org/doc/html/draft-ietf-bess-
              evpn-irb-extended-mobility-05>.

   [I-D.wang-bess-evpn-egress-protection]
              Wang, Y. and R. Chen, "EVPN Egress Protection", Work in
              Progress, Internet-Draft, draft-wang-bess-evpn-egress-
              protection-04, 29 October 2020,
              <https://datatracker.ietf.org/doc/html/draft-wang-bess-
              evpn-egress-protection-04>.







   [RFC7041]  Balus, F., Ed., Sajassi, A., Ed., and N. Bitar, Ed.,
              "Extensions to the Virtual Private LAN Service (VPLS)
              Provider Edge (PE) Model for Provider Backbone Bridging",
              RFC 7041, DOI 10.17487/RFC7041, November 2013,
              <https://www.rfc-editor.org/info/rfc7041>.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
              <https://www.rfc-editor.org/info/rfc7348>.

   [RFC8365]  Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
              Uttaro, J., and W. Henderickx, "A Network Virtualization
              Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365,
              DOI 10.17487/RFC8365, March 2018,
              <https://www.rfc-editor.org/info/rfc8365>.

Appendix A.  Explanation for Physical Links of the Use-cases

                                       +------------------+
                             PE1       | P6               |
               L2NE1        +----------+---------+        |
            +----------+    |  __(P1.1)__(VPNx)  |        |
   +---+ P4 |          | P1 | /            \     |        |
   |N1 |-----O==------=======<             (NIz) |     P6 |    PE3
   +---+    |   \    / |    | \__      __  /     |   +----+-------+
            |    |  |  |    |    (P1.2)  (VPNy)  |   |            |
            +----|P3|--+    +-----------+--------+   |      (VPNx)--+N3
                 |  |                   |            |      /     |
            P3.1 |  | P3.2              | P7         | (NIz)--------+N4
                 |  |        PE2        |            |      \     |
            +----|P3|--+    +-----------+--------+   |      (VPNy)--+N5
            |     \/   |    |  __(P2.2)__(VPNy)  |   |            |
   +---+    |     /\   |    | /            \     |   +----+-------+
   |N2 |-----O====--=========<            (NIz)  |     P8 |
   +---+ P5 |          | P2 | \__      __  /     |        |
            +----------+    |    (P2.1)  (VPNx)  |        |
               L2NE2        +----------+---------+        |
                                       | P8               |
                                       +------------------+

                    Figure 5: Physical Links Illustrated







   There are three PEs, two L2NEs (Layer 2 Network Elements) and five
   L3NEs (Layer 3 Network Elements) in the above network.  The PEs are
   PE1, PE2 and PE3.  The L2NEs are L2NE1 and L2NE2.  The L3NEs are
   N1/N2/N3/N4/N5.  They are all illustrated in Figure 5.

   There are 8 numbered physical links among these 10 physical devices,
   as illustrated in Figure 5.  These physical links are called PLi
   (i=1,2...8).  The two physical ports of the same physical link PLi
   are both called Pi (i=1,2...8).

   As illustrated in Figure 5, some of these physical ports may have
   subinterfaces.  A subinterface of physical port Pi whose VLAN ID is j
   is called Pi.j.  For example, P1.2 is a subinterface of physical port
   P1 and its VLAN ID is 2.

   There are three NIs (Network Instances) among PE1, PE2 and PE3: VPNx,
   VPNy and NIz.  Two subinterfaces, P1.1 and P2.1, are attached to
   VPNx.  Two other subinterfaces, P1.2 and P2.2, are attached to VPNy.
   N3 is also attached to VPNx, while N5 is also attached to VPNy.

   There are two EVCs (Ethernet Virtual Connections) between L2NE1 and
   L2NE2: EVC1 and EVC2.  L2NE1's EVC1 instance (illustrated as the "O"
   on L2NE1) has three member interfaces: P4, P1.1 and P3.1, where P3.1
   and P1.1 are in the same protection-group.  L2NE2's EVC1 instance
   has two member interfaces: P3.1 and P2.1.  L2NE2's EVC2 instance
   (illustrated as the "O" on L2NE2) has three member interfaces: P5,
   P2.2 and P3.2, where P3.2 and P2.2 are in the same protection-group.
   L2NE1's EVC2 instance has two member interfaces: P3.2 and P1.2.
   L2NE2's EVC1 instance and L2NE1's EVC2 instance are both CCC (Circuit
   Cross Connection) local connections.
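
   The EVC membership described above can be written down as a small
   table.  The sketch below is illustrative only: the names EVC_MEMBERS
   and evcs_on_port are hypothetical, and the member lists are copied
   from the paragraph above.

```python
# (device, EVC) -> member interfaces, as listed in the text above.
EVC_MEMBERS = {
    ("L2NE1", "EVC1"): ["P4", "P1.1", "P3.1"],
    ("L2NE2", "EVC1"): ["P3.1", "P2.1"],
    ("L2NE2", "EVC2"): ["P5", "P2.2", "P3.2"],
    ("L2NE1", "EVC2"): ["P3.2", "P1.2"],
}

def evcs_on_port(port):
    """Return the (device, EVC) instances with a member on `port`.

    A member matches when it is the port itself ("P4") or one of its
    subinterfaces ("P3.1" is a subinterface of "P3").
    """
    return sorted(
        key
        for key, members in EVC_MEMBERS.items()
        if any(m.split(".")[0] == port for m in members)
    )
```

   Under this model, evcs_on_port("P3") returns all four EVC instances,
   since every one of them has a subinterface of P3 as a member.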

   VPNx and VPNy are associated with NIz on each PE.

A.1.  Failure Detections for P1.2 (or P2.1)

   There is a CFM session, CFM1, between P1.2 of PE1 and L2NE2's P3.2.
   When physical port P3 fails, CFM1 goes down.  There is also a CFM
   session, CFM2, between P2.1 of PE2 and L2NE1's P3.1.  When physical
   port P3 fails, CFM2 goes down as well.
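
   The dependency of the two CFM sessions on physical ports can be
   sketched as follows.  This is a sketch under stated assumptions: the
   names CFM_SESSIONS and sessions_down are hypothetical, and the set of
   ports each session traverses is inferred from the EVC description
   (only the dependency on P3 is stated in the text).

```python
# Ports each session is ASSUMED to traverse. Only the P3 dependency is
# stated in the text; P1 and P2 are inferred from the CCC connections.
CFM_SESSIONS = {
    "CFM1": {"endpoints": ("PE1:P1.2", "L2NE2:P3.2"), "ports": {"P1", "P3"}},
    "CFM2": {"endpoints": ("PE2:P2.1", "L2NE1:P3.1"), "ports": {"P2", "P3"}},
}

def sessions_down(failed_ports):
    """Return the names of CFM sessions that traverse a failed port."""
    failed = set(failed_ports)
    return sorted(
        name for name, s in CFM_SESSIONS.items() if s["ports"] & failed
    )
```

   With this model, a failure of P3 brings down both CFM1 and CFM2, as
   described above.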

A.2.  Protection Approaches for N1 (or N2)






A.2.1.  CCC-Approaches

   L2NE1's EVC1 instance and L2NE2's EVC2 instance are also both CCC
   local connections.  In L2NE1's EVC1 instance, P1.1 and P3.1 are in
   the same protection-group, PG1.  In L2NE2's EVC2 instance, P2.2 and
   P3.2 are in the same protection-group, PG2.  In PG1, both P1.1 and
   P3.1 will receive data packets.  In PG2, both P2.2 and P3.2 will
   receive data packets.

A.2.1.1.  CCC Active-Active Protection

   L2NE1 (or L2NE2) will load-balance N1's (or N2's) data packets
   between P1.1 and P3.1 (or between P2.2 and P3.2).
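
   This document does not specify how the load-balancing is performed.
   One common choice, shown here purely as an assumption, is to hash a
   per-flow key so that all packets of one flow use the same member.
   The names pick_member and flow_key are hypothetical.

```python
import zlib

PG1 = ("P1.1", "P3.1")  # members of protection-group PG1 on L2NE1
PG2 = ("P2.2", "P3.2")  # members of protection-group PG2 on L2NE2

def pick_member(flow_key, members):
    """Pick one member of a protection-group for a given flow.

    ASSUMPTION: per-flow hashing with CRC-32. The text only says that
    traffic is load-balanced between the two members.
    """
    index = zlib.crc32(flow_key.encode("utf-8")) % len(members)
    return members[index]
```

   Because the choice depends only on the flow key, repeated calls with
   the same key return the same member, which keeps packets of one flow
   in order.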

A.2.1.2.  CCC Active-Standby Protection

   In PG1, P1.1 is the active path and P3.1 is the backup path.  In PG2,
   P2.2 is the active path and P3.2 is the backup path.

   That is, L2NE1 (or L2NE2) will not send N1's (or N2's) data packets
   over P3.1 (or P3.2) unless P1.1 (or P2.2) or P1 (or P2) has already
   failed.
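
   The active-standby rule above can be sketched as a small selection
   function.  This is an illustrative sketch only; the name select_path
   and the representation of failures as a set of names are
   hypothetical.

```python
def select_path(active, backup, failed):
    """Return the member interface to use for forwarding.

    `active` and `backup` are subinterface names such as "P1.1".
    `failed` is a set of failed physical ports and/or subinterfaces.
    The backup is used only when the active subinterface, or the
    physical port that carries it, has failed.
    """
    active_port = active.split(".")[0]  # "P1.1" -> "P1"
    if active in failed or active_port in failed:
        return backup
    return active
```

   For PG1, select_path("P1.1", "P3.1", failed) returns "P1.1" while
   neither P1.1 nor P1 is in the failed set, and "P3.1" otherwise.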

A.2.2.  VSI-Approaches

   L2NE1's EVC1 instance and L2NE2's EVC2 instance are both VSI
   instances in this case.  P1.1, P3.1, P2.2 and P3.2 are all individual
   ACs in these VSIs.

   Note that L2NE2's EVC1 instance and L2NE1's EVC2 instance are still
   CCC local connections in this case, that there is no PG1 or PG2, and
   that there are no PWs.

Authors' Addresses

   Yubao Wang
   ZTE Corporation
   No.68 of Zijinghua Road, Yuhuatai District
   Nanjing
   China

   Email: wang.yubao2@zte.com.cn









   Ran Chen
   ZTE Corporation
   No. 50 Software Ave, Yuhuatai District
   Nanjing
   China

   Email: chen.ran@zte.com.cn











































