
INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                       Donald Eastlake
Expires: February 23, 2014                                        Huawei
                                                         August 22, 2013

              Problem Statement: TRILL Active/Active Edge
                  draft-zhang-trill-aggregation-04.txt

Abstract

   This document describes TRILL active/active edge, which allows
   multiple RBridges to concurrently forward data frames of the same
   VLAN on links bundled by a Multi-Chassis Link Aggregation Group
   (MC-LAG). With this kind of connection, end nodes can increase the
   bandwidth and reliability of their access at the edge of a TRILL
   campus. This new connection type is required not to cause loops or
   frame duplication. Beyond this basic requirement, this document
   outlines other potential issues associated with TRILL active/active
   edge and investigates how these issues may be addressed.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
 


Mingui Zhang, et al    Expires February 23, 2014                [Page 1]

INTERNET-DRAFT          TRILL Active/Active Edge         August 22, 2013


   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1. Introduction
   2. Acronyms and Terminology
     2.1. Acronyms
     2.2. Terminology
   3. Overview
   4. Frame Processing
     4.1. Unicast Ingressing
     4.2. Unicast Egressing
     4.3. Multicast Ingressing
     4.4. Multicast Egressing
   5. DRB and Pseudonode
   6. MAC Addresses Sharing
   7. Failures and Self-healing
     7.1. Link Failure
     7.2. Node Failure
   8. Reverse Path Forwarding Check
   9. Security Considerations
   10. IANA Considerations
   11. References
     11.1. Normative References
     11.2. Informative References
   Author's Addresses


1. Introduction

   TRILL uses ISIS link state routing to provide least-cost paths
   between TRILL switches (also known as Routing Bridges or RBridges).
   When a multi-access LAN link connects end stations to multiple
   RBridges, a single RBridge has to be appointed as the frame forwarder
   for each VLAN-x on this LAN link. Other RBridges MAY be appointed as
   frame forwarders for other VLANs but MUST be inhibited from
   forwarding frames for the same VLAN-x on this LAN link [RFC6439].

   An MC-LAG can also be used to connect end stations to multiple
   RBridges. There are two possible scenarios: (a) an end station is
   connected directly to multiple RBridges by an MC-LAG; (b) end
   stations are attached to a bridge, and this bridge uses an MC-LAG to
   connect to multiple RBridges. An MC-LAG may use any of its component
   links to forward a frame and never forwards frames from one component
   link to another. Therefore, the RBridges to which the MC-LAG connects
   are required to provide active/active attachment instead of the
   active/standby mode adopted in the Appointed Forwarder mechanism
   [RFC6439]. This kind of attachment allows end nodes to increase the
   bandwidth and reliability of their access to the TRILL campus via
   the MC-LAG.

   As with a LAN link, an MC-LAG can be represented by a pseudonode.
   All member RBridges should report their adjacencies to this
   pseudonode in their LSPs. In this way, the RBridges attached to the
   same MC-LAG form an active/active edge group. Other RBridges in the
   campus communicate with this pseudonode over forwarding paths
   computed by ISIS link state routing, and no additional features are
   required of them.

   The baseline requirement is that the active/active edge MUST forward
   frames without causing loops or duplication in the TRILL campus or at
   the end node. To work properly, the TRILL active/active edge also has
   to address several other issues. The purpose of this document is to
   outline these issues; specific solutions are left for future work as
   building blocks of the complete TRILL active/active edge mechanism.

   The rest of this document is organized as follows. Section 2 gives
   acronyms and terminology. Section 3 provides an overview. Section 4
   specifies the frame processing behavior of member RBridges. Section
   5 describes how the pseudonode is set up. Section 6 explains MAC
   address sharing among member RBridges. Section 7 describes
   self-healing after failures. Section 8 investigates how multicast
   frames can pass the Reverse Path Forwarding Check without being
   dropped.

2. Acronyms and Terminology

2.1. Acronyms

   MC-LAG: Multi-Chassis Link Aggregation Group
   ISIS: Intermediate System to Intermediate System
   TRILL: TRansparent Interconnection of Lots of Links
   AF: Appointed Forwarder
   DRB: Designated RBridge
   DT: Distribution Tree
   RPFC: Reverse Path Forwarding Check

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, the term "end node" means the end station or bridge
   connected to the TRILL active/active edge by MC-LAG.

   Familiarity with [RFC6325], [RFC6327], and [RFC6439] is assumed in
   this document. As in [RFC6325], in this document the word "link"
   means a "bridged LAN", unless otherwise qualified.

3. Overview

   If an end node (end station or bridge) uses an MC-LAG to connect to
   multiple edge RBridges, all of these RBridges are expected to be able
   to ingress and egress frames for the end node. In contrast, if
   multiple RBridges are connected to a LAN link, only one of them can
   be appointed as the frame forwarder for each VLAN-x [RFC6439], as
   illustrated in Figure 2.1 (a). The other RBridges are inhibited from
   ingressing and egressing frames for VLAN-x.

             +-----+                   +-----+
             | RBi |                   | RBi |(Remote RBridge)
             +-----+                   +-----+
           /\/\/\/\/\/\              /\/\/\/\/\/\
          /   Transit  \            /   Transit  \
         <    RBridges  >          <    RBridges  >
          \            /            \            /
           \/\/\/\/\/\/              \/\/\/\/\/\/
            |        |                |        |
         +-----+  +-----+          +-----+  +-----+
         | RB1 |--| RB2 |          | RB1 |--| RB2 |(Active/Active Edge)
         +-----+  +-----+          +-----+  +-----+
            AF\    /                     \   /
               +---+                    *******
               |LAN|                    * RBv * (Virtual RBridge)
               +---+                    *******
                                          | |(MC-LAG)
                                         +---+
                                         | E |
                                         +---+
         (a) Appointed Forwarder   (b) Active/Active Edge

      Figure 2.1: TRILL Appointed Forwarder vs Active-Active Edge

   As illustrated in Figure 2.1 (b), the end node 'E' is attached to
   both RB1 and RB2 by an MC-LAG. Each member RBridge can ingress and
   egress frames of VLAN-x for the end node. If each of them used its
   own nickname as the ingress nickname, a remote RBridge could observe
   different locations for the same MAC address at different times;
   this document refers to this as the "MAC move" problem. The MAC move
   problem affects path selection at the remote RBridge: frames destined
   to the end node may take different paths, which may cause
   out-of-order delivery within a traffic flow.

   To avoid the MAC move problem, each member RBridge should use a
   common nickname as the ingress nickname when encapsulating TRILL
   data frames. As shown in Figure 2.1 (b), the member RBridges behave
   as if a virtual RBridge, acting as the appointed forwarder for the
   end node, were connected to them. It is natural to represent this
   virtual RBridge as a pseudonode. All RBridges connected to the MC-LAG
   form adjacencies with the pseudonode, and other RBridges see an
   RBridge RBv connected to both RB1 and RB2. Note that member RBridges
   SHOULD NOT announce that they are the VLAN-x Appointed Forwarder if
   VLAN-x is enabled on the MC-LAG.
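   The MAC move problem, and how a common pseudonode nickname avoids
   it, can be illustrated with a small simulation of a remote RBridge's
   MAC learning table. This is a hypothetical sketch: nicknames are
   shown as strings (real TRILL nicknames are 16-bit values),
   learn_locations is an invented helper, and the end node MAC address
   is taken from the documentation range.

```python
# Hypothetical sketch: how a remote RBridge's learned location for the
# end node's MAC changes under two ingress-nickname policies.

def learn_locations(frames):
    """Replay (source MAC, ingress nickname) pairs through a simple
    learning table; return, per MAC, each location recorded whenever
    the entry was created or changed."""
    table = {}
    history = {}
    for mac, ingress in frames:
        if table.get(mac) != ingress:
            table[mac] = ingress
            history.setdefault(mac, []).append(ingress)
    return history

E = "00:00:5e:00:53:01"  # end node 'E' (documentation MAC, hypothetical)

# The end node spreads frames over both component links of the MC-LAG,
# so they are ingressed alternately by RB1 and RB2.
own_nicknames = [(E, "RB1"), (E, "RB2"), (E, "RB1"), (E, "RB2")]
pseudonode_nickname = [(E, "RBv")] * 4

print(learn_locations(own_nicknames)[E])        # ['RB1', 'RB2', 'RB1', 'RB2']
print(learn_locations(pseudonode_nickname)[E])  # ['RBv']
```

   With per-member nicknames the entry moves on every frame; with the
   shared pseudonode nickname it is learned once and stays put.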

   Although the above example includes two edge RBridges, the TRILL
   active/active edge solution SHOULD support cases with more than two
   member RBridges.

4. Frame Processing

   When the end node injects frames into the TRILL campus via a member
   RBridge, this RBridge encapsulates the native frames on behalf of the
   pseudonode. When frames are sent to the end node, the pseudonode is
   treated as the egress RBridge. It is REQUIRED that RBridges other
   than the active/active member RBridges need not be aware of the
   active/active group or change their frame processing behavior.

   Unlike the Appointed Forwarder mechanism, all active/active member
   RBridges are able to ingress and egress frames of VLAN-x on the same
   link. It is therefore crucial to avoid loops and duplication in frame
   processing.

4.1. Unicast Ingressing

   A member RBridge that receives a native frame from the MC-LAG
   encapsulates it using the nickname of the pseudonode as the ingress
   nickname. When these TRILL data frames arrive at a remote RBridge,
   it learns their source MAC addresses during decapsulation and
   regards the pseudonode as the egress RBridge for these MAC addresses.

4.2. Unicast Egressing

   Based on the MAC entries they have learned, remote RBridges send
   TRILL data frames destined to the end node to the pseudonode rather
   than to individual member RBridges. When a member RBridge receives a
   TRILL data frame whose egress RBridge is the pseudonode, it can
   conclude that the frame should be egressed onto the MC-LAG.

   However, member RBridges MUST NOT egress any TRILL data frame whose
   ingress RBridge is the pseudonode. Otherwise, frames would loop back
   to the end node.
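   The ingress rule of Section 4.1 and the unicast egress rules of
   Section 4.2 can be summarized as two small functions. This is a
   sketch under assumptions: TrillHeader models only the nickname
   fields and the multi-destination bit, and the function names and
   nickname values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrillHeader:
    """Only the TRILL header fields used by this sketch."""
    ingress: int              # ingress RBridge nickname (16 bits)
    egress: int               # egress RBridge nickname (16 bits)
    multi_dest: bool = False  # M bit: multi-destination frame

RBV = 0x0100  # pseudonode nickname (hypothetical value)
RBI = 0x0200  # a remote RBridge's nickname (hypothetical value)

def encapsulate_from_mc_lag(pseudonode: int, egress: int) -> TrillHeader:
    """Section 4.1: a member RBridge ingresses a native frame received
    on the MC-LAG using the pseudonode nickname, not its own."""
    return TrillHeader(ingress=pseudonode, egress=egress)

def egress_unicast_to_mc_lag(hdr: TrillHeader, pseudonode: int) -> bool:
    """Section 4.2: decide whether a member RBridge decapsulates a
    known-unicast TRILL data frame onto the MC-LAG."""
    if hdr.multi_dest:
        return False  # multicast rules are in Section 4.4
    if hdr.ingress == pseudonode:
        return False  # MUST NOT egress: would loop back to the end node
    return hdr.egress == pseudonode

print(egress_unicast_to_mc_lag(TrillHeader(RBI, RBV), RBV))  # True
print(egress_unicast_to_mc_lag(encapsulate_from_mc_lag(RBV, RBV), RBV))  # False
```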

4.3. Multicast Ingressing

   The end node chooses one component link of the MC-LAG to send a
   multicast frame to a member RBridge. As with unicast ingressing, the
   receiving RBridge encapsulates the native frame using the nickname of
   the pseudonode as the ingress nickname.

   Different member RBridges MUST NOT use the same Distribution Tree to
   ingress multicast frames of a specific VLAN-x from the end node.
   Otherwise, some multicast frames may be lost due to the Reverse Path
   Forwarding Check. This issue is detailed in Section 8.

4.4. Multicast Egressing

   Multicast frames sent along the VLAN-x Distribution Tree may reach
   all member RBridges. However, only one of them may egress a given
   multicast frame onto the MC-LAG; otherwise, the end node receives
   duplicate frames. This requirement can be met if member RBridges
   compute the Distribution Tree treating the pseudonode as an ordinary
   RBridge. Then exactly one member RBridge is selected as the parent of
   the pseudonode on that tree. The other, non-parent member RBridges
   MUST refrain from egressing multicast frames of VLAN-x onto the
   MC-LAG.

   As with unicast egressing, member RBridges MUST NOT egress any
   multicast frame whose ingress RBridge is the pseudonode.
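   The parent selection described above can be sketched by computing a
   shortest-path Distribution Tree that includes the pseudonode as an
   ordinary node. The sketch assumes unit link costs, collapses the
   transit RBridges of Figure 2.1 (b) into RBi, and uses an assumed
   equal-cost tie-break (sort the candidate parents and take number
   tree_index modulo their count); see [RFC6325] for the normative tree
   computation. parents_on_tree and may_egress_multicast are
   hypothetical names.

```python
from collections import deque

def parents_on_tree(adj, root, tree_index=0):
    """Parent of every non-root node on a shortest-path tree rooted at
    `root`, with unit link costs. Equal-cost parents are sorted and
    candidate number (tree_index mod count) is chosen (assumed rule)."""
    dist = {root: 0}
    candidates = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                candidates[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                candidates[nbr].append(node)  # another equal-cost parent
    return {n: sorted(c)[tree_index % len(c)]
            for n, c in candidates.items() if c}

def may_egress_multicast(member, ingress, pseudonode, parent):
    """Section 4.4: only the pseudonode's parent on the tree egresses
    onto the MC-LAG, and never a frame ingressed by the pseudonode."""
    return ingress != pseudonode and parent.get(pseudonode) == member

# Figure 2.1 (b) with the transit RBridges collapsed into RBi.
ADJ = {
    "RBi": ["RB1", "RB2"],
    "RB1": ["RBi", "RB2", "RBv"],
    "RB2": ["RBi", "RB1", "RBv"],
    "RBv": ["RB1", "RB2"],
}
parent = parents_on_tree(ADJ, "RBi")
print(parent["RBv"])  # RB1
print([m for m in ("RB1", "RB2")
       if may_egress_multicast(m, "RBi", "RBv", parent)])  # ['RB1']
```

   Only RB1, the pseudonode's parent on this tree, egresses the frame;
   a frame whose ingress nickname is RBv is never egressed.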

5. DRB and Pseudonode

   On a LAN link, the DRB MAY assign a pseudonode name to the link,
   issue an LSP (Link State PDU) on behalf of the pseudonode, and issue
   CSNPs (Complete Sequence Number PDUs) on the link [RFC6325]. Unlike
   a LAN link, an MC-LAG carries no TRILL Hello exchange, so the DRB
   cannot be elected through the Hello protocol. Member RBridges MAY
   instead establish a dedicated RBridge Channel to discover each other
   and elect a DRB for the active/active RBridge group (aDRB), which
   performs the tasks above: assigning the pseudonode nickname and
   issuing its LSP and CSNPs. The member RBridge with the highest
   priority to be a tree root is a good choice.
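   The aDRB election suggested above can be sketched as a simple
   maximum over the member RBridges discovered through the RBridge
   Channel. The sketch assumes each member knows its peers' tree-root
   priorities and System IDs; breaking ties by the larger System ID is
   an assumption made here for illustration, not a rule stated in this
   document, and all names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    name: str
    root_priority: int  # priority to be a Distribution Tree root (16 bits)
    system_id: bytes    # 6-octet IS-IS System ID

def elect_adrb(members):
    """Return the member with the highest tree-root priority; ties are
    broken by the numerically larger System ID (assumed rule). Byte
    strings of equal length compare like unsigned big-endian numbers."""
    return max(members, key=lambda m: (m.root_priority, m.system_id))

rb1 = Member("RB1", 0x8000, bytes.fromhex("0200000000a1"))
rb2 = Member("RB2", 0x8000, bytes.fromhex("0200000000b2"))
rb3 = Member("RB3", 0x9000, bytes.fromhex("020000000001"))

print(elect_adrb([rb1, rb2]).name)       # RB2 (tie on priority)
print(elect_adrb([rb1, rb2, rb3]).name)  # RB3 (highest priority)
```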

   Member RBridges SHOULD be able to discover each other in order to
   resolve misconfiguration and failures. Each member RBridge SHALL
   report its connection to the MC-LAG. The MAC address of the end node
   MAY be used to identify the MC-LAG to which the member RBridges are
   connected.

   One RBridge may be connected to multiple MC-LAGs, and all of these
   MC-LAGs may well share the same set of member RBridges. However,
   these MC-LAGs MUST NOT share the same pseudonode; otherwise, the
   following issue arises.

   o Component Links from Different MC-LAGs Cannot be Distinguished:
     Assume member RBridge RBi is connected to multiple end nodes and
     all of these links are advertised as a single ISIS link "RBi-RBv".
     Remote RBridges then cannot distinguish among the links connecting
     RBi and RBv, which becomes a problem when one of them fails. On
     one hand, if the failed link is not advertised as a down ISIS
     link, traffic sent from remote RBridges to RBv via the failed link
     will be blackholed. On the other hand, if the failed link is
     announced as a down ISIS link, the component links of the other
     MC-LAGs will mistakenly be treated as disconnected.

   The right choice is therefore to represent each MC-LAG by its own
   pseudonode. In this way, the failure of a component link of an
   MC-LAG can be interpreted as an ISIS link failure, and the aDRB can
   issue a new LSP on behalf of the pseudonode to trigger a link state
   update across the campus.
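   The benefit of one pseudonode per MC-LAG can be shown by computing
   which member RBridges advertise an adjacency to each pseudonode
   after a component link fails. In this hypothetical sketch, each
   MC-LAG is identified by its end node's MAC address (as this section
   permits), and the pseudonode names RBv1 and RBv2 and the function
   name are illustrative assumptions.

```python
def pseudonode_adjacencies(up_links, pseudonode_of):
    """up_links: set of (member RBridge, end node MAC) component links
    that are up. pseudonode_of: {end node MAC: pseudonode}. Returns
    {pseudonode: sorted member RBridges adjacent to it}."""
    adj = {pn: set() for pn in pseudonode_of.values()}
    for member, mac in up_links:
        adj[pseudonode_of[mac]].add(member)
    return {pn: sorted(members) for pn, members in adj.items()}

E1, E2 = "00:00:5e:00:53:01", "00:00:5e:00:53:02"  # two end nodes
links = {("RB1", E1), ("RB2", E1), ("RB1", E2), ("RB2", E2)}
links.discard(("RB1", E1))  # component link RB1-E1 fails

unique = {E1: "RBv1", E2: "RBv2"}  # one pseudonode per MC-LAG
shared = {E1: "RBv", E2: "RBv"}    # MC-LAGs sharing a pseudonode

print(pseudonode_adjacencies(links, unique))
# {'RBv1': ['RB2'], 'RBv2': ['RB1', 'RB2']}
print(pseudonode_adjacencies(links, shared))
# {'RBv': ['RB1', 'RB2']}
```

   With a shared pseudonode, RB1 still appears adjacent to RBv, so
   remote RBridges may keep sending E1's traffic via RB1, where it is
   blackholed; with unique pseudonodes, only RBv1 loses RB1.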

6. MAC Addresses Sharing

   When a member RBridge learns a MAC address from the encapsulation or
   decapsulation of a TRILL data frame, it SHOULD share what it has
   learned with all other member RBridges. Afterwards, any member
   RBridge can forward a frame destined to this MAC address as a known
   unicast, either as a native frame onto the MC-LAG or as a TRILL data
   frame into the TRILL campus.

   a) Northbound Sharing: When a remote RBridge sends data frames to the
      end node, these frames may arrive at any of the member RBridges,
      since member RBridges may lie on Equal Cost Multiple Paths from
      the remote RBridge to the pseudonode. If any member RBridge has
      already learned and recorded the MAC address of the end node, the
      receiving RBridge SHOULD hold the same record (VLAN ID, MAC
      Address, Port Number) as well, so that the frame can be delivered
      to the end node as a known unicast. Therefore, local MAC addresses
      learned from data frames sent by the end node (northbound) SHOULD
      be shared among member RBridges.

   b) Southbound Sharing: The end node may choose any component link to
      inject a frame, which balances load across the MC-LAG. If the
      destination MAC address has been learned by any member RBridge,
      the receiving RBridge SHOULD also hold that MAC record (VLAN ID,
      MAC Address, Egress RBridge Nickname), so that the data frame need
      not be sent as a multi-destination (unknown unicast) frame.
      Therefore, MAC addresses learned from data frames sent by remote
      RBridges to the end node (southbound) should be shared as well.

   When an RBridge learns a source MAC address from a data frame, it
   records the VLAN ID, the source MAC address, and its location, which
   is either the incoming port number or the ingress nickname. A MAC
   address shared by a peer member RBridge is recorded as if it had
   been learned locally. For example, when RB1 shares with RB2 a MAC
   address learned from the end node, RB2 sets the incoming port to its
   own port attached to the end node.
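   Recording shared entries as if learned locally can be sketched as
   below. This is a hypothetical model, not a specified data structure:
   the class names, VLAN ID 10, the port numbers and the nickname value
   are all assumptions. Northbound entries point to a local port and
   are rewritten to the receiving member's own MC-LAG port; southbound
   entries point to a remote egress nickname and are copied unchanged.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class LocalPort:
    port: int       # port of this member RBridge (northbound entry)

@dataclass(frozen=True)
class RemoteNickname:
    nickname: int   # egress RBridge nickname (southbound entry)

Location = Union[LocalPort, RemoteNickname]

class MemberMacTable:
    """Hypothetical per-member MAC table keyed by (VLAN ID, MAC)."""

    def __init__(self, mc_lag_port: int):
        self.mc_lag_port = mc_lag_port  # this member's port to the end node
        self.entries: Dict[Tuple[int, str], Location] = {}

    def learn(self, vlan: int, mac: str, loc: Location) -> None:
        self.entries[(vlan, mac)] = loc

    def install_shared(self, vlan: int, mac: str, loc: Location) -> None:
        """Record an entry shared by a peer as if learned locally: the
        peer's port toward the end node is replaced by our own MC-LAG
        port; a remote egress nickname is kept unchanged."""
        if isinstance(loc, LocalPort):
            loc = LocalPort(self.mc_lag_port)
        self.learn(vlan, mac, loc)

E, R = "00:00:5e:00:53:01", "00:00:5e:00:53:99"  # end node, remote host
rb1, rb2 = MemberMacTable(mc_lag_port=7), MemberMacTable(mc_lag_port=3)

rb1.learn(10, E, LocalPort(7))             # northbound: RB1's port 7
rb1.learn(10, R, RemoteNickname(0x0200))   # southbound: remote RBridge
for (vlan, mac), loc in rb1.entries.items():
    rb2.install_shared(vlan, mac, loc)

print(rb2.entries[(10, E)])  # LocalPort(port=3)
print(rb2.entries[(10, R)])  # RemoteNickname(nickname=512)
```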

   It is REQUIRED that all member RBridges use the same aging time for
   each MAC address. Every time a MAC address is learned or updated,
   all member RBridges MUST update the record and reset its aging time.
   Frames from a given source MAC address are likely to be received
   continuously. Updating the entry for this MAC address locally poses
   no problem; however, when every update is propagated among multiple
   member RBridges, the resulting stream of updates may consume
   considerable bandwidth. Therefore, member RBridges need a
   communication channel for MAC sharing, which could be realized
   through an extension of ESADI (End Station Address Distribution
   Information) [RFC6325] or a dedicated RBridge Channel [Channel].

7. Failures and Self-healing

   Resilience is a major goal of the active/active edge. On the end
   node side, the MC-LAG provides reliability of the access link. On the
   member RBridge side, changes in the state of the active/active edge
   caused by link or node failures are reflected in updated LSPs from
   the member RBridges, which provides self-healing of the
   active/active edge.

7.1. Link Failure

   The failure of a component link of the MC-LAG is translated into an
   ISIS link failure: if a member RBridge is disconnected from the end
   node, it sends out an LSP announcing that it is no longer connected
   to the pseudonode. This triggers an update of the forwarding tables
   of remote RBridges. Since the other member RBridges still report
   their connections to the pseudonode, remote RBridges in the TRILL
   campus can send frames to the pseudonode via any of them. Therefore,
   reachability to the end node is not broken by this link failure.

   If the link connecting the aDRB to the end node fails, a new aDRB is
   elected. The new aDRB SHOULD reuse the nickname allocated to the
   pseudonode, which avoids changing the locations that remote RBridges
   have learned for the end node's MAC addresses.

   In the extreme case, the last component link of the MC-LAG fails.
   The aDRB SHOULD then update its LSPs to remove the pseudonode from
   the campus, which also dissolves the active/active edge.

7.2. Node Failure

   The failure of a member RBridge is likewise reflected in LSP
   announcements. If the aDRB fails, a new aDRB is elected, and this
   new aDRB SHOULD reuse the pseudonode nickname allocated by the old
   aDRB.
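   The behavior of Sections 7.1 and 7.2 can be summarized in a small
   state model. This is a hypothetical sketch: link and node failures
   are both modeled as a member leaving the group, the election rule
   (highest tree-root priority, ties broken by name) is an assumption,
   and the nickname and priority values are illustrative.

```python
class ActiveActiveGroup:
    """Hypothetical model of the self-healing behavior of Section 7."""

    def __init__(self, members, pseudonode_nick, priority):
        self.members = set(members)     # members with a live link to E
        self.priority = dict(priority)  # {member: tree-root priority}
        self.pseudonode_nick = pseudonode_nick
        self.adrb = self._elect()

    def _elect(self):
        # Assumed rule: highest tree-root priority, ties broken by name.
        if not self.members:
            return None
        return max(self.members, key=lambda m: (self.priority[m], m))

    def fail(self, member):
        """A member's component link fails, or the member itself fails:
        its adjacency to the pseudonode disappears from the LSPs."""
        self.members.discard(member)
        if member == self.adrb:
            # The new aDRB keeps the pseudonode nickname, so MAC entries
            # learned by remote RBridges remain valid.
            self.adrb = self._elect()
        if not self.members:
            # Last component link gone: the pseudonode is withdrawn and
            # the active/active edge no longer exists.
            self.pseudonode_nick = None

    def reachable_via(self):
        """Member RBridges remote RBridges can use to reach the pseudonode."""
        return sorted(self.members)

g = ActiveActiveGroup(["RB1", "RB2", "RB3"], 0x0100,
                      {"RB1": 0x8000, "RB2": 0x9000, "RB3": 0x8000})
print(g.adrb)                          # RB2
g.fail("RB2")                          # the aDRB's link (or node) fails
print(g.adrb, hex(g.pseudonode_nick))  # RB3 0x100
print(g.reachable_via())               # ['RB1', 'RB3']
```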

8. Reverse Path Forwarding Check

   TRILL uses the Reverse Path Forwarding Check (RPFC) to suppress
   forwarding loops of multicast frames [RFC6325]. For a given
   Distribution Tree (DT), a multicast frame from a given ingress
   RBridge is expected to arrive at each RBridge on only one specific
   link. RBridges MUST drop multicast frames that fail the RPFC
   [RFC6325].

   When multiple member RBridges ingress multicast frames of VLAN-x from
   the end node at the same time, there is no guarantee that these
   frames always arrive on the expected link at a remote RBridge. The
   following example explains this issue.

                                   RBi   
                                  /   \
                                RB1   RB2
                                /
                              RBv

              Figure 7.1: The Distribution Tree, root=RBi

   Suppose a Distribution Tree for the topology of Figure 2.1 (b) is
   constructed as shown in Figure 7.1. On this tree, multicast frames
   ingressed by RBv are expected to arrive at RBi on the port attached
   to RB1. With the active/active connection, RB2 can also receive
   native data frames from the MC-LAG. If RB2 adopts the same
   Distribution Tree, multicast frames ingressed on behalf of RBv will
   arrive at RBi on the port attached to RB2. The problem is that these
   frames will then be discarded by the RPFC.
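   The RPFC failure in this example can be reproduced with a simplified
   model in which each RBridge knows the tree as a child-to-parent map
   and accepts a multicast frame only from the neighbor on the tree
   path toward the ingress. This is a sketch under assumptions: real
   RPFC state is kept per port and per (tree, ingress nickname), and
   the function names are hypothetical.

```python
def path_to_root(parent, node):
    """Nodes from `node` up to the tree root, using a child->parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def expected_neighbor(parent, at, ingress):
    """Neighbor of RBridge `at` from which multicast frames ingressed by
    `ingress` must arrive on the tree described by `parent`."""
    if at == ingress:
        raise ValueError("frames are not received from oneself")
    up = path_to_root(parent, ingress)
    if at in up:
        # `at` is an ancestor of the ingress: frames come up from below.
        return up[up.index(at) - 1]
    # Otherwise frames come down from `at`'s own parent.
    return parent[at]

def rpfc_passes(parent, at, ingress, arrived_from):
    return expected_neighbor(parent, at, ingress) == arrived_from

# Figure 7.1: Distribution Tree rooted at RBi.
TREE_RBI = {"RB1": "RBi", "RB2": "RBi", "RBv": "RB1"}

print(rpfc_passes(TREE_RBI, "RBi", "RBv", "RB1"))  # True: ingressed by RB1
print(rpfc_passes(TREE_RBI, "RBi", "RBv", "RB2"))  # False: RB2 used this tree
```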

                         RBx                RBy 
                          |                  |
                         RBi                RBi
                        /   \              /   \
                      RB1   RB2          RB1   RB2
                      /                          \
                    RBv                           RBv

                   (a) DT, root=RBx    (b) DT, root=RBy

        Figure 7.2: Assign a Unique Tree to each Member RBridge

   One way to avoid this issue is to leverage the ability of RBridges to
   compute multiple Distribution Trees: each member RBridge is assigned
   a unique Distribution Tree for multicast frame distribution, and
   these trees are identified by the nicknames of their root RBridges.
   The example in Figure 7.2 illustrates this method, where RB1 and RB2
   use two different Distribution Trees.

   Since the active/active edge needs to assign at least one
   Distribution Tree per component link of an MC-LAG, the maximum
   number of component links depends on the number of Distribution
   Trees that all RBridges can compute. However, MC-LAGs in current
   best practice have two component links, a case that TRILL switches
   support well.
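   The per-member tree assignment and the limit it implies can be
   sketched as follows. Round-robin assignment is an assumption made
   here for illustration (in [CMT] the assignment is conveyed with the
   Affinity TLV); the function names are hypothetical, and the trees of
   Figure 7.2 are written as child-to-parent maps.

```python
def assign_trees(members, roots):
    """Hypothetical round-robin assignment of Distribution Trees
    (identified by root nickname) so that each member gets its own."""
    members, roots = sorted(members), sorted(roots)
    if len(roots) < len(members):
        raise ValueError("need at least one Distribution Tree per member")
    assignment = {m: [] for m in members}
    for i, root in enumerate(roots):
        assignment[members[i % len(members)]].append(root)
    return assignment

def trees_consistent(assignment, trees, pseudonode):
    """Check that on every tree a member uses, that member is the
    pseudonode's parent, so its frames pass the RPFC elsewhere."""
    return all(trees[root][pseudonode] == member
               for member, roots in assignment.items() for root in roots)

# Figure 7.2, written as child -> parent maps.
TREES = {
    "RBx": {"RBi": "RBx", "RB1": "RBi", "RB2": "RBi", "RBv": "RB1"},
    "RBy": {"RBi": "RBy", "RB1": "RBi", "RB2": "RBi", "RBv": "RB2"},
}
a = assign_trees(["RB1", "RB2"], TREES)
print(a)                                  # {'RB1': ['RBx'], 'RB2': ['RBy']}
print(trees_consistent(a, TREES, "RBv"))  # True
```

   The consistency check only holds when the trees are actually
   computed to honor the assignment, which is the role of the Affinity
   TLV in [CMT]; with three members and only two trees, assign_trees
   raises an error.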

   In [CMT], the Affinity TLV is used to achieve the above assignment of
   Distribution Trees to member RBridges. It is REQUIRED that all
   RBridges in the campus be able to recognize the Affinity TLV and
   compute Distribution Trees as specified by this TLV.

   When there is a link or node failure in the active/active edge, the
   Distribution Tree assigned to the failed member RBridge should be
   re-allocated to another member RBridge. It is RECOMMENDED that this
   re-allocation be incremental; in other words, Distribution Trees not
   affected by the failure SHOULD be retained.
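   Incremental re-allocation can be sketched as moving only the trees
   of the failed member. The least-loaded target selection below is a
   hypothetical policy chosen for illustration; this document does not
   specify which surviving member takes over a tree.

```python
def reallocate(assignment, failed):
    """Incremental re-allocation: move only the failed member's trees to
    surviving members; every other assignment is kept unchanged."""
    survivors = sorted(m for m in assignment if m != failed)
    new = {m: list(assignment[m]) for m in survivors}
    if not survivors:
        return new
    for root in assignment.get(failed, []):
        # Hypothetical policy: give the orphaned tree to the member that
        # currently has the fewest trees (ties broken by name).
        target = min(survivors, key=lambda m: (len(new[m]), m))
        new[target].append(root)
    return new

before = {"RB1": ["RBx"], "RB2": ["RBy"], "RB3": ["RBz"]}
after = reallocate(before, "RB2")
print(after)  # {'RB1': ['RBx', 'RBy'], 'RB3': ['RBz']}
```

   The trees held by RB1 and RB3 are untouched; only RBy changes hands,
   after which it would have to be recomputed so that RB1 becomes the
   pseudonode's parent on it.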

9. Security Considerations

   This document raises no new security issues for ISIS.

10. IANA Considerations

   This document requires no IANA actions. RFC Editor: please remove
   this section before publication.

11. References 

11.1. Normative References

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6325] R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC6327] D. Eastlake, R. Perlman, et al, "RBridges: Adjacency",
             RFC 6327, July 2011.

   [RFC6439] R. Perlman, D. Eastlake, et al, "RBridges: Appointed
             Forwarders", RFC 6439, November 2011.

   [Channel] D. Eastlake, V. Manral, et al, "TRILL: RBridge Channel
             Support", draft-ietf-trill-rbridge-channel-08.txt, July
             2012, work in progress.

   [CMT]     T. Senevirathne, J. Pathangi, et al, "Coordinated Multicast
             Trees (CMT) for TRILL", draft-ietf-trill-cmt-01.txt,
             November 2012, work in progress.

11.2. Informative References

   None.

Author's Addresses


   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China
   	
   Email: zhangmingui@huawei.com

   Donald E. Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com
