
Inter-Domain Routing                                          H. Gredler
Internet-Draft                                    Juniper Networks, Inc.
Intended status: Standards Track                           March 9, 2015
Expires: September 10, 2015


                    Prefix-SID extensions for BGP-LU
                 draft-gredler-idr-bgplu-prefix-sid-00

Abstract

   The MPLS source routing paradigm provides path control for both
   intra- and inter-Autonomous System (AS) traffic.  In most MPLS
   deployments, the ingress of an MPLS tunnel is an IP router.  The
   availability of MPLS forwarding stacks for host operating systems is
   extending the MPLS perimeter to Hypervisors and Servers.  Recent Data
   Center designs use an IGP-less routing paradigm based on massive
   ECMP multipathing using external BGP.  This document outlines how
   Hypervisors and Servers may interact with the MPLS control and data
   planes using extensions to the BGP labeled unicast protocol
   (BGP-LU).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2015.







Gredler                Expires September 10, 2015               [Page 1]

Internet-Draft      Prefix-SID extensions for BGP-LU          March 2015


Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Motivation, Rationale and Applicability
   3.  Deployment Considerations
     3.1.  Control plane restart
     3.2.  BGP-LU as Server Control Plane
     3.3.  Labeled-ARP as Server Control Plane
     3.4.  Static Labels and Controller as Server Control Plane
   4.  BGP Prefix-SID Attribute
     4.1.  Label Index TLV
     4.2.  Label Base TLV
     4.3.  Label Range TLV
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   Recent Data Center routing designs are modeled as shown in Figure 1.
   Rather than using an IGP plus internal BGP (iBGP), an IGP-less
   design is favored for disseminating routing information.  See
   [I-D.ietf-rtgwg-bgp-routing-large-dc] for the rationale and for
   details on why and how to do so.  Today, BGP-LU [RFC3107] is used
   both as an intra-AS [I-D.ietf-mpls-seamless-mpls] and as an inter-AS
   routing protocol.  With the IGP-less routing paradigm, however,
   topology information is lost.  Of particular interest is the ability
   to direct traffic to a specific node, and hence the ability to
   construct explicit paths, denoted by a set of nodes, for traffic
   engineering.

   Today, BGP-LU may advertise an MPLS transport path between
   Autonomous Systems.  This document describes extensions to the BGP-LU
   protocol such that, in addition to the advertised MPLS label-switched
   paths (LSPs), all potential MPLS LSPs of any given node in the Data
   Center are exposed to ingress nodes.

   The protocol extensions in this document are in full compliance with
   the MPLS Architecture documented in [RFC3031].

                +------+  +------+
                |      |  |      |
                |      |--|      |           Tier-1 / AS 651xx
                |      |  |      |
                +------+  +------+
                  |  |      |  |
        +---------+  |      |  +----------+
        | +-------+--+------+--+-------+  |
        | |       |  |      |  |       |  |
      +----+     +----+    +----+     +----+
      |    |     |    |    |    |     |    |
      |    |-----|    |    |    |-----|    | Tier-2 / AS 652xx
      |    |     |    |    |    |     |    |
      +----+     +----+    +----+     +----+
         |         |          |         |
         |         |          |         |
         | +-----+ |          | +-----+ |
         +-|     |-+          +-|     |-+    Tier-3 / AS 653xx
           +-----+              +-----+
            | | |                | | |
        <- Servers ->        <- Servers ->  Servers / AS 65534

                Figure 1: eBGP-centric Data Center routing

2.  Motivation, Rationale and Applicability

   The Segment Routing specifications for IS-IS
   [I-D.ietf-isis-segment-routing-extensions] and OSPF
   [I-D.ietf-ospf-segment-routing-extensions] provide extensions for
   setting up hop-by-hop, shortest-path routed MPLS LSPs.  Every router
   in an IGP domain advertises the following protocol semantics:

   o  Domain-wide Index

   o  Local Label-Base

   o  Local Label Range

   This not only sets up MPLS sink trees towards each egress router in
   a domain but also allows traffic to be steered using stacks of node
   labels.  The chosen protocol semantics are essentially a compression
   scheme for advertising all shortest-path-tree (SPT) MPLS paths in a
   domain.
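
   As a rough illustration of why these semantics amount to a
   compression scheme, the following Python sketch expands one
   domain-wide index per node and one (label base, label range) pair
   per router into the full table of local labels.  It assumes the
   usual Segment Routing rule that a router's label for a node is its
   local label base plus the node's index, valid only when the index is
   below the router's label range.  The router names, bases, ranges,
   and indexes are hypothetical.

```python
# Hypothetical per-router label blocks: router -> (label_base, label_range).
LABEL_BLOCKS = {
    "R1": (800000, 16000),
    "R2": (800000, 16000),
    "R3": (900000, 8000),
}

# Hypothetical domain-wide indexes, one per egress node.
NODE_INDEX = {"R1": 1, "R2": 2, "R3": 3}


def local_label(router: str, node: str) -> int:
    """Label that `router` uses for the node SID of `node` (base + index)."""
    base, size = LABEL_BLOCKS[router]
    index = NODE_INDEX[node]
    if not 0 <= index < size:
        raise ValueError(f"index {index} outside label range of {router}")
    return base + index


def expand_all() -> dict:
    """Expand the compact advertisements into a (router, node) -> label table."""
    return {
        (router, node): local_label(router, node)
        for router in LABEL_BLOCKS
        for node in NODE_INDEX
    }


if __name__ == "__main__":
    table = expand_all()
    print(len(table), table[("R3", "R1")])
```

   With three indexes and three label blocks, all nine (router, node)
   label mappings are derived without advertising any of them
   individually.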

   Explicit-path routing based on label stacks constructed at the
   Hypervisors/Servers, without running conventional TE protocols such
   as RSVP-TE, is a lightweight way to scale the Data Center fabric.
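
   A minimal sketch of how a Hypervisor/Server might compute such a
   label stack for an explicit path, given as a list of waypoint nodes,
   follows.  It assumes Segment Routing node-SID semantics: the top
   label is derived from the label block of the server's first-hop
   router, and every further label from the label block of the
   preceding waypoint, since that waypoint is the node that reads it.
   The node names, label blocks, indexes, and the build_stack() helper
   are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label blocks: node -> (label_base, label_range).
LABEL_BLOCKS: Dict[str, Tuple[int, int]] = {
    "leaf1": (800000, 16000),
    "spine2": (850000, 16000),
    "leaf4": (900000, 16000),
}

# Hypothetical domain-wide node indexes.
NODE_INDEX: Dict[str, int] = {"leaf1": 11, "spine2": 22, "leaf4": 44}


def label_for(processing_node: str, target: str) -> int:
    """Label that processing_node interprets as 'forward towards target'."""
    base, size = LABEL_BLOCKS[processing_node]
    index = NODE_INDEX[target]
    if not 0 <= index < size:
        raise ValueError(f"index {index} outside label range of {processing_node}")
    return base + index


def build_stack(first_hop: str, waypoints: List[str]) -> List[int]:
    """Return the label stack, top label first, for an explicit node path."""
    stack: List[int] = []
    processor = first_hop  # the first-hop router reads the top label
    for node in waypoints:
        stack.append(label_for(processor, node))
        processor = node  # the next label is read by the waypoint just reached
    return stack


if __name__ == "__main__":
    # A server attached to leaf1 steers traffic via spine2 towards leaf4.
    print(build_stack("leaf1", ["spine2", "leaf4"]))
```

   If all nodes use the same label base, every label in the stack
   reduces to that base plus the waypoint's index.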

   To support Segment Routing deployments across routing-protocol
   boundaries, a common set of semantics needs to be kept across all
   routing protocols.  This document specifies BGP-LU extensions for
   addressing Node-SIDs across routing-protocol boundaries.

3.  Deployment Considerations

   Depending on the sophistication of the MPLS stack at the Hypervisor /
   Server, there are various levels of deployment considerations.

3.1.  Control plane restart

   If the first-hop router needs to be restarted, there may be some
   forwarding-state churn at the Hypervisor / Server.  Ideally, upon a
   control-plane restart, the network node would use the same label
   allocations as in its previous incarnation.  Unfortunately, none of
   the BGP graceful restart extensions allows a node to re-acquire its
   previous incarnation's label-mapping state from the network.
   Therefore, a restarting node allocates labels to FECs in the
   temporal order in which the FECs are learned.  This degrades to
   pseudo-random, non-predictable label allocations.  It is desirable
   that a BGP-LU implementation allocate labels in a deterministic way,
   such that a temporary control-plane loss does not impact forwarding
   between the Hypervisor / Server and the network.

   A networking node speaking BGP-LU with Prefix SIDs MUST therefore
   implement an MPLS label-allocation strategy that produces a
   deterministic, locally allocated label block for all of its Prefix
   SIDs.

   For example, an implementation MAY statically allocate a Label Base
   of 800000 and a block size of 16000 labels, and delegate that label
   block exclusively to BGP-LU Prefix SID allocations, such that the
   same label base is used across control-plane restarts.
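
   The difference between allocating labels in arrival order and the
   deterministic strategy described above can be sketched as follows.
   The sketch assumes that each Prefix SID carries a label index and
   that the deterministic label is the statically configured Label Base
   (800000 in the example above) plus that index.  The prefixes,
   indexes, and the dynamic pool start of 300000 are hypothetical.

```python
LABEL_BASE = 800000  # statically allocated, identical across restarts
BLOCK_SIZE = 16000   # labels reserved exclusively for BGP-LU Prefix SIDs


def allocate_in_arrival_order(fecs, first_free=300000):
    """Labels depend on the order in which FECs are learned (hypothetical pool)."""
    return {prefix: first_free + n for n, (prefix, _index) in enumerate(fecs)}


def allocate_deterministic(fecs):
    """Label = static base + Prefix SID index, independent of learning order."""
    labels = {}
    for prefix, index in fecs:
        if not 0 <= index < BLOCK_SIZE:
            raise ValueError(f"index {index} for {prefix} outside label block")
        labels[prefix] = LABEL_BASE + index
    return labels


# Hypothetical (prefix, label index) pairs, learned in a different order
# before and after a control-plane restart.
before = [("192.0.2.1/32", 1), ("192.0.2.2/32", 2), ("192.0.2.3/32", 3)]
after = list(reversed(before))

print(allocate_in_arrival_order(before) == allocate_in_arrival_order(after))  # False
print(allocate_deterministic(before) == allocate_deterministic(after))        # True
```

   Only the deterministic variant hands the Hypervisor / Server the same
   labels after the restart, so its forwarding state stays valid.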

3.2.  BGP-LU as Server Control Plane

   In this case, the Hypervisor / Server runs a "client-only" BGP-LU
   stack in order to interface with the network.  This is the most
   distributed way of building label-switched paths across the network.
   As soon as there is a reachability change, all of the Hypervisors /
   Servers are notified immediately.  Due to the inherent push model of
   the BGP protocol, there is almost no time lag in updating servers.

   Most of the complexity of a BGP implementation comes from the BGP
   Update generation subsystem.  For a client-only BGP implementation
   this is fortunately of little concern: typically only one or two
   BGP sessions (the second one for redundancy) are required, so the
   BGP Update generation complexity stays limited.

3.3.  Labeled-ARP as Server Control Plane

   The Labeled ARP protocol [I-D.kompella-mpls-larp] may be used as a
   lightweight alternative to the BGP-LU protocol.  Labeled ARP is a
   soft-state protocol; therefore, special consideration needs to be
   given to, e.g., refresh timers and label state in the network.  Yet
   it is a distributed variant of LSP state propagation and hence reacts
   immediately to network topology changes and label-to-FEC mapping
   changes.

3.4.  Static Labels and Controller as Server Control Plane

   Static labels do not require control-plane sessions between
   Hypervisors / Servers and the network.  Instead, an external
   controller is assumed to transfer the routing/label information into
   the Hypervisor / Server.  The main disadvantage of this model is that
   the update process is not distributed, and hence the controller needs
   excellent horizontal scaling abilities in order to push on the order
   of 100K routes/labels to on the order of 100K servers.
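
   A back-of-the-envelope estimate of the controller load, using the
   orders of magnitude given above and assuming that every server needs
   every route/label, is:

```python
# Orders of magnitude taken from the text; exact values are illustrative.
ROUTES = 100_000   # routes/labels per server
SERVERS = 100_000  # servers fed by the controller

# Full table push: every server receives every route/label.
full_sync_entries = ROUTES * SERVERS

# Incremental case: one route/label change still reaches every server.
updates_per_change = SERVERS

print(f"full sync: {full_sync_entries:.0e} entries")
print(f"one change: {updates_per_change} server updates")
```

   Under that assumption, even a single route/label change fans out to
   on the order of 100K server updates, all originating at the
   controller.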

4.  BGP Prefix-SID Attribute

   To facilitate dense packing of network nodes and node labels into a
   deterministic label range, as described in Section 3.1, a new
   protocol extension called the "BGP Prefix SID Attribute" is
   proposed.

   The BGP Prefix SID attribute is a new optional, transitive BGP path
   attribute.  Its attribute type code is to be assigned by IANA.
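
   For illustration, the following Python sketch wraps an
   already-encoded TLV sequence in a path attribute header as defined
   for BGP-4 (RFC 4271): Attribute Flags, Attribute Type Code, and a
   one- or two-octet Attribute Length, with the Optional and Transitive
   flags set.  Since no type code has been assigned yet, the value 255
   is used purely as a hypothetical placeholder.

```python
import struct

# Attribute Flags bits from BGP-4 (RFC 4271).
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_EXTENDED_LENGTH = 0x10

# Hypothetical placeholder: the real type code is to be assigned by IANA.
PREFIX_SID_TYPE_PLACEHOLDER = 255


def encode_prefix_sid_attribute(tlvs: bytes,
                                type_code: int = PREFIX_SID_TYPE_PLACEHOLDER) -> bytes:
    """Wrap encoded Prefix SID TLVs in an optional, transitive path attribute."""
    if len(tlvs) > 0xFFFF:
        raise ValueError("attribute value too long")
    flags = FLAG_OPTIONAL | FLAG_TRANSITIVE
    if len(tlvs) > 0xFF:
        # Values longer than 255 octets need the two-octet Extended Length.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, type_code, len(tlvs))
    else:
        header = struct.pack("!BBB", flags, type_code, len(tlvs))
    return header + tlvs
```

   For a single 7-octet Label Index TLV, the resulting attribute starts
   with the flags octet 0xC0 (Optional and Transitive), the placeholder
   type code, and a one-octet length of 7.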

   The value field of the BGP Prefix SID attribute is defined here to be
   a set of elements encoded as "Type/Length/Value" (i.e., a set of
   TLVs).  Each such TLV is encoded as shown in Figure 2.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |       Type    |               Length          |               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
     ~                                                               ~
     |                         Value (variable)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Figure 2: TLV format

   o  Type: A single octet encoding the TLV Type.  Unrecognized Types
      are preserved and propagated.  In order to compare Prefix SID
      attributes containing unknown TLVs, all TLVs MUST be ordered in
      ascending order by TLV Type.  If there are multiple TLVs of the
      same type, then those TLVs MUST be ordered in ascending order of
      their TLV value.  All TLVs that are not specified as mandatory are
      considered optional.

   o  Length: Two octets encoding the length of the value portion in
      octets (thus a TLV with no value portion would have a length of
      zero).  The TLV is not padded to four-octet alignment.

   o  Value: A field containing zero or more octets.
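
   A minimal Python sketch of encoding and decoding these TLVs follows.
   It emits TLVs in the canonical order described above, keeps TLVs of
   unrecognized Types unmodified, and assumes that "ascending order of
   the TLV value" means a plain octet-by-octet comparison of the value
   fields.  The function names are hypothetical.

```python
import struct
from typing import List, Tuple

Tlv = Tuple[int, bytes]  # (TLV Type, TLV Value)


def encode_tlvs(tlvs: List[Tlv]) -> bytes:
    """Encode TLVs sorted by ascending Type, then ascending Value octets."""
    out = bytearray()
    for tlv_type, value in sorted(tlvs):
        if not 0 <= tlv_type <= 255 or len(value) > 0xFFFF:
            raise ValueError("TLV type or length out of range")
        # 1-octet Type, 2-octet Length, no padding to four-octet alignment.
        out += struct.pack("!BH", tlv_type, len(value))
        out += value
    return bytes(out)


def decode_tlvs(data: bytes) -> List[Tlv]:
    """Split an attribute value into TLVs; unknown Types are kept as-is."""
    tlvs: List[Tlv] = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BH", data, offset)
        offset += 3
        if offset + length > len(data):
            raise ValueError("TLV length exceeds attribute value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs
```

   For example, encoding a Label Index TLV together with two TLVs of an
   unrecognized Type 9 places the Label Index TLV first, followed by
   the two Type 9 TLVs ordered by their values.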

   The following TLV types are defined in this document:

                          +------+-------------+
                          | Type | Name        |
                          +------+-------------+
                          |  1   | Label Index |
                          |  2   | Label Base  |
                          |  3   | Label Range |
                          +------+-------------+

                         Table 1: Prefix SID TLVs

   Use of other TLV types is outside the scope of this document.

4.1.  Label Index TLV

   o  Type: 1

   o  Length: 4

   o  Value: Label Index

   Only one Label Index TLV per Prefix SID Attribute is allowed.
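
   The following sketch encodes a Label Index TLV and enforces the
   single-occurrence rule when reading one back from decoded TLVs.  It
   assumes the 4-octet value is an unsigned integer in network byte
   order; the function names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

LABEL_INDEX_TYPE = 1


def encode_label_index(index: int) -> bytes:
    """Encode a Label Index TLV: Type 1, Length 4, 32-bit unsigned index."""
    return struct.pack("!BHI", LABEL_INDEX_TYPE, 4, index)


def get_label_index(tlvs: List[Tuple[int, bytes]]) -> Optional[int]:
    """Return the Label Index from decoded (type, value) TLVs, or None.

    Raises ValueError if more than one Label Index TLV is present or if
    the value is not exactly four octets long.
    """
    values = [value for tlv_type, value in tlvs if tlv_type == LABEL_INDEX_TYPE]
    if not values:
        return None
    if len(values) > 1:
        raise ValueError("only one Label Index TLV is allowed")
    if len(values[0]) != 4:
        raise ValueError("Label Index TLV must have a length of 4")
    return struct.unpack("!I", values[0])[0]
```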

4.2.  Label Base TLV

   o  Type: 2

   o  Length: 3

   o  Value: Label Base

   One or more occurrences of the Label Base TLV are allowed.  A Label
   Base TLV MUST be followed by a Label Range TLV.

4.3.  Label Range TLV

   o  Type: 3

   o  Length: 3

   o  Value: Label Range

   One or more occurrences of the Label Range TLV are allowed.  A Label
   Range TLV MUST be preceded by a Label Base TLV.
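
   The pairing rules for Label Base and Label Range TLVs could be
   checked as in the following sketch, which walks the TLVs in their
   encoded order and turns each adjacent Label Base / Label Range pair
   into a (base, range) label block.  It assumes that both 3-octet
   values are unsigned integers in network byte order and that a label
   block must fit into the 20-bit MPLS label space; the function names
   are hypothetical.

```python
from typing import List, Tuple

LABEL_BASE_TYPE = 2
LABEL_RANGE_TYPE = 3
MAX_LABEL = (1 << 20) - 1  # MPLS labels are 20 bits wide


def _u24(value: bytes) -> int:
    """Decode a 3-octet Label Base or Label Range value."""
    if len(value) != 3:
        raise ValueError("Label Base/Range TLV must have a length of 3")
    return int.from_bytes(value, "big")


def label_blocks(tlvs: List[Tuple[int, bytes]]) -> List[Tuple[int, int]]:
    """Pair each Label Base TLV with the Label Range TLV right after it."""
    blocks = []
    for i, (tlv_type, value) in enumerate(tlvs):
        if tlv_type == LABEL_BASE_TYPE:
            if i + 1 >= len(tlvs) or tlvs[i + 1][0] != LABEL_RANGE_TYPE:
                raise ValueError("Label Base TLV not followed by Label Range TLV")
            base, size = _u24(value), _u24(tlvs[i + 1][1])
            if base > MAX_LABEL or base + size - 1 > MAX_LABEL:
                raise ValueError("label block exceeds the 20-bit label space")
            blocks.append((base, size))
        elif tlv_type == LABEL_RANGE_TYPE:
            if i == 0 or tlvs[i - 1][0] != LABEL_BASE_TYPE:
                raise ValueError("Label Range TLV not preceded by Label Base TLV")
    return blocks
```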

5.  Acknowledgements

   Many thanks to TBD for their detailed review and insightful comments.

6.  IANA Considerations

   This document requests a code point named "Prefix SID" from the BGP
   Path Attributes registry.

   This document requests the creation of a new registry for BGP Prefix
   SID TLVs.  Value 0 is reserved.  The maximum value is 255.  The
   registry will be initialized as shown in Table 1.  Allocations within
   the registry will require documentation of the proposed use of the
   allocated value (i.e., "Specification Required") and approval by the
   Designated Expert assigned by the IESG (see [RFC5226]).

7.  Security Considerations

   This document does not introduce any change in terms of BGP security.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031, January 2001.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, May 2001.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

8.2.  Informative References

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
              Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
              Extensions for Segment Routing", draft-ietf-isis-segment-
              routing-extensions-03 (work in progress), October 2014.

   [I-D.ietf-mpls-seamless-mpls]
              Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz,
              M., and D. Steinberg, "Seamless MPLS Architecture", draft-
              ietf-mpls-seamless-mpls-07 (work in progress), June 2014.

   [I-D.ietf-ospf-segment-routing-extensions]
              Psenak, P., Previdi, S., Filsfils, C., Gredler, H.,
              Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", draft-ietf-ospf-segment-
              routing-extensions-04 (work in progress), February 2015.

   [I-D.ietf-rtgwg-bgp-routing-large-dc]
              Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP for
              routing in large-scale data centers", draft-ietf-rtgwg-
              bgp-routing-large-dc-01 (work in progress), February 2015.

   [I-D.kompella-mpls-larp]
              Kompella, K., Rajagopalan, B., and G. Swallow, "Label
              Distribution Using ARP", draft-kompella-mpls-larp-02 (work
              in progress), October 2014.

Author's Address

   Hannes Gredler
   Juniper Networks, Inc.
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: hannes@juniper.net