Network Working Group                                              X. Xu
Internet-Draft                                              China Mobile
Intended status: Standards Track                                S. Hegde
Expires: 2 February 2024                                       S. Sangli
                                                                 Juniper
                                                           1 August 2023


                  BGP Route Broker for Hyperscale SDN
                    draft-xu-idr-bgp-route-broker-01

Abstract

   This document describes an optimized BGP route reflector mechanism,
   referred to as a BGP route broker, so as to use BGP-based IP VPN as
   an overlay routing protocol for hyperscale data center network
   virtualization environments, also known as Software-Defined Network
   (SDN) environments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 2 February 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.











Xu, et al.               Expires 2 February 2024                [Page 1]

Internet-Draft              BGP Route Broker                 August 2023


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Problem Statement . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Solution Overview . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Route Target Membership Advertisement Process . . . . . . . .   4
   4.  Proactive Route Distribution Process  . . . . . . . . . . . .   4
   5.  Passive Route Distribution Process  . . . . . . . . . . . . .   4
   6.  BGP Session Failure Notification  . . . . . . . . . . . . . .   5
   7.  BGP Withdraw of all Routes of a VPN . . . . . . . . . . . . .   5
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   5
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   5
   11. Normative References  . . . . . . . . . . . . . . . . . . . .   6
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   6

1.  Problem Statement

   BGP/MPLS IP VPN has been successfully deployed in service provider
   networks worldwide for two decades and has therefore proven to be
   scalable enough for large-scale networks.  Here, BGP/MPLS IP VPN
   means both BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN
   [RFC4659].  In addition, the BGP/MPLS IP VPN-based data center
   network virtualization approaches described in [RFC7814], especially
   the virtual PE model described in [I-D.ietf-bess-virtual-pe], have
   been widely deployed in small to medium-sized data centers for
   network virtualization purposes, also known as Software-Defined
   Network (SDN).  Examples include, but are not limited to,
   OpenContrail.

   Hyperscale cloud data centers typically house tens of thousands of
   servers, which in turn are virtualized as Virtual Machines (VMs) or
   containers.  If the virtual PE model mentioned above (a.k.a. a
   host-based network virtualization model) is used, this usually means
   at least tens of thousands of virtual PEs, millions of VPNs, and
   tens of millions of VPN routes from the network virtualization
   perspective.  This poses a significant challenge to both the BGP
   session capacity and the VPN routing table capacity of any given BGP
   router.




   There is no doubt that the route reflection mechanism should be
   considered in order to address the BGP scaling issues mentioned
   above.  Assuming a typical one-level route reflector architecture is
   used, it is straightforward to partition all the VPNs supported by a
   data center among multiple route reflectors, with each route
   reflector being preconfigured with a block of route targets
   associated with a subset of the VPNs.  In other words, there is no
   need for any one route reflector to maintain all the VPN routes for
   all the VPNs supported by the data center.  For redundancy, more
   than one route reflector may be preconfigured with the same block of
   route targets.
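
   As an illustration of the partitioning described above, the
   following Python sketch splits a list of route targets into blocks
   and preconfigures each block on more than one route reflector for
   redundancy.  The function name, the round-robin placement, the
   block size, and the route target strings are assumptions made for
   this example; this document does not prescribe any particular
   assignment scheme.

```python
from collections import defaultdict


def assign_route_targets(route_targets, reflectors, block_size, replicas=2):
    """Return {reflector: [route targets]} for a one-level design.

    Hypothetical helper: each block of route targets is placed on
    `replicas` consecutive reflectors (round-robin) for redundancy.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    if not 1 <= replicas <= len(reflectors):
        raise ValueError("replicas must be between 1 and len(reflectors)")

    # Cut the full route target list into fixed-size blocks.
    blocks = [route_targets[i:i + block_size]
              for i in range(0, len(route_targets), block_size)]
    assignment = defaultdict(list)
    for index, block in enumerate(blocks):
        for offset in range(replicas):
            reflector = reflectors[(index + offset) % len(reflectors)]
            assignment[reflector].extend(block)
    return dict(assignment)


# Illustrative values only.
route_targets = [f"target:65000:{n}" for n in range(1, 9)]
plan = assign_route_targets(route_targets, ["rr1", "rr2", "rr3", "rr4"],
                            block_size=2, replicas=2)
```

   With these inputs, every route target is held by two route
   reflectors, and each route reflector holds only four of the eight
   route targets.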

   If each virtual PE is attached to at least one VPN corresponding to
   a given route reflector, that particular route reflector has to
   establish BGP sessions with all virtual PEs, which places a heavy
   BGP session load on route reflectors.  Now assume that another level
   (bottom-level) of route reflectors is introduced between the
   existing level (top-level) of route reflectors and the virtual PEs.
   Each top-level route reflector would establish BGP sessions with all
   bottom-level route reflectors rather than with all virtual PEs.  In
   addition, each bottom-level route reflector just needs to establish
   BGP sessions with a subset of the virtual PEs.  As a result, the
   scaling issue of the BGP session capacity is solved through the
   above partition mechanism.  However, if the collection of VPNs
   attached to the route reflector clients (i.e., virtual PEs) of a
   given bottom-level route reflector covers all the VPNs supported by
   the data center, that particular bottom-level route reflector would
   have to hold all the VPNs and all the VPN routes, which is a huge
   challenge for that particular route reflector.
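
   The session arithmetic behind this argument can be sketched as
   follows.  The counts used (50,000 virtual PEs, 10 top-level and 100
   bottom-level route reflectors) and the even spread of virtual PEs
   over bottom-level route reflectors are assumptions chosen for
   illustration, not figures from any deployment.

```python
import math


def one_level_sessions(num_virtual_pes):
    """Worst case: a route reflector peers with every virtual PE."""
    return num_virtual_pes


def two_level_sessions(num_virtual_pes, num_top, num_bottom):
    """Per-reflector session counts in a two-level hierarchy.

    Assumes virtual PEs are spread evenly over bottom-level reflectors
    and every bottom-level reflector peers with every top-level one.
    """
    pes_per_bottom = math.ceil(num_virtual_pes / num_bottom)
    return {
        "top_level": num_bottom,
        "bottom_level": num_top + pes_per_bottom,
    }


flat = one_level_sessions(50_000)
hierarchical = two_level_sessions(50_000, num_top=10, num_bottom=100)
```

   Under these assumptions, the per-reflector session count drops from
   50,000 to 100 on a top-level route reflector and 510 on a
   bottom-level route reflector.  The calculation says nothing about
   the number of VPN routes a bottom-level route reflector must hold,
   which is the remaining problem.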

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Solution Overview

   Assuming that the number of BGP sessions to be established on each
   bottom-level route reflector cannot be reduced further for some
   reason (e.g., it becomes unacceptable to manage too many route
   reflectors), the number of VPN routes to be maintained on each
   bottom-level route reflector should be reduced by some means.






   Borrowing from message queue mechanisms (e.g., RabbitMQ and
   RocketMQ), those bottom-level route reflectors, referred to as route
   brokers in the following text, work as follows: they only need to
   maintain the route target membership information of their BGP peers
   and reflect VPN routes on demand, without having to maintain VPN
   routes permanently.

3.  Route Target Membership Advertisement Process

   Top-level route reflectors, referred to as route servers, advertise
   route target membership information according to their preconfigured
   block of Route Targets.  As such, route brokers know the VPNs
   associated with each route server.  The route target membership
   information received from route servers SHOULD NOT be reflected by
   route brokers to any other iBGP peers.

   Virtual PEs, referred to as route broker clients, advertise route
   target membership information according to their dynamically
   configured block of Route Targets.  Route brokers treat the route
   target membership information received from a route broker client as
   an implicit route request for all the VPN routes of the VPNs
   associated with the corresponding route targets.  This information
   only needs to be reflected towards the route servers that are
   associated with the VPNs identified by the advertised route targets.
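
   The membership handling described in this section can be sketched
   in Python as shown below.  The class name, method names, peer
   names, and route target strings are hypothetical, and the sketch
   models advertisements as plain function calls rather than BGP
   messages.

```python
class MembershipTable:
    """Route target membership state kept by a route broker (sketch)."""

    def __init__(self):
        self.server_rts = {}  # route server -> set of route targets
        self.client_rts = {}  # route broker client -> set of route targets

    def on_server_membership(self, server, route_targets):
        """Record a route server's block; reflect it to nobody."""
        self.server_rts[server] = set(route_targets)
        return {}

    def on_client_membership(self, client, route_targets):
        """Record a client's route targets and pick the servers to inform.

        Returns {route server: [route targets to forward]}, limited to
        servers whose preconfigured block overlaps the request.
        """
        requested = set(route_targets)
        self.client_rts[client] = requested
        return {
            server: sorted(requested & block)
            for server, block in self.server_rts.items()
            if requested & block
        }


table = MembershipTable()
table.on_server_membership("rs1", ["target:65000:1", "target:65000:2"])
table.on_server_membership("rs2", ["target:65000:3", "target:65000:4"])
forwarded = table.on_client_membership(
    "vpe1", ["target:65000:2", "target:65000:3"])
```

   Here the client's advertisement is split so that each route server
   only sees the route targets that fall within its own block.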

4.  Proactive Route Distribution Process

   Upon receiving a route update message from a route server which
   contains VPN routes for a given VPN, a route broker would reflect
   the received routes to those of its route broker clients which are
   associated with that VPN.  Upon receiving a route update message
   from a route broker client which contains VPN routes for a given
   VPN, a route broker would reflect the received routes to the other
   iBGP peers (including route servers and route broker clients) which
   are associated with that VPN.

   Once the route reflection is finished, the route broker deletes the
   above routes.
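
   A minimal Python sketch of this reflection rule follows.  The
   function name, the peer names, and the use of a single route target
   to stand for a VPN are assumptions for illustration.  The function
   returns the per-peer routes to send and stores nothing, which
   mirrors the deletion of routes once reflection is finished.

```python
def reflect_update(sender, route_target, routes, server_rts, client_rts):
    """Return {peer: routes} for one update; keep no copy afterwards.

    server_rts / client_rts map each peer to its set of route targets,
    as learned from route target membership advertisements.
    """
    if sender in server_rts:
        # From a route server: only interested clients receive it.
        candidates = client_rts
    else:
        # From a client: interested servers and other clients receive it.
        candidates = {**server_rts, **client_rts}
    return {
        peer: list(routes)
        for peer, targets in candidates.items()
        if peer != sender and route_target in targets
    }


server_rts = {"rs1": {"target:65000:1"}, "rs2": {"target:65000:2"}}
client_rts = {
    "vpe1": {"target:65000:1"},
    "vpe2": {"target:65000:1", "target:65000:2"},
    "vpe3": {"target:65000:2"},
}
from_server = reflect_update("rs1", "target:65000:1", ["10.1.0.0/24"],
                             server_rts, client_rts)
from_client = reflect_update("vpe1", "target:65000:1", ["10.2.0.0/24"],
                             server_rts, client_rts)
```

   In this example the update from rs1 goes to vpe1 and vpe2, while
   the update from vpe1 goes to rs1 and vpe2 but not back to vpe1.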

5.  Passive Route Distribution Process

   Upon receiving an implicit route request for all the VPN routes for
   one or more VPNs (via the route target membership information
   advertisement) from a route broker client, route brokers SHOULD
   reflect that request to the corresponding route servers which are
   associated with the VPNs pertaining to the advertised route targets
   respectively.



   Upon receiving the implicit route request reflected from the route
   broker, route servers SHOULD respond with the corresponding VPN
   routes to that broker, which in turn reflects the received VPN
   routes to the route broker client.  Once route reflection is
   finished, the received VPN routes would be deleted.

   To alleviate the route request processing pressure on route servers,
   route brokers MAY cache the VPN routes returned from route servers
   in response to an implicit route request for a configurable period
   of time.  The cached routes could then be used directly when
   responding to subsequent route requests for those routes.
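
   The optional caching behavior can be sketched in Python as follows.
   The class name, the fetch_from_server callback, the cache_seconds
   parameter, and the injectable clock are all hypothetical; they
   stand in for the request and response exchange with a route server
   and for the configurable cache period.

```python
import time


class PassiveBroker:
    """Answers implicit route requests, optionally caching replies."""

    def __init__(self, fetch_from_server, cache_seconds=0.0,
                 clock=time.monotonic):
        self.fetch_from_server = fetch_from_server  # hypothetical callback
        self.cache_seconds = cache_seconds          # 0 disables caching
        self.clock = clock
        self.cache = {}  # route target -> (expiry time, routes)

    def on_route_request(self, route_target):
        now = self.clock()
        entry = self.cache.get(route_target)
        if entry is not None:
            expiry, routes = entry
            if now < expiry:
                return list(routes)       # served from the cache
            del self.cache[route_target]  # cache period has elapsed
        routes = self.fetch_from_server(route_target)
        if self.cache_seconds > 0:
            self.cache[route_target] = (now + self.cache_seconds,
                                        list(routes))
        return list(routes)               # nothing else is retained


fetches = []


def fake_server(route_target):
    """Stand-in for a route server; records each request it receives."""
    fetches.append(route_target)
    return ["10.1.0.0/24"]


fake_time = [0.0]
broker = PassiveBroker(fake_server, cache_seconds=30.0,
                       clock=lambda: fake_time[0])
first = broker.on_route_request("target:65000:1")   # asks the server
second = broker.on_route_request("target:65000:1")  # cache hit
fake_time[0] = 31.0
third = broker.on_route_request("target:65000:1")   # expired, asks again
```

   With caching disabled (the default in this sketch), every request
   is forwarded to the route server and the broker retains no routes.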

6.  BGP Session Failure Notification

   When a route broker loses the BGP connection with a given route
   broker client, it SHOULD send a Notification message towards all
   route servers to indicate the failure of the BGP connection with that
   route broker client.

   Upon receiving the above Notification message, route servers would
   withdraw all VPN routes with the BGP next-hop address being the
   failed route broker client.

   The BGP router ID of the failed route broker client could be carried
   in a TLV, which in turn is carried in a Notification message with an
   error code that is TBD.
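
   The route server behavior described above can be sketched in Python
   as shown below.  Because the TLV format and error code are TBD, the
   notification is modeled as a plain dictionary with hypothetical
   keys.  The sketch also assumes that the router ID of the failed
   route broker client can be matched directly against the BGP
   next-hop address of its routes, which may not hold in every
   deployment.

```python
def build_failure_notice(failed_client_router_id):
    """Model the broker's Notification as a plain dict (format is TBD)."""
    return {"type": "client-session-failure",
            "router_id": failed_client_router_id}


def handle_failure_notice(vpn_routes, notice):
    """Route server side: drop routes whose next hop is the failed client.

    vpn_routes is a list of dicts with "prefix" and "next_hop" keys.
    Returns the withdrawn routes and removes them from vpn_routes.
    """
    failed = notice["router_id"]
    withdrawn = [r for r in vpn_routes if r["next_hop"] == failed]
    vpn_routes[:] = [r for r in vpn_routes if r["next_hop"] != failed]
    return withdrawn


# Illustrative routes using documentation address space.
vpn_routes = [
    {"prefix": "10.1.0.0/24", "next_hop": "192.0.2.1"},
    {"prefix": "10.2.0.0/24", "next_hop": "192.0.2.2"},
    {"prefix": "10.3.0.0/24", "next_hop": "192.0.2.1"},
]
withdrawn = handle_failure_notice(vpn_routes,
                                  build_failure_notice("192.0.2.1"))
```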

7.  BGP Withdraw of all Routes of a VPN

   When all route servers which are configured with the same route
   target list are down, route brokers SHOULD notify their route broker
   clients to withdraw all the VPN routes for the VPNs associated with
   any route target within the above route target list.
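
   One way a route broker might detect this condition is sketched in
   Python below.  The function names and data layout are assumptions:
   route servers with identical route target lists are treated as one
   redundancy group, and a group's route targets are reported only
   when no member of the group is still up.

```python
def route_targets_to_withdraw(server_rts, live_servers):
    """Return route targets whose whole group of route servers is down.

    server_rts maps each route server to its configured route targets;
    servers with identical route target lists form one redundancy group.
    """
    groups = {}
    for server, targets in server_rts.items():
        groups.setdefault(frozenset(targets), []).append(server)
    lost = set()
    for targets, servers in groups.items():
        if not any(server in live_servers for server in servers):
            lost |= targets
    return lost


def clients_to_notify(client_rts, lost_targets):
    """Map each affected client to the route targets it must flush."""
    return {
        client: sorted(targets & lost_targets)
        for client, targets in client_rts.items()
        if targets & lost_targets
    }


server_rts = {
    "rs1": ["target:65000:1", "target:65000:2"],
    "rs2": ["target:65000:1", "target:65000:2"],
    "rs3": ["target:65000:3"],
}
client_rts = {"vpe1": {"target:65000:1"}, "vpe2": {"target:65000:3"}}
one_down = route_targets_to_withdraw(server_rts,
                                     live_servers={"rs2", "rs3"})
both_down = route_targets_to_withdraw(server_rts, live_servers={"rs3"})
notices = clients_to_notify(client_rts, both_down)
```

   In this example nothing is withdrawn while rs2 is still up; once
   both rs1 and rs2 are down, only vpe1 is notified because vpe2 has
   no route target in the affected list.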

8.  IANA Considerations

   TBD

9.  Security Considerations

   TBD

10.  Acknowledgements

   The authors would like to thank Jie Dong for the discussion and
   review of this document.




11.  Normative References

   [I-D.ietf-bess-virtual-pe]
              Fang, L., Fernando, R., Napierala, M., Bitar, N. N., and
              B. Rijsman, "BGP/MPLS VPN Virtual PE", Work in Progress,
              Internet-Draft, draft-ietf-bess-virtual-pe-00, 12 November
              2014, <https://datatracker.ietf.org/doc/html/draft-ietf-
              bess-virtual-pe-00>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006, <https://www.rfc-editor.org/info/rfc4364>.

   [RFC4659]  De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur,
              "BGP-MPLS IP Virtual Private Network (VPN) Extension for
              IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September 2006,
              <https://www.rfc-editor.org/info/rfc4659>.

   [RFC7814]  Xu, X., Jacquenet, C., Raszuk, R., Boyes, T., and B. Fee,
              "Virtual Subnet: A BGP/MPLS IP VPN-Based Subnet Extension
              Solution", RFC 7814, DOI 10.17487/RFC7814, March 2016,
              <https://www.rfc-editor.org/info/rfc7814>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Xiaohu Xu
   China Mobile
   Email: xuxiaohu_ietf@hotmail.com


   Shraddha Hegde
   Juniper
   Email: shraddha@juniper.net


   Srihari Sangli
   Juniper
   Email: ssangli@juniper.net



