INTERNET-DRAFT                                                     Y. Yu
Intended Status: Standards Track                     Huawei Technologies
Expires: Mar 5, 2019                                             J. Wang
                                                           China Telecom
                                                             Sep 1, 2018

              Packet Reordering in Geneve Overlay Network
               draft-yu-nvo3-geneve-pkt-reordering-00

Abstract

   Congestion is the killer of low latency and high throughput. Network
   congestion occurs on the interconnection links of a data center when
   traffic is poorly distributed. Load balancing technologies are used
   to relieve network congestion. Packet spraying is a load balancing
   technology with per-packet granularity; with packet spraying,
   packets may arrive at the destination out of order. This document
   describes a reordering protocol for the Geneve encapsulation network
   [1] that uses a newly defined Geneve option.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html








 


<Yu, et al.>             Expires <Mar 5, 2019>                  [Page 1]

INTERNET DRAFT   <Reordering in Geneve Overlay Network>    <Sep 1, 2018>


Copyright and License Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1  Introduction
   2  Terminology
   3  Abbreviations
   4  Problem Statements & Requirements
   5  Packet Reordering on Geneve
     5.1 Packet Reordering Format
     5.2 Packet Reordering Capability Discovery
   6  Security Considerations
   7  IANA Considerations
   8  References
   Authors' Addresses

1  Introduction

   In many current data centers, network utilization is not as high as
   it could be. For example, in some scenarios the average network
   utilization is about 20% and the peak utilization is about 45% [2].
   As end systems (endpoints) improve and more services are deployed,
   including high-volume services such as streaming media, big data
   processing, and large-scale user-facing web applications, network
   performance problems become more and more frequent. These problems
   are caused by traffic bursts and routing collisions. The resulting
   traffic imbalance leaves network bandwidth underutilized and
   degrades the overall performance of network applications.

   To fully utilize the available network bandwidth, traffic entering
   the network is dispersed across multiple paths to achieve load
   balancing. The finer the granularity of load balancing, the higher
   the utilization of the available bandwidth. Current flow-based and
   flowlet-based [3] approaches are coarser-grained than packet-based
   load balancing. With packet spraying, however, packets may arrive at
   the destination out of order because the links have different
   latencies. This document describes how to extend the Geneve header
   to support reordering for packet-based load balancing in a Geneve
   encapsulation network.

2  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3  Abbreviations

   GENEVE - Generic Network Virtualization Encapsulation

   ECMP - Equal-cost multi-path routing 

   SDN - Software Defined Network

   GFP - Geneve Forwarding Policy

4  Problem Statements & Requirements

   The network topology commonly used in data centers today is a
   multi-rooted tree architecture, such as a typical Clos network. This
   kind of network has multiple paths with bandwidth divided equally
   across them, which provides good scalability and flexibility
   depending on how the paths are utilized. To fully utilize the network
   bandwidth, traffic entering the network is dispersed across the
   multiple paths to achieve load balancing. Currently, load balancing
   is performed at one of the following granularities: flow-based load
   balancing (such as ECMP), flowlet-based load balancing (such as
   CONGA [3]), and packet-based load balancing (such as packet
   spraying). The finer the granularity of load balancing, the more
   effective the load balancing and the higher the achievable
   utilization of network bandwidth.

   Packet-based load balancing is the most effective of the three
   because its granularity is the smallest. However, packets belonging
   to the same flow are then distributed over different paths. When the
   forwarding delays of these paths differ, packets may arrive at the
   receiver out of order. To detect out-of-order packets and restore the
   correct order, the packets need to carry a sequence number.
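   To illustrate how packet spraying reorders packets, the following
   Python sketch sprays the packets of one flow round-robin over three
   paths with different one-way delays and reports the order in which
   they reach the receiver. The path delays, send interval, and
   round-robin policy are hypothetical values chosen only for this
   example; they are not specified by this document.

```python
# Hypothetical parameters for illustration only.
PATH_DELAYS_US = [10, 25, 15]   # one-way delay of each path, in microseconds
SEND_INTERVAL_US = 2            # gap between consecutive packets at the sender


def arrival_order(num_packets: int) -> list[int]:
    """Return sequence numbers in the order the receiver sees them."""
    arrivals = []
    for seq in range(num_packets):
        path = seq % len(PATH_DELAYS_US)      # round-robin packet spraying
        send_time = seq * SEND_INTERVAL_US
        arrivals.append((send_time + PATH_DELAYS_US[path], seq))
    arrivals.sort()                           # earliest arrival first
    return [seq for _, seq in arrivals]


print(arrival_order(8))  # [0, 3, 2, 6, 5, 1, 4, 7]
```

   Although the sender emits packets 0 through 7 in order, the receiver
   sees them out of order; a per-packet sequence number is what lets
   the receiver restore the original order.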

5  Packet Reordering on Geneve

5.1 Packet Reordering Format

   The Geneve header and the Geneve option have the following format
   [1]:
   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|   Rsvd.   |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Virtual Network Identifier (VNI) |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                             Geneve Header


   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Option Class        |      Type     |R|R|R|  Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Variable Option Data                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                             Geneve Option

   Option Class = To be assigned by IANA (TBA).
   Type = TBA.
   Length = 2 (8 bytes of option data, excluding the 4-byte option
   header)


   The proposed Packet Reordering option for Geneve has the following
   format:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Option Class = GFP          |      Type     |R|R|R| Length  | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                        Flow Group ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  
   |                       Sequence Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  Packet Reordering Format over Geneve

   Option Class = Geneve Forwarding Policy (GFP, suggested), to be
   assigned by IANA (TBA).
   Type = TBA.
   Length = 2 (8 bytes of option data, excluding the 4-byte option
   header)

   Flow Group ID: described in Section 5.1.1.

   Sequence Number: described in Section 5.1.2.
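   The following Python sketch encodes and decodes the Packet
   Reordering option in network byte order using the standard struct
   module. Because the Option Class and Type values are still to be
   assigned by IANA, GFP_OPTION_CLASS and REORDER_TYPE are hypothetical
   placeholders, not real code points. The sketch covers only the
   option itself, not the surrounding Geneve header.

```python
import struct

# Hypothetical placeholders: the Option Class (GFP) and Type are TBA by IANA.
GFP_OPTION_CLASS = 0xFFFF
REORDER_TYPE = 0x01
REORDER_LEN_WORDS = 2  # option data length in 4-byte words (8 bytes)


def encode_reorder_option(flow_group_id: int, seq: int) -> bytes:
    """Build the 12-byte option: 4-byte option header + 8 bytes of data."""
    if not (0 <= flow_group_id < 2**32 and 0 <= seq < 2**32):
        raise ValueError("Flow Group ID and Sequence Number are 32-bit values")
    # The three R bits (top of the fourth byte) are zero; the low 5 bits
    # carry the Length field.
    rsvd_len = REORDER_LEN_WORDS & 0x1F
    return struct.pack("!HBBII", GFP_OPTION_CLASS, REORDER_TYPE, rsvd_len,
                       flow_group_id, seq)


def decode_reorder_option(buf: bytes) -> tuple[int, int]:
    """Return (Flow Group ID, Sequence Number) from an encoded option."""
    opt_class, opt_type, rsvd_len = struct.unpack_from("!HBB", buf, 0)
    if (opt_class, opt_type) != (GFP_OPTION_CLASS, REORDER_TYPE):
        raise ValueError("not a Packet Reordering option")
    length_words = rsvd_len & 0x1F
    if length_words != REORDER_LEN_WORDS or len(buf) < 4 + 4 * length_words:
        raise ValueError("bad option length")
    return struct.unpack_from("!II", buf, 4)


opt = encode_reorder_option(flow_group_id=7, seq=42)
print(len(opt), opt.hex())         # 12 ffff0102000000070000002a
print(decode_reorder_option(opt))  # (7, 42)
```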

5.1.1 Flow Group ID Field (4 Bytes)

   The Flow Group ID field is a four-byte field. The Flow Group ID
   identifies a group of flows that share the same reorder sequence
   space between a pair of source/destination nodes. A Flow Group ID
   may correspond to an individual flow, a subset of flows, or even all
   flows between the source/destination pair. How flows are mapped to
   Flow Group IDs is not defined by this document. The same Flow Group
   ID can be used by different source/destination pairs (i.e., a Flow
   Group ID is only unique within the context of a source/destination
   pair). A Flow Group is uniquely identified by the 3-tuple (source IP
   address, destination IP address, Flow Group ID). The source node
   assigns sequence numbers to the packets of a Flow Group in the order
   in which they are sent. The destination reorders the received
   packets of a Flow Group according to their sequence numbers.
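   A minimal Python sketch of the source side, assuming a simple
   in-memory table: each Flow Group, keyed by the 3-tuple (source IP,
   destination IP, Flow Group ID), has its own counter that starts at 0
   and wraps modulo 2**32. Since this document leaves the flow-to-group
   mapping open, the caller supplies the Flow Group ID directly. The
   class and method names are hypothetical.

```python
from collections import defaultdict

SEQ_MODULUS = 2**32


class SequenceAllocator:
    """Source-side sequence numbers, one independent space per Flow Group."""

    def __init__(self) -> None:
        # (src IP, dest IP, Flow Group ID) -> next sequence number to send
        self._next = defaultdict(int)

    def next_seq(self, src_ip: str, dst_ip: str, flow_group_id: int) -> int:
        key = (src_ip, dst_ip, flow_group_id)
        seq = self._next[key]
        self._next[key] = (seq + 1) % SEQ_MODULUS  # wrap to 0 after 2**32-1
        return seq


alloc = SequenceAllocator()
print(alloc.next_seq("10.0.0.1", "10.0.0.2", 5))  # 0
print(alloc.next_seq("10.0.0.1", "10.0.0.2", 5))  # 1
print(alloc.next_seq("10.0.0.1", "10.0.0.3", 5))  # 0 (different destination)
```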

5.1.2 Sequence Number Field (4 Bytes)

   The Sequence Number field is a four-byte field that closely follows
   the definition of the Sequence Number in RFC 2890 [4]. The sequence
   number value ranges from 0 to (2**32)-1, and the first datagram is
   sent with a sequence number of 0. The sequence number is thus a
   monotonically increasing counter represented modulo 2**32. The
   receiver maintains the sequence number value of the last successfully
   decapsulated packet. This value should be initialized to (2**32)-1.

   A packet is considered an out-of-sequence packet if its sequence
   number is less than or equal to the sequence number of the last
   successfully decapsulated packet. The sequence number of a received
   packet is considered less than or equal to the last successfully
   received sequence number if its value lies in the range of the last
   received sequence number and the preceding 2**31-1 values,
   inclusive.
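   The "less than or equal" rule above is serial-number arithmetic
   modulo 2**32. In the minimal Python sketch below (the function name
   is hypothetical), a received number is out-of-sequence when the
   distance from it forward to the last decapsulated number, taken
   modulo 2**32, is below 2**31.

```python
SEQ_MODULUS = 2**32
HALF_SPACE = 2**31


def is_out_of_sequence(seq: int, last: int) -> bool:
    """True if seq lies in [last - (2**31 - 1), last], modulo 2**32."""
    return (last - seq) % SEQ_MODULUS < HALF_SPACE


print(is_out_of_sequence(5, 5))                # True  (duplicate)
print(is_out_of_sequence(6, 5))                # False (ahead of last)
print(is_out_of_sequence(2**32 - 1, 0))        # True  (just before 0, wrapped)
print(is_out_of_sequence(0, SEQ_MODULUS - 1))  # False (first packet after init)
```

   The last case shows why the receiver initializes its state to
   (2**32)-1: the first packet, with sequence number 0, is then not
   treated as out-of-sequence.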

   An in-sequence packet is one with a sequence number exactly 1
   greater (modulo 2**32) than that of the last successfully
   decapsulated packet. If the received packet is an in-sequence packet,
   it is successfully decapsulated. If the received packet is neither an
   in-sequence nor an out-of-sequence packet, there is a sequence number
   gap. The receiver may perform a small amount of buffering in an
   attempt to recover the original sequence of transmitted packets. In
   this case, the packet may be placed in a buffer sorted by sequence
   number. If an in-sequence packet is received and successfully
   decapsulated, the receiver should consult the head of this buffer to
   see if the next in-sequence packet has already been received. If so,
   the receiver should decapsulate it as well as any following
   in-sequence packets that may be present in the buffer. The "last
   successfully decapsulated sequence number" should then be set to the
   sequence number of the last packet decapsulated from the buffer.

   Under no circumstances should a packet wait more than
   OUTOFORDER_TIMER microseconds in the buffer. If a packet has been
   waiting that long, the receiver MUST immediately traverse the buffer
   in sorted order, decapsulating packets (and ignoring any sequence
   number gaps) until there are no more packets in the buffer that have
   been waiting longer than OUTOFORDER_TIMER microseconds. The "last
   successfully decapsulated sequence number" should then be set to the
   sequence number of the last packet so decapsulated.

   The receiver may place a limit on the number of packets in any
   per-Flow-Group buffer (packets with the same source IP address,
   destination IP address, and Flow Group ID belong to the same Flow
   Group). If a packet arrives that would cause the receiver to place
   more than MAX_PERFLOW_BUFFER packets into a given buffer, the packet
   at the head of the buffer is immediately decapsulated regardless of
   its sequence number, and the "last successfully decapsulated sequence
   number" is set to its sequence number. The newly arrived packet may
   then be placed in the buffer.
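   The receiver behavior described in this section can be sketched in
   Python as a per-Flow-Group reorder buffer. This is a minimal,
   illustrative model under several assumptions: time is passed in
   explicitly in microseconds; OUTOFORDER_TIMER_US and
   MAX_PERFLOW_BUFFER are hypothetical values, since this document does
   not fix them; out-of-sequence packets and duplicates of
   already-buffered packets are discarded, which RFC 2890 specifies but
   this document does not state; and a list sorted by distance ahead of
   the last decapsulated number stands in for the sorted buffer so that
   the order survives sequence number wrap.

```python
from dataclasses import dataclass

SEQ_MODULUS = 2**32
HALF_SPACE = 2**31

# Hypothetical values: this document names these limits but does not fix them.
OUTOFORDER_TIMER_US = 100
MAX_PERFLOW_BUFFER = 4


def is_out_of_sequence(seq: int, last: int) -> bool:
    # seq lies in [last - (2**31 - 1), last], modulo 2**32.
    return (last - seq) % SEQ_MODULUS < HALF_SPACE


@dataclass
class Buffered:
    seq: int
    arrival_us: int
    payload: bytes


class FlowGroupReorderer:
    """Receiver state for one Flow Group (src IP, dest IP, Flow Group ID)."""

    def __init__(self, timer_us: int = OUTOFORDER_TIMER_US,
                 max_buffer: int = MAX_PERFLOW_BUFFER) -> None:
        self.last = SEQ_MODULUS - 1  # last successfully decapsulated number
        self.timer_us = timer_us
        self.max_buffer = max_buffer
        self.buffer = []             # Buffered entries awaiting decapsulation
        self.delivered = []          # payloads passed up, in delivery order

    def _next_expected(self) -> int:
        return (self.last + 1) % SEQ_MODULUS

    def _sort_buffer(self) -> None:
        # Sort by distance ahead of self.last so the order survives wrap.
        self.buffer.sort(key=lambda p: (p.seq - self.last) % SEQ_MODULUS)

    def _decapsulate_head(self) -> None:
        head = self.buffer.pop(0)
        self.delivered.append(head.payload)
        self.last = head.seq

    def _drain_in_sequence(self) -> None:
        # Decapsulate buffered packets while the head is exactly last + 1.
        self._sort_buffer()
        while self.buffer and self.buffer[0].seq == self._next_expected():
            self._decapsulate_head()

    def expire(self, now_us: int) -> None:
        # Flush in sorted order, ignoring gaps, until nothing has waited
        # longer than the timer; then resume normal in-sequence draining.
        self._sort_buffer()
        while any(now_us - p.arrival_us > self.timer_us for p in self.buffer):
            self._decapsulate_head()
        self._drain_in_sequence()

    def receive(self, seq: int, payload: bytes, now_us: int) -> bool:
        """Process one packet; returns False if it was discarded."""
        self.expire(now_us)
        if is_out_of_sequence(seq, self.last):
            return False  # assumption: discarded, as RFC 2890 specifies
        if any(p.seq == seq for p in self.buffer):
            return False  # assumption: duplicate of a buffered packet
        if seq != self._next_expected() and len(self.buffer) >= self.max_buffer:
            # Buffer full: decapsulate the head regardless of any gap.
            self._sort_buffer()
            self._decapsulate_head()
            self._drain_in_sequence()
            if is_out_of_sequence(seq, self.last):
                return False  # the new packet is now behind "last"
        if seq == self._next_expected():
            self.delivered.append(payload)
            self.last = seq
        else:
            self.buffer.append(Buffered(seq, now_us, payload))
        self._drain_in_sequence()
        return True
```

   A receiver would keep one such object per Flow Group. An
   implementation on a switch chip might instead use a fixed-size
   structure indexed by sequence number; the re-sorted list is used here
   only to keep the sketch short.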

   Packets of all flows in the same Flow Group share one reorder
   sequence space. The source allocates sequence numbers in the order in
   which packets are sent. When the sequence number wraps past
   (2**32)-1, the source continues allocating from 0.


5.2 Packet Reordering Capability Discovery

   The reorder function at the destination requires resources. For
   example, there is one reorder queue for each Group ID (the Flow Group
   ID plus the source IP address). On resource-constrained devices such
   as switch chips, the number of queues is limited. Therefore, it is
   important not to exceed the capability of the destination when
   assigning Group IDs at the source, which requires that the source
   know the capability of the destination. This can be achieved in
   several ways, such as static configuration or direct signaling
   between the two ends. Capability notifications need to be sent to
   the peer in the following situations:

   1. When the source communicates with the destination for the first
      time.
   2. When a packet is received from the peer for the first time.
   3. When a capability notification is received from the source.
   4. When the Group IDs used by the peer exceed the local capability.

   In the above cases, the destination needs to notify the source of its
   capability (the number of reorder queues assigned to that source).
   When the source receives a capability notification from the
   destination, it needs to adjust its Group ID allocation so that the
   number of Group IDs does not exceed the number of reorder queues
   allocated to it by the destination.

   When the number of Group IDs exceeds the local capability, either of
   the following two actions can be taken. Which action is selected is
   outside the scope of this document.

   1. Discard Geneve packets whose Group ID exceeds the local
      capability.

   2. Remove the Geneve encapsulation without performing reordering and
      pass the packet to the higher-layer protocol. Higher-layer
      protocols that can tolerate a certain degree of packet reordering
      (such as TCP) may still process such packets correctly.

   When the Group IDs exceed the local capability, the destination
   sends a capability notification to the source. To avoid sending
   capability notifications too frequently, notification suppression is
   needed: after the destination sends a capability notification to a
   source, it enters a suppression cycle and does not send another
   capability notification to that source until the cycle ends. The
   suppression period is longer than the RTT between the two nodes.
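   The overflow handling and notification suppression described above
   can be sketched in Python as follows. The sketch assumes that a Flow
   Group ID at or above the number of reorder queues assigned to a peer
   (the MAX GROUP ID value carried in the capability notification)
   counts as exceeding the local capability, and that the overflow
   action is chosen by configuration, since this document leaves that
   choice open. All class, method, and parameter names are hypothetical,
   and "sending" a notification only records it in a list.

```python
import enum


class Action(enum.Enum):
    REORDER = "reorder"                   # within capability: use a reorder queue
    DISCARD = "discard"                   # action 1 above
    DECAP_WITHOUT_REORDER = "decap-only"  # action 2 above


class CapabilityGuard:
    """Destination-side check of incoming Flow Group IDs (hypothetical API)."""

    def __init__(self, max_group_id: int, suppression_us: int,
                 overflow_action: Action) -> None:
        self.max_group_id = max_group_id      # reorder queues assigned per peer
        self.suppression_us = suppression_us  # should exceed the RTT to the peer
        self.overflow_action = overflow_action
        self._suppress_until = {}             # peer IP -> end of suppression cycle
        self.notifications = []               # (time, peer IP, MAX GROUP ID) "sent"

    def maybe_notify(self, peer_ip: str, now_us: int) -> bool:
        """Send a capability notification unless a suppression cycle is running."""
        if now_us < self._suppress_until.get(peer_ip, 0):
            return False
        self.notifications.append((now_us, peer_ip, self.max_group_id))
        self._suppress_until[peer_ip] = now_us + self.suppression_us
        return True

    def on_packet(self, peer_ip: str, flow_group_id: int, now_us: int) -> Action:
        """Decide how to handle a packet carrying the given Flow Group ID."""
        if flow_group_id < self.max_group_id:
            return Action.REORDER
        self.maybe_notify(peer_ip, now_us)
        return self.overflow_action


guard = CapabilityGuard(max_group_id=4, suppression_us=500,
                        overflow_action=Action.DISCARD)
print(guard.on_packet("10.0.0.1", 9, now_us=10))  # Action.DISCARD (notifies)
print(guard.on_packet("10.0.0.1", 9, now_us=20))  # Action.DISCARD (suppressed)
print(len(guard.notifications))                   # 1
```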

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|   Rsvd.   |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Class = GFP           | Type=Capacity |R|R|R| Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           MAX GROUP ID                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 Capability notification message format


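   Length = 1 (4 bytes of option data, excluding the 4-byte option
   header)

   MAX GROUP ID is a four-byte field that indicates the number of Group
   IDs (reorder queues) that the destination has assigned to the source.
   The Group IDs allocated by the source must be limited to the range 0
   to (MAX GROUP ID - 1).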
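   The following Python sketch encodes and decodes the capability
   notification option (Length = 1), and shows one possible way for a
   source to keep its Flow Group IDs within 0 to (MAX GROUP ID - 1):
   hashing a flow key with CRC-32 and reducing it modulo MAX GROUP ID.
   The Option Class and Type values are hypothetical placeholders
   pending IANA assignment, and the hash-based mapping and flow-key
   format are only examples, since this document does not define how
   flows map to Flow Group IDs.

```python
import struct
import zlib

# Hypothetical placeholders: Option Class (GFP) and Type=Capacity are TBA.
GFP_OPTION_CLASS = 0xFFFF
CAPACITY_TYPE = 0x02
CAPACITY_LEN_WORDS = 1  # one 4-byte word of option data


def encode_capability_option(max_group_id: int) -> bytes:
    """Build the 8-byte capability option: 4-byte header + MAX GROUP ID."""
    return struct.pack("!HBBI", GFP_OPTION_CLASS, CAPACITY_TYPE,
                       CAPACITY_LEN_WORDS & 0x1F, max_group_id)


def decode_capability_option(buf: bytes) -> int:
    """Return MAX GROUP ID after checking class, type, and length."""
    opt_class, opt_type, rsvd_len, max_group_id = struct.unpack_from("!HBBI", buf, 0)
    if (opt_class, opt_type, rsvd_len & 0x1F) != (
            GFP_OPTION_CLASS, CAPACITY_TYPE, CAPACITY_LEN_WORDS):
        raise ValueError("not a capability notification option")
    return max_group_id


def flow_group_id_for(flow_key: bytes, max_group_id: int) -> int:
    """Hypothetical flow-to-group mapping, confined to 0..MAX GROUP ID - 1."""
    if max_group_id <= 0:
        raise ValueError("no reorder queues assigned by the destination")
    return zlib.crc32(flow_key) % max_group_id


max_gid = decode_capability_option(encode_capability_option(16))
fgid = flow_group_id_for(b"10.0.0.1|10.0.0.2|6|40000|80", max_gid)
print(max_gid, 0 <= fgid < max_gid)  # 16 True
```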




6  Security Considerations

   This document describes a Geneve option that introduces a Flow Group
   ID and a Sequence Number for reordering packets. An attacker able to
   inject packets with an arbitrary Sequence Number could launch a
   Denial of Service attack, for example by causing legitimate packets
   to be treated as out-of-sequence. This is a general Geneve security
   issue, discussed in the Geneve security requirements [5].

   To protect against such attacks, IPsec could be used to protect the
   Geneve header and the tunneled payload. Any common Geneve security
   mechanism also applies to this document.

7  IANA Considerations

   IANA is requested to allocate a Geneve "option class" number for GFP
   (Geneve Forwarding Policy):

              +---------------+-------------+---------------+
              | Option Class  | Description | Reference     |
              +---------------+-------------+---------------+
              | x             | GFP_ID      | This document |
              +---------------+-------------+---------------+


8  References

   [1] J. Gross, Ed., I. Ganga, Ed., T. Sridhar, Ed., "Generic Network
       Virtualization Encapsulation", [I-D.ietf-nvo3-geneve].

   [2] J. Cao, et al., "Per-packet Load-balanced, Low-Latency Routing
       for Clos-based Data Center Networks", CoNEXT '13.

   [3] M. Alizadeh, et al., "CONGA: Distributed Congestion-Aware Load
       Balancing for Datacenters", SIGCOMM '14.

   [4] G. Dommety, "Key and Sequence Number Extensions to GRE",
       RFC 2890, September 2000.

   [5] D. Migault, S. Boutros, D. Wing, S. Krishnan, "Geneve Protocol
       Security Requirement",
       [I-D.draft-mglt-nvo3-geneve-security-requirements-03].



Authors' Addresses

   Yolanda Yu
   Huawei Technologies Co., Ltd.
   Email: yolanda.yu@huawei.com

   Jianglong Wang
   China Telecom
   Email: wangjl1.bri@chinatelecom.cn