Internet DRAFT - draft-eromenko-ipff-tcp64



INTERNET-DRAFT
"TCP 64-bit extension: Modern Variation",
Alexey Eromenko, 2016-09-29, 
<draft-eromenko-ipff-tcp64-04.txt>
expiration date: 2017-03-29



Intended status: Standards Track
                                                              A.Eromenko
                                                          September 2016




                  TCP 64-bit extension: "Modern Variation"
                ===========================================
                    for Internet Protocol - Five Fields



Abstract

   This document attempts to modernize TCP for today's networks:
   higher bandwidth, encryption-friendly operation, and optional
   checksums. Optional checksums are required for compatibility with
   the Identifier-Locator Network Protocol (ILNP).
   This extension is backwards compatible with the original TCP
   specification during session establishment, but it is not compatible
   with it for the rest of the session, nor with deployed middleboxes.


Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Table of Contents


   Introduction
   1. TCP Header: "Classic variation"
   2. TCP Header: "Modern variation" a.k.a TCP.64
   2.1. TCP Header: "Modern variation without CRC"
   3. Initiating a TCP.64 Session "Modern Variation"
   4. TCP.64 "Modern Variation" establishment options
   4.1. TCP.64 Session option: Modern Maximum Segment Size
   4.2. TCP.64 Session option: Modern window scaling
   4.3. TCP.64 Session option: Checksum ignored
   Authors' Contacts


Introduction

   TCP in IP-FF comes in several variations, or flavors.
   The question is:
   Our operating systems and processors are 64-bit.
   Why not make TCP 64-bit as well?
   Well, I decided to define what 64-bit TCP should look like.

   The session begins with the good old, time-tested
   "Classic variation", which looks familiar.
   Only the Port fields have moved to the "IP" layer.
   Only during SYN/ACK MAY the session be moved to a different
   variation.


1. TCP Header: "Classic variation"

    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  4|                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  8|                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|  Data |     |N|C|E|U|A|P|R|S|F|                               |
   | Offset|     |S|W|C|R|C|S|S|Y|I|            Window             |
   |       |     | |R|E|G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16|           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)

   A few differences between the original TCP and TCP/IPFF "classic":

   a. Ports have moved to the IPFF layer.

   b. The checksum needs to be computed using the new pseudo-header,
      according to the [IPFF] specification, but with the old checksum
      algorithm (not CRC). This allows TCP/IPFF to function fairly
      fast on old processors.

   c. 999 field limit check:

      During session establishment, in both the <SYN> and <ACK>
      phases, TCP MUST check the 999 field limit of IP-FF.
      If any field has a higher value (between 1000 and 1023), the
      connection MUST be dropped, either silently or by rejecting it
      with the TCP Reset flag.
      This check MUST be performed against both source and destination
      addresses.
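
   A minimal sketch of the check in item c, in Python. It assumes that
   an address has already been parsed into a sequence of integer
   fields; the address format itself is defined in [IPFF] and is not
   shown here. The names MAX_FIELD_VALUE, fields_within_limit and
   connection_allowed are illustrative and not part of this
   specification.

```python
MAX_FIELD_VALUE = 999  # field limit stated in item c above


def fields_within_limit(address_fields):
    """Return True if every field of one address is within the limit."""
    return all(0 <= field <= MAX_FIELD_VALUE for field in address_fields)


def connection_allowed(source_fields, destination_fields):
    """Apply the limit check to both source and destination addresses."""
    return (fields_within_limit(source_fields)
            and fields_within_limit(destination_fields))


# Hypothetical addresses; a False result means drop or reset.
print(connection_allowed([10, 0, 0, 0, 1], [999, 1, 2, 3, 4]))
print(connection_allowed([10, 0, 0, 0, 1], [1000, 1, 2, 3, 4]))
```

   Whether a rejected connection is dropped silently or answered with
   a Reset is left to the caller, as the text above allows either.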

   At a minimum, an IPFF implementation MUST support the "classic
   variation".
   Other TCP variations are needed for full stack support.
   For hardware with links over 40 Gbit/s, supporting TCP.64 is
   mandatory.


2. TCP Header: "Modern variation" a.k.a TCP.64

    0                   1                   2                   3  
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  4|  Data Offset  |                    Window                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  8|             |N|C|E| |A|P|R|S|F|                               |
   |             |S|W|C| |C|S|S|Y|I|                               |
   |             | |R|E| |K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                                                               |
   +                        Sequence Number                        +
 16|                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20|                                                               |
   +                    Acknowledgment Number                      +
 24|                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 28|                                                               |
   +                       CRC64c Checksum                         +
 32|                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)

   What exactly is 64-bit?

   The Sequence Number and Acknowledgment Number fields, the CRC64c
   checksum, and the logical window size are 64-bit.
   The physical Window field is 24 bits wide.
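
   The layout in the diagram above can be sketched in Python as
   follows. This is an illustration under stated assumptions, not a
   reference implementation: fields are assumed to be sent in network
   byte order, every bit the diagram leaves unlabeled is assumed to be
   zero, and the function name pack_modern_header is hypothetical.
   The CRC64c value is taken as an input; its computation is not
   shown.

```python
import struct

# Positions of the flag bits inside the second 32-bit word, counted
# from the least significant bit. Derived from the diagram, where NS
# is the 8th bit from the left and the former URG position is unused.
FLAG_SHIFT = {"NS": 24, "CWR": 23, "ECE": 22, "ACK": 20,
              "PSH": 19, "RST": 18, "SYN": 17, "FIN": 16}


def pack_modern_header(data_offset, window, flags, seq, ack, crc=None):
    """Pack the fixed TCP.64 header; the CRC field is omitted if None."""
    if not 0 <= data_offset <= 0xFF:
        raise ValueError("Data Offset is an 8-bit field")
    if not 0 <= window <= 0xFFFFFF:
        raise ValueError("Window is a 24-bit field")
    word0 = (data_offset << 24) | window
    word1 = 0
    for name in flags:
        word1 |= 1 << FLAG_SHIFT[name]
    # Two 32-bit words, then 64-bit Sequence and Acknowledgment numbers.
    header = struct.pack("!IIQQ", word0, word1, seq, ack)
    if crc is not None:
        header += struct.pack("!Q", crc)  # 64-bit checksum field
    return header


with_crc = pack_modern_header(0, 0x123456, {"SYN"}, 1, 2, crc=3)
without_crc = pack_modern_header(0, 0x123456, {"ACK"}, 1, 2)
print(len(with_crc), len(without_crc))
```

   The two lengths printed correspond to the last byte offsets shown
   in the diagrams with and without the CRC64c field.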

   Design note: Bloated for a good reason.

   I realize the downside of making the TCP header larger by a whopping
   extra 16 bytes, but I also consider this a necessary evil at speeds
   over 100 Gbit/s.
   If you're on a slow link, just don't advertise that you're TCP 64-bit
   capable, and stay on the "classic variation".

   64-bit Checksums:

   The 16-bit checksum of the "classic variation" may fail badly.
   Today, only the Data-Link layer checksums save the Internet from
   complete breakdown, as those are fairly strong 32-bit CRCs.

   A strong CRC64c checksum, in contrast, is adequate protection for
   the huge amounts of unencrypted data expected in the future.
   With encryption, of course, the TCP checksum becomes redundant.
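
   One known weakness of the classic 16-bit checksum can be shown in a
   few lines of Python: because it is a ones' complement sum of 16-bit
   words, it does not change when words are reordered. The sketch
   below implements only the summing algorithm over a byte string; the
   IPFF pseudo-header is not included, and the function name is
   illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum of the classic variation."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd"
reordered = b"\xab\xcd\x12\x34"  # same 16-bit words, swapped
print(internet_checksum(original) == internet_checksum(reordered))
```

   The comparison is true: the corrupted ordering passes the checksum,
   which is the kind of error a CRC is designed to catch.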


   64-bit Sequence numbers and acknowledgements:

   The problem with 32-bit Sequence and Acknowledgment numbers is TCP
   reliability.

   Quote from RFC-7323:

   "An especially serious kind of error may result from an accidental
   reuse of TCP sequence numbers in data segments.  TCP reliability
   depends upon the existence of a bound on the lifetime of a segment:
   the "Maximum Segment Lifetime" or MSL.

   Duplication of sequence numbers might happen in either of two ways:

   (1)  Sequence number wrap-around on the current connection

        A TCP sequence number contains 32 bits.  At a high enough
        transfer rate of large volumes of data (at least 4 GiB in the
        same session), the 32-bit sequence space may be "wrapped"
        (cycled) within the time that a segment is delayed in queues.

   (2)  Earlier incarnation of the connection

        Suppose that a connection terminates, either by a proper close
        sequence or due to a host crash, and the same connection (i.e.,
        using the same pair of port numbers) is immediately reopened.  A
        delayed segment from the terminated connection could fall within
        the current window for the new incarnation and be accepted as
        valid.

   Duplicates from earlier incarnations, case (2), are avoided by
   enforcing the current fixed MSL of the TCP specification, as
   explained in Section 5.8 and Appendix B.  In addition, the
   randomizing of ephemeral ports can also help to probabilistically
   reduce the chances of duplicates from earlier connections.  However,
   case (1), avoiding the reuse of sequence numbers within the same
   connection, requires an upper bound on MSL that depends upon the
   transfer rate, and at high enough rates, a dedicated mechanism is
   required."

   
   On a gigabit link, half of the 32-bit sequence space (2^31 bytes,
   the wrap-around bound used in RFC-7323) is consumed in about 17
   seconds.
   On a 100-gigabit link, this takes well under a second.
   TCP was never designed for such speeds.
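
   The arithmetic behind these figures can be reproduced with a short
   Python sketch. It assumes the link is fully used by a single
   connection and that one sequence number is consumed per byte; the
   function name is illustrative.

```python
def seconds_to_consume(sequence_bits, link_bits_per_second):
    """Time to send 2**sequence_bits bytes at the given link rate."""
    return (2 ** sequence_bits) * 8 / link_bits_per_second


GIGABIT = 1e9
HUNDRED_GIGABIT = 100e9

# Half of the 32-bit space, the bound used for wrap-around in RFC 7323.
print(seconds_to_consume(31, GIGABIT))          # roughly 17 seconds
print(seconds_to_consume(31, HUNDRED_GIGABIT))  # well under a second

# Half of a 64-bit space at 100 Gbit/s, expressed in years.
print(seconds_to_consume(63, HUNDRED_GIGABIT) / (365 * 24 * 3600))
```

   With 64-bit sequence numbers the same bound is measured in years
   rather than in fractions of a second.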

   This is dangerous, because packets from an older rotation might get
   stuck in a queue and later be released by a router. If such a packet
   reaches the destination, it can corrupt user data; if it reaches
   the sender as an ack, it can cause a TCP reset.
   Those old packets carry a valid sequence number and a correct
   checksum.

   PAWS, a mechanism designed to prevent such issues by using TCP
   timestamps, may fail badly with 32-bit Sequence numbers
   because of inaccurate timing.

   Data Offset: 8 bits; it does not include the fixed header, that is,
   the first 32 or 24 bytes for the "Modern" and "Modern-no-CRC"
   variations, respectively (see the header diagrams).
   This allows far more options to be sent in TCP.64: up to 1020 bytes
   of options, while TCPv4 provided only 40 bytes of options.
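
   A short Python sketch of how a receiver might locate the payload.
   It assumes that Data Offset counts 32-bit words, as in classic TCP,
   which is consistent with the 1020-byte figure (255 x 4). The fixed
   header lengths are read from the header diagrams (32 bytes with the
   CRC64c field, 24 bytes without it). All names are illustrative.

```python
# Fixed header sizes as drawn in the header diagrams.
FIXED_HEADER_BYTES = {"modern": 32, "modern-no-crc": 24}
MAX_DATA_OFFSET = 255  # largest value of an 8-bit field


def options_length(data_offset):
    """Length of the options area in bytes (assumed 4-byte units)."""
    if not 0 <= data_offset <= MAX_DATA_OFFSET:
        raise ValueError("Data Offset is an 8-bit field")
    return data_offset * 4


def payload_start(variation, data_offset):
    """Byte offset of the data, counted from the start of the header."""
    return FIXED_HEADER_BYTES[variation] + options_length(data_offset)


print(options_length(MAX_DATA_OFFSET))
print(payload_start("modern", 0), payload_start("modern-no-crc", 2))
```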

   Compatibility Note: 64-bit TCP breaks compatibility with existing
     "middleboxes".

   64-bit Initial sequence number:
   If the upper 32 bits are not specified during SYN, they must be
   copied from the lower 32 bits.

   The actual switch to the "Modern variation" happens from the ACK
   packet (the 3rd packet of the handshake), assuming the "ACK"
   advertises such capability.


2.1. TCP Header: "Modern variation without CRC" or TCP.64-NO-CRC

    0                   1                   2                   3  
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  4|  Data Offset  |                    Window                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  8|             |N|C|E| |A|P|R|S|F|                               |
   |             |S|W|C| |C|S|S|Y|I|                               |
   |             | |R|E| |K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12|                                                               |
   +                        Sequence Number                        +
 16|                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20|                                                               |
   +                    Acknowledgment Number                      +
 24|                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(bytes)

   Same as above, but without the CRC64c checksum. This is useful for
   encryption protocols, such as SSL/TLS, that provide their own
   checksum, protection, or authentication mechanisms.

   Initially the session switches to the "Modern variation" on SYN/ACK
   with the CRC64c field present. That field is eliminated after 65535
   bytes of data, which is where the switch to
   "Modern variation without CRC" occurs.

   (That is, packets that begin with a relative sequence number over
   65535 are expected by the receiver to have no CRC64c field.)
   In the first 64 KiB, the CRC is required to negotiate encryption
   parameters.
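
   The rule above can be sketched as a small Python predicate. It
   assumes that "relative" means the offset from the initial sequence
   number and that the threshold is applied to the first byte of each
   segment; the function and constant names are illustrative.

```python
CRC_REQUIRED_LIMIT = 65535  # threshold taken from the text above


def segment_carries_crc(relative_seq, no_crc_negotiated):
    """Tell whether the receiver should expect the CRC64c field."""
    if not no_crc_negotiated:
        return True  # plain "Modern variation": CRC always present
    # "Modern variation without CRC": CRC only in the first 64 KiB.
    return relative_seq <= CRC_REQUIRED_LIMIT


print(segment_carries_crc(65535, True))
print(segment_carries_crc(65536, True))
```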



3. Initiating a TCP.64 Session "Modern Variation"


   This is the same signaling as for initiating a normal TCP connection,
   but the SYN, SYN/ACK, and ACK packets also carry the 64BIT_CAPABLE
   option.

      Client                                  Server
      ------                                  ------
      64BIT_CAPABLE option     ->
      [SYN flag]
      TCP SYN phase
                            <-                64BIT_CAPABLE option
                                              [ACK flag]
                                              TCP ACK phase
      64-bit TCP               ->
      [SYN flag]
      TCP SYN64 phase
      
                            <-                64-bit TCP
                                              [SYN+ACK flags]
                                              TCP SYN-ACK64 phase

   Design Note:

   The 3-way handshake just became a 4-way handshake. Why?
   In order to transfer extra options.
   The original TCP protocol allows only 40 bytes of options to be sent,
   which is nowhere near enough for future requirements
   and session initialization.
   TCP.64 allows sending 1020 bytes of options, over 25 times more.



4. TCP.64 "Modern Variation" establishment options

   TCP "Modern Variation" Option:

             +---------+---------+---------+---------+
             | Kind=31 | Length=7|Variation|Upper ISN|
             |         |         |         | 32-bits |
             +---------+---------+---------+---------+
                  1         1         1         4

     Kind: 31 (To be determined by IANA)

   Length:

     3 = For flags (such as "Checksum ignored")

     7 = For setting variation + upper 32-bits of initial sequence 
         number.

   Variations or codes:

     0 = Checksum ignored (useful for ILNP); 
         Takes effect from 1st packet. (initial SYN)

     1 = Modern Variation (TCP.64-bit capable)

     2 = Modern Variation, without CRC field; (useful for encryption)
         Takes effect after 64 KiB of data.

   These options must be sent during the initial <SYN> phase.
   When setting the "Modern variation", a node must also set the upper
   32 bits of the initial sequence number.
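
   A minimal Python sketch of encoding this option and of deriving the
   64-bit initial sequence number. Kind 31 is only the placeholder
   used in this document until IANA assigns a value, network byte
   order is assumed, and the function names are illustrative.

```python
import struct

KIND_MODERN_VARIATION = 31  # placeholder; to be determined by IANA


def encode_variation_option(variation, upper_isn=None):
    """Encode the option in its 3-byte or 7-byte form."""
    if upper_isn is None:
        return struct.pack("!BBB", KIND_MODERN_VARIATION, 3, variation)
    return struct.pack("!BBBI", KIND_MODERN_VARIATION, 7,
                       variation, upper_isn)


def full_isn(lower_isn, upper_isn=None):
    """Build the 64-bit ISN from its 32-bit halves.

    If the upper half was not specified, it is copied from the lower
    half, as this document requires.
    """
    if upper_isn is None:
        upper_isn = lower_isn
    return (upper_isn << 32) | lower_isn


flag_only = encode_variation_option(0)             # Checksum ignored
modern = encode_variation_option(1, 0xAABBCCDD)    # Modern Variation
print(len(flag_only), len(modern))
print(hex(full_isn(0x11223344)))
```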



4.1. TCP.64 Session option: Modern Maximum Segment Size

        Kind: 2   Length: 6 bytes   MSS: 32-bit.

             +---------+---------+---------+
             | Kind=2  |Length=6 |   MSS   |
             +---------+---------+---------+
                  1         1         4

   Modern Maximum Segment Size Option Data:  32 bits

   If this option is present, it communicates the maximum
   receive segment size of the TCP that sends this segment.
   This option must only be sent in the initial connection request
   (i.e., in segments in the <SYN64> phase).  If this
   option is not used, any segment size is allowed.

   This option, if present, overrides the classic 16-bit MSS.
   This feature exists to support "Jumbograms" (packets > 64 KiB).
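
   The 6-byte option can be encoded and decoded as in the following
   Python sketch. Network byte order is assumed, and the function
   names are illustrative.

```python
import struct

KIND_MSS = 2
MODERN_MSS_LENGTH = 6


def encode_modern_mss(mss):
    """Encode the Modern MSS option with a 32-bit MSS value."""
    if not 0 <= mss <= 0xFFFFFFFF:
        raise ValueError("MSS is a 32-bit field")
    return struct.pack("!BBI", KIND_MSS, MODERN_MSS_LENGTH, mss)


def decode_modern_mss(option):
    """Return the MSS carried in a 6-byte Modern MSS option."""
    kind, length, mss = struct.unpack("!BBI", option)
    if kind != KIND_MSS or length != MODERN_MSS_LENGTH:
        raise ValueError("not a Modern MSS option")
    return mss


encoded = encode_modern_mss(100000)  # larger than a 16-bit MSS allows
print(encoded.hex(), decode_modern_mss(encoded))
```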


4.2. TCP.64 Session option: Modern window scaling


        Kind: 3   Length: 3 bytes   shift.cnt: valid value up to 38

             +---------+---------+---------+
             | Kind=3  |Length=3 |shift.cnt|
             +---------+---------+---------+
                  1         1         1

   This option is an offer, not a promise; both sides must send
   Modern Window Scale options in their SYN64 segments to enable window
   scaling in either direction.  If window scaling is enabled,
   then the TCP that sent this option will right-shift its true
   receive-window values by 'shift.cnt' bits for transmission in
   SEG.WND.  The value 'shift.cnt' may be zero (offering to scale,
   while applying a scale factor of 1 to the receive window).

   This option may be sent in an initial <SYN64> segment (i.e., a
   segment with the SYN bit on and the ACK bit off).  It may also
   be sent in a <SYN,ACK64> segment, but only if a Window Scale
   option was received in the initial <SYN64> segment.  A Window Scale
   option in a segment without a SYN bit should be ignored.

   The Window field in a SYN64 (i.e., a <SYN64> or <SYN,ACK64>) segment
   itself is never scaled.

   When both sides receive this option during the <SYN64> phase, it
   overrides the classic "Window Scale Option" set during the normal
   <SYN> phase, if any.
   In fact, moving to the "Modern variation" by itself invalidates
   that legacy option.

      *    All windows are treated as 64-bit quantities for storage in
           the connection control block and for local calculations.
           This includes the send-window (SND.WND) and the receive-
           window (RCV.WND) values, as well as the congestion window.

      *    Valid shift.cnt values are 0 through 38. If a higher value
           is received, the option is ignored.
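
   The scaling rules can be sketched in Python as follows. Clamping an
   oversized value to the largest 24-bit Window value is an assumption
   of this sketch and is not stated in this document; all names are
   illustrative.

```python
MAX_SHIFT = 38                    # largest valid shift.cnt
WINDOW_FIELD_MAX = (1 << 24) - 1  # largest value of the Window field


def accept_shift(shift_cnt):
    """Return the shift to use, or None if the option is ignored."""
    return shift_cnt if 0 <= shift_cnt <= MAX_SHIFT else None


def window_field(true_window, shift_cnt):
    """Sender side: right-shift the true window for SEG.WND."""
    return min(true_window >> shift_cnt, WINDOW_FIELD_MAX)


def effective_window(field_value, shift_cnt):
    """Receiver side: restore the window as a 64-bit quantity."""
    return field_value << shift_cnt


field = window_field(1 << 40, 20)
print(field, effective_window(field, 20) == 1 << 40)
print(accept_shift(38), accept_shift(39))
```

   With a 24-bit field and a shift of up to 38, the largest window
   that can be expressed stays below 2^62 bytes, which fits in the
   64-bit quantities described above.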

   Design note:

   "Classic variation" TCP with its 30-bit window (16-bit window +
   14-bit window scaling) will work up to about 100 Gbit/s links.
   On a 100 Gbit/s link with 100 ms latency, you will have 10 Gbit
   (1.25 GB) of unacknowledged data in flight.
   Above 100 Gbit/s, TCP performance will start to degrade.

   But today (2015) experimental 1 Tbit/s transmitters already exist,
   so, planning for the future, we need a *much* larger window or
   scaling.
   This is why window scaling by up to 38 bits was defined for
   TCP.64, along with the 24-bit Window field.
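
   The figures in this design note can be checked with a short Python
   calculation. It assumes that the quoted latency is the round-trip
   time and that the link is fully used by one connection; the names
   are illustrative.

```python
def bytes_in_flight(link_bits_per_second, round_trip_ms):
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return link_bits_per_second * round_trip_ms // 8000


# Largest windows expressible in each variation, in bytes.
CLASSIC_MAX_WINDOW = 0xFFFF << 14   # 16-bit window, shift of 14
MODERN_MAX_WINDOW = 0xFFFFFF << 38  # 24-bit window, shift of 38

at_100g = bytes_in_flight(100_000_000_000, 100)
at_1t = bytes_in_flight(1_000_000_000_000, 100)

print(at_100g, CLASSIC_MAX_WINDOW)
print(at_100g > CLASSIC_MAX_WINDOW, at_1t < MODERN_MAX_WINDOW)
```

   At 100 Gbit/s the data in flight already exceeds the largest
   classic window slightly, while the TCP.64 window leaves ample
   headroom at 1 Tbit/s.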


4.3. TCP.64 Session option: Checksum ignored

   This option lets the receiver ignore the TCP-supplied checksum.
   It affects both the classic 16-bit checksum and the CRC.
   It affects the TCP session from the initial SYN packet.

   This allows the Identifier-Locator Network Protocol (ILNP), as
   defined in RFC 6740 through RFC 6748, to function with TCP/IP
   networks.
   In this case ILNP or another protocol should provide its own
   checksums or error correction for both TCP and IP headers and data.

   When used together with the TCP.64-bit "modern variation",
   the system always switches to "64-bit-NO-CRC" from the first packet
   (i.e., the SYN64 phase).



Acknowledgements:

   Influenced by the hard work of Robert Ullmann,
   "TP/IX: The Next Internet" [RFC-1475],
   and by "Identifier-Locator Network Protocol (ILNP)" [RFC-6740],
   written by Randall J. Atkinson and SN Bhatti,
   and by "IPv6 Jumbograms" [RFC-2675] by David A. Borman,
   Stephen E. Deering, and Robert M. Hinden;
   partially derived from "TCP Extensions for High Performance",
   [RFC-1323] and [RFC-7323], by D. Borman, B. Braden, V. Jacobson,
   and R. Scheffenegger, Ed.

   And big thanks to DARPA for the original specification of the
   Transmission Control Protocol, as defined in [RFC-793]!
   One of the greatest inventions of the Internet!


Authors' Contacts

   Alexey Eromenko
   Israel

   Skype: Fenix_NBK_
   EMail: al4321@gmail.com
   Facebook: https://www.facebook.com/technologov

