INTERNET-DRAFT "TCP 64-bit extension: Modern Variation", Alexey Eromenko, 2016-03-02, expiration date: 2016-09-02 Intended status: Standards Track A.Eromenko March 2016 TCP 64-bit extension: "Modern Variation" =========================================== for Internet Protocol - Five Fields Abstract This document attempts to modernize TCP protocol for new reality, faster bandwidth, encryption-optimization and optional checksums, which is required for Identity-Locator Network Protocol (ILNP) compatibility. This extension is backwards compatible with the original TCP specification during session establishment, but not compatible during the rest of the session nor with deployed middleboxes. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents Introduction 1. TCP Header: "Classic variation" 2. TCP Header: "Modern variation" a.k.a TCP.64 2.1. TCP Header: "Modern variation without CRC" 3. Initiating a TCP.64 Session "Modern Variation" 4. TCP.64 "Modern Variation" establishment options 4.1. TCP.64 Session option: Modern Maximum Segment Size 4.2. TCP.64 Session option: Modern window scaling 4.3. TCP.64 Session option: Checksum ignored Authors' Contacts Introduction TCP in IP-FF comes in several variations, of flavors. The questions is: Our operating systems and processors are 64-bit. Why not make TCP 64-bit also ? Well, I decided to define what TCP 64-bit should look like. The session begins with the good-old, time-tested "Classic variation", which looks familiar. Just Port fields have moved to the "IP" layer. ...and only during SYN/ACK, it MAY be moved to a different variation. 1. TCP Header: "Classic variation" 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4| Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8| Acknowledgment Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 12| Data | |N|C|E|U|A|P|R|S|F| | | Offset| |S|W|C|R|C|S|S|Y|I| Window | | | | |R|E|G|K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 16| Checksum | Urgent Pointer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (bytes) Few differences between original TCP and TCP/IPFF "classic": a. Ports have moved to IPFF layer. b. Checksum needs to be computed using the new pseudo-header, according to [IPFF] specification. But using the old checksum algorithm (not CRC); This is to allow TCP/IPFF to function fairly fast on old processors. c. 999 field limit check: During session establishment, both and phases, TCP MUST check for 999 field limit for IP-FF. If any field has higher value (between 1000 and 1023), connection MUST be dropped, either silently or rejected via TCP Reset flag. This check MUST be performed against both source and destination addresses. At a minimum, IPFF implementation MUST support "classic variation". Other TCP variations are needed for full stack support. For Hardware with links over 40 Gbit/s, supporting TCP.64 is mandatory. 2. TCP Header: "Modern variation" a.k.a TCP.64 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4| Data Offset | Window | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8| |N|C|E| |A|P|R|S|F| | | |S|W|C| |C|S|S|Y|I| | | | |R|E| |K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 12| | + Sequence Number + 16| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 20| | + Acknowledgment Number + 24| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 28| | + CRC64c Checksum + 32| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (bytes) 64-bit what ? SYN/ACK fields, CRC64 checksum, and 64-bit logical window size. Physical window size is 24-bits. Design note: Bloated for a good reason. I realize the downside of making TCP bloated by a whopping extra 16 bytes, but I also realize this is a necessary evil at speeds over 100 Gigabits-per-Second. If you're on a slow link, just don't advertise that you're TCP 64-bit capable, and stay on the "classic variation". 64-bit Checksums: 16-bit checksums of "classic variation" may fail badly. Today, only Data-Link layer checksum saves Internet from complete breakdown, as those checksums are fairly strong 32-bit CRCs. But strong CRC64c checksums is an adequate protection for future huge amounts of unencrypted data. Going encryption of-course renders checksums useless. 64-bit Sequence numbers and acknowledgements: The problem with 32-bit SYN/ACK is TCP Reliability Quote from RFC-7323: "An especially serious kind of error may result from an accidental reuse of TCP sequence numbers in data segments. TCP reliability depends upon the existence of a bound on the lifetime of a segment: the "Maximum Segment Lifetime" or MSL. Duplication of sequence numbers might happen in either of two ways: (1) Sequence number wrap-around on the current connection A TCP sequence number contains 32 bits. At a high enough transfer rate of large volumes of data (at least 4 GiB in the same session), the 32-bit sequence space may be "wrapped" (cycled) within the time that a segment is delayed in queues. (2) Earlier incarnation of the connection Suppose that a connection terminates, either by a proper close sequence or due to a host crash, and the same connection (i.e., using the same pair of port numbers) is immediately reopened. A delayed segment from the terminated connection could fall within the current window for the new incarnation and be accepted as valid. Duplicates from earlier incarnations, case (2), are avoided by enforcing the current fixed MSL of the TCP specification, as explained in Section 5.8 and Appendix B. In addition, the randomizing of ephemeral ports can also help to probabilistically reduce the chances of duplicates from earlier connections. However, case (1), avoiding the reuse of sequence numbers within the same connection, requires an upper bound on MSL that depends upon the transfer rate, and at high enough rates, a dedicated mechanism is required."" On a gigabit link, Sequence numbers are rotated every 17 seconds. On a 100-gigabit link, this is well under a second. TCP originally was never designed for such speeds. This is dangerous, because packets from older rotation might get stuck in queue, then released by a router, get through and corrupt user data if sent to destination, or cause a TCP reset, if sent as an ack to the sender. And those old packets do have correct sequence number and correct checksum. PAWS, a system designed to prevent such issues by using timers, may fail badly and produce bad errors with 32-bit Sequence number, due to inaccurate timing issues. Data Offset: 8 bits; not including the fixed header: the first 32 or 28 bytes, for "Modern" and "Modern-no-CRC" variations, respectively. This allows far more options in to be sent in TCP.64 in fact, 1020 bytes in options, while TCPv4 provided only 40 bytes of options. Compatibility Note: 64-bit TCP breaks compatibility with existing "middleboxes". 64-bit Initial sequence number: Must be copied from the lower 32-bits to the upper 32-bits, if not specified during SYN. The actual switching to "Modern variation" happens from ACK packet (3rd handshake), assuming the "ACK" advertises such capability. 2.1. TCP Header: "Modern variation without CRC" or TCP.64-NO-CRC 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4| Data Offset | Window | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 8| |N|C|E| |A|P|R|S|F| | | |S|W|C| |C|S|S|Y|I| | | | |R|E| |K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 12| | + Sequence Number + 16| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 20| | + Acknowledgment Number + 24| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ (bytes) Same as above, but without the CRC64c checksum. This is useful for encryption protocols, such as SSL/TLS, that provide their own checksum, protection or authentication mechanisms. Initially the system switches to "Modern variation", on SYN/ACK, and with CRC64 field, but that field gets eliminated after 65535 bytes of data, which is where switching to "Modern variation without CRC" occurs. (i.e. packets that begin with relative SYN over 65535, are expected by the receiver to have no CRC64 field) In first 64 KiB, CRC is required to negotiate encryption parameters. 3. Initiating a TCP.64 Session "Modern Variation" This is the same signaling as for initiating a normal TCP connection, but the SYN, SYN/ACK, and ACK packets also carry the 64BIT_CAPABLE option. Client Server ------ ------ 64BIT_CAPABLE option -> [SYN flag] TCP SYN phase <- 64BIT_CAPABLE option [ACK flag] TCP ACK phase 64-bit TCP -> [SYN flag] TCP SYN64 phase <- 64-bit TCP [SYN+ACK flags] TCP SYN-ACK64 phase Design Note: 3-way handshake just became 4-way handshake. Why? In order to transfer extra options. Original TCP protocol allows only 40 bytes of options to be sent, which is nowhere near enough for future requirements, and session initialization. TCP.64 allows to send 1020 bytes of options, over 25x times more. 4. TCP.64 "Modern Variation" establishment options TCP "Modern Variation" Option: +---------+---------+---------+---------+ | Kind=31 | Length=7|Variation|Upper ISN| | | | | 32-bits | +---------+---------+---------+---------+ 1 1 1 4 Kind: 31 (To be determined by IANA) Length: 3 = For flags (such as "Checksum ignored") 7 = For setting variation + upper 32-bits of initial sequence number. Variations or codes: 0 = Checksum ignored (useful for ILNP); Takes effect from 1st packet. (initial SYN) 1 = Modern Variation (TCP.64-bit capable) 2 = Modern Variation, without CRC field; (useful for encryption) Takes effect after 64 KiB of data. Those options must be sent during the initial phase. When setting "Modern variation", a node must also set the upper 32-bits of initial sequence number. 4.1. TCP.64 Session option: Modern Maximum Segment Size Kind: 2 Length: 6 bytes MSS: 32-bit. +---------+---------+---------+ | Kind=2 |Length=6 | MSS | +---------+---------+---------+ 1 1 4 Modern Maximum Segment Size Option Data: 32 bits If this option is present, then it communicates the maximum receive segment size at the TCP which sends this segment. This field must only be sent in the initial connection request (i.e., in segments in phase). If this option is not used, any segment size is allowed. This option, if present, overrides the classic 16-bit MSS. This feature exists to support "Jumbograms". (packets > 64KiB) 4.2. TCP.64 Session option: Modern window scaling Kind: 3 Length: 3 bytes shift.cnt: valid value up to 38 +---------+---------+---------+ | Kind=3 |Length=3 |shift.cnt| +---------+---------+---------+ 1 1 1 This option is an offer, not a promise; both sides must send Modern Window Scale options in their SYN64 segments to enable window scaling in either direction. If window scaling is enabled, then the TCP that sent this option will right-shift its true receive-window values by 'shift.cnt' bits for transmission in SEG.WND. The value 'shift.cnt' may be zero (offering to scale, while applying a scale factor of 1 to the receive window). This option may be sent in an initial segment (i.e., a segment with the SYN bit on and the ACK bit off). It may also be sent in a segment, but only if a Window Scale op- tion was received in the initial segment. A Window Scale option in a segment without a SYN bit should be ignored. The Window field in a SYN64 (i.e., a or ) segment itself is never scaled. Receiving this option during phase by both sides overrides the classic "Window Scale Option" set during normal phase, if any. Actually moving to "Modern variation" by itself invalidates this legacy option. * All windows are treated as 64-bit quantities for storage in the connection control block and for local calculations. This includes the send-window (SND.WND) and the receive- window (RCV.WND) values, as well as the congestion window. * Valid value up to 38. Anything higher gets this option ignored. Design note: "Classic variation" TCP with it's 30-bit window (16-bit window+ 14-bit window scaling) will work till about 100 Gbit links. At 100 GBit links + 100 ms latency, you will have 10 GBit of unacknowledged data, mid-air. Over 100 gigs TCP performance will start to degrade. But today (2015) already 1 TBit experimental transmitters exist, so planning for the future, we need *much* larger window or scaling. This is why 38-bit Window Scaling was invented for TCP.64-bit, along with 24-bit Window. 4.3. TCP.64 Session option: Checksum ignored This option lets the receiver to ignore TCP-supplied checksum. Affects both classic 16-bit checksum as well as CRC. Affects TCP session from the initial SYN packet. This allows for Identity-Locator Network Protocol (ILNP), as defined in RFC 6740-48, to function with TCP/IP networks. In this case ILNP or other protocol should provide their own checksums or error correction for both TCP and IP headers and data. When used together with TCP.64-bit "modern variation", the system always resets to "64-bit-NO-CRC" from the first packet. (i.e. SYN64 phase) Acknowledgements: Influenced by the hard work of Robert Ullmann "TP/IX: The Next Internet" [RFC-1475] and: "Identifier-Locator Network Protocol (ILNP)", [RFC-6740] written by "Randall J. Atkinson" and "SN Bhatti". and: [RFC-2675]; "IPv6 Jumbograms" by David A. Borman, Stephen E. Deering, and Robert M. Hinden and partially derived from: [RFC-1323] and [RFC-7323]; "TCP Extensions for High Performance" by "D. Borman", "B. Braden", "V. Jacobson", "R. Scheffenegger, Ed." And big thanks to DARPA for the original specification of Transmission Control Protocol, as defined in [RFC-793] ! One of the greatest inventions of the Internet ! Authors' Contacts Alexey Eromenko Israel Skype: Fenix_NBK_ EMail: al4321@gmail.com Facebook: https://www.facebook.com/technologov INTERNET-DRAFT Alexey expiration date: 2016-09-02