TCP Maintenance and Minor R. Scheffenegger, Ed. Extensions (tcpm) NetApp, Inc. Internet-Draft M. Kuehlewind Intended status: Experimental University of Stuttgart Expires: September 15, 2011 March 14, 2011 Additional negotiation in the TCP Timestamp Option field during the TCP handshake draft-scheffenegger-tcpm-timestamp-negotiation-01 Abstract RFC 1323 defines the TSecr field of a SYN packet to be not valid and thus this field will always be zero. This documents specifies the use of this field to signal and negotiate additional information about the content of the TSopt field as well as the behavior of the receiver. If the receiver understands this extension, it will use the TSecr field of the SYN/ACK to reply. Otherwise the receiver will ignore the TSecr field and set a timestamp in the TSecr field as specified in RFC 1323. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 15, 2011. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 1] Internet-Draft Timestamp Negotiation March 2011 carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 4. Signaling . . . . . . . . . . . . . . . . . . . . . . . . . . 5 4.1. Capability Flags . . . . . . . . . . . . . . . . . . . . . 5 4.2. Implicit extended negotiation . . . . . . . . . . . . . . 7 5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 8 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 8. Security Considerations . . . . . . . . . . . . . . . . . . . 9 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 9.1. Normative References . . . . . . . . . . . . . . . . . . . 9 9.2. Informative References . . . . . . . . . . . . . . . . . . 9 Appendix A. Optional Capability Flags . . . . . . . . . . . . . . 10 A.1. Range Negotiation . . . . . . . . . . . . . . . . . . . . 11 Appendix B. Revision history . . . . . . . . . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12 Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 2] Internet-Draft Timestamp Negotiation March 2011 1. Introduction The TCP Timestamps Option (TSopt) provides timestamp echoing for Round-trip Time (RTT) measurments. TSopt is widely deployed and activated by default in many systems. RFC 1323 [RFC1323] specifies TSopt the following way: Kind: 8 Length: 10 bytes +-------+-------+---------------------+---------------------+ |Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)| +-------+-------+---------------------+---------------------+ 1 1 4 4 RFC1323 TSopt "The Timestamps option carries two four-byte timestamp fields. The Timestamp Value field (TSval) contains the current value of the timestamp clock of the TCP sending the option. The Timestamp Echo Reply field (TSecr) is only valid if the ACK bit is set in the TCP header; if it is valid, it echos a timestamp value that was sent by the remote TCP in the TSval field of a Timestamps option. When TSecr is not valid, its value must be zero. The TSecr value will generally be from the most recent Timestamp option that was received; however, there are exceptions that are explained below. A TCP may send the Timestamps option (TSopt) in an initial SYN segment (i.e., segment containing a SYN bit and no ACK bit), and may send a TSopt in other segments only if it received a TSopt in the initial SYN segment for the connection." The comparison of the timestamp in the TSecr field to the current time gives an estimation of the RTT. RFC 1323 [RFC1323] specifies various cases when more than one timestamp is available to echo. The proposed solution might not always be the best choice, e.g. when the TCP Selective Acknowledgment Option (SACK) is used. Moreover, more and more use cases arise where one-way delay (OWD) measurements are needed. These mechanism misuse usually the TSopt to estimated the variation in OWD. To enable such mechanisms the TSecr field in the TCP SYN packet could be used for additional negotiation. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 3] Internet-Draft Timestamp Negotiation March 2011 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 4] Internet-Draft Timestamp Negotiation March 2011 2. Overview Enhancements in the area of TCP congestion control can use the measurement of the one-way delay variance as one input. However, without explicit knowledge of the partner's timestamp clock, arriving at a good estimate requires multiple segent exchanges over a few round trip times. Nevertheless, any such calculation has to make assumptions about the network state at the time of the measurement. In order to assist such algorithms, explicit knowledge at a early phase of the session can be negotiated. In addition, by using synergistic signalling between Timestamps and other options such as selective acknowledgment, enhancments in loss recovery are possible by removing any retransmission ambiguity . However, currently receivers are required to only reflect the timestamp of the last segment that was received in order. Therefore, a backwards compatible way of changing this behavior is required. Furthermore, as the importance of the timestamp option increases by using it in more aspects of a TCP senders algorithm, so increases the importance of maintaining the integrity of the reflected timestamps, while allowing the receiver to make use of a senders timestamp. As an optional extension, a timestamp clock rate range negotiation is also introduced. However, this is only included as example of further possible enhancements. 3. Definitions The reader is expected to be familiar with the definitions given in [RFC1323]. 4. Signaling During the initial TCP three-way handshake, timestamp options are negotiated using the TSecr field. A compliant TCP receiver will XOR the flags with the received TSval, when responding with the SYN+ACK. Timestamp Options MAY only be present when the SYN bit is set. 4.1. Capability Flags In order to signal the supported capabilities, the TSecr is overloaded with the following flags and fields during the three-way handshake. If optional capabilities such as tcp clock range are presented, minimal state will be required in the host to decode the returned Flags xor'ed with the TSval. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 5] Internet-Draft Timestamp Negotiation March 2011 Kind: 8 Length: 10 bytes +-------+-------+---------------------+---------------------+ |Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)| +-------+-------+---------------------+---------------------+ 1 1 4 | 4 | / | .-----------------------------------' | / \ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|R|R| | |S| | | |X|E|E| MASK | RES |G| EXP16 | FRAC16 | |O|S|S| | |N| | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ timestamp option flags EXO - Extended Options Indicated that the sender supports extended timestamp options as defined by this document, and MUST be set ("1") by compliant implementations. RES - Reserved Reserved for future use. MUST not be set ("0"). If a timestamp option is received with this bit set, the receiver MUST ignore the extended options field and react as if the Flags were not set (compatibility mode). MASK - Mask Timestamps If the timestamp is used for congestion control purposes, an incentive exists for malicious receivers to reflect tampered timestamps. A sender MAY choose to protect timestamps from such modifications by including a fingerprint (secure hash of some kind) in some of the least significant bits. However, doing so would prevent a receiver from using the timestamp for other purposes. The MASK field indicates how many least significant bits should be excluded by the receiver, when processing a timestamp for timing purposes. Note that this does not impact the reflected timestamp in any way - TSecr will always be equal to an appropriate TSval. Another use case would be when the sender does not support a timestamp clock which can guarantee unique timestamps for retransmitted segments. For unambigously identifying regular from retransmitted segments, the timestamp must be unique for otherwise identical segments. Reserving the least significant bits for this purpose allows senders with slow Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 6] Internet-Draft Timestamp Negotiation March 2011 running timestamp clocks to make use of this feature. Note that the use of this option as implications in the protection against wraped sequence numbers (PAWS - [RFC1323]), as the more bits are set aside for tamper prevention, the faster the timestamp number space cycles itself. SGN - binary16 Sign This is the sign bit of the IEEE 754-2008 binary16 floating point representation of the timestamp clock. Timestamp clocks MUST be positive, thus this bit MUST be zero. EXP16 - binary16 Exponent The exponent component of a binary16 floating point number indicating the timestamp clock. The exponent bias is 28, which is not identical to the binary16 definition in IEEE 754-2008. Subnormal numbers (lower precision), where the exponent is set to zero, extend the lowest possible value representation to 2^-38 (or 7.276 ps) at reduced precision. Infinity and NaN (all exponent bits set) are not suppored, an exponent value of 31 is to be treated as normal exponent. This allows timestamp clock rates of up to 15.999 sec. FRAC16 - binary16 Fraction The fraction component of a binary16 floating point number indicating the timestamp clock. The clock rate is measured in seconds between ticks. The range with the highest resolution, excluding subnormal numbers, covers clock ranges between 7.45 ns and 15.99 sec. It is expected, that timestamp clock rates in excess of 0.1 ms are implemented by inserting the timestamp "late" before transmitting a segment. Example for an extended timestamp option, to indicate that the senders timestamp clock (tcp clock) is running with 1 ms per tick: SYN, TSopt=, TSecr=EXO|MASK|EXP16=18|FRAC16=0x018 The clock rate calculates as 2^(18-28)*1.0000011b, thus indicates an actual clock rate of 999.45 us 4.2. Implicit extended negotiation If both timestamp extended options and selective acknowledgement options ([RFC2018]) are negotiated, both hosts MUST mirror the timestamp option immediately after receiving it. Note that this is in conflict with [RFC1323], where only the timestamp of the last segment received in sequence is mirrored. As SACK allows discrimination of reordered or lost segments, the reflected timestamps are not required to convey the most conservative Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 7] Internet-Draft Timestamp Negotiation March 2011 information. If SACK indicates lost or reordered packets at the receiver, the sender MUST take appropriate action such as ignoring the received timestamps for calculating the round trip time, or assuming a delayed packet (with appropriate handling). The exact implications are beyond the scope of this note. This allows the synergistic use of the timestamp option with the SACK option to improve loss recovery, round trip time and one way delay variance measurements even during loss or reordering episodes. This is enabled by removing any retransmission ambiguity using unique timestamps for every retransmission, while simultaneously the SACK option indicates the ordering of received segments even in the presence of ACK loss or reordering. 5. Discussion One-way delay (variation) based congestion controls would benefit from knowing the clock resolution on both sides. RTT variance during loss episodes is not deeply researched. Current heuristics (RFC1122, RFC1323, Karn's algorithm, RFC2988) explicitly exclude (and prevent) the use of RTT samples when loss occurs. However, solving the retransmission ambiguity problem - and the related reliable ACK delivery problem - may allow the refinement of these algorithms further, as well as enabling new research to distinguish between corruption loss (without RTT / one-way delay impact) and congestion loss (with RTT / one-way delay impact). Research into this field appears to be a rather neglected, especially when it comes to large scale, public internet investigations. Due to the very nature of this, passive investigations without signals contained within the headers are only of limited use in empirical research. Retransmission ambiguity detection during loss recovery would allow an additional level of loss recovery control without reverting to timer-based methods. As with the deployment of SACK, separating "what" to send from "when" to send it could be driven one step further. In particular, less conservative loss recovery schemes which do not trade principles of packet conservation against timeliness, require a reliable way of prompt and best possible feedback from the receiver about any delivered segment and their ordering. SACK alone goes quite a long way, but using Timestamp information in addition could remove any ambiguity. However, the current specs in RFC1323 make that use impossible, thus a modified signaling (receiver behavior) is a necessity. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 8] Internet-Draft Timestamp Negotiation March 2011 6. Acknowledgements The authors would like to thank Dragana Damjanovic for some initial thoughts around Timestamps and their extended potential use. 7. IANA Considerations This memo includes no request to IANA. 8. Security Considerations The algorithm presented in this paper shares security considerations with [RFC1323]. Some implementations address the vulerabilities of [RFC1323], by dedicating a few low-order bits of the timestamp fields for use with a (secure) hash, that protects against malicious tweaking of TSecr values. A Flag-field has been provided to transparently notify the receiver about that use of low-order bits, so that they can be excluded in one-way delay calculations. 9. References 9.1. Normative References [RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992. [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 9.2. Informative References [Chirp] Kuehlewind, M. and B. Briscoe, "Chirping for Congestion Control - Implementation Feasibility", Nov 2010, . [I-D.ietf-tcpm-tcp-security] Gont, F., "Security Assessment of the Transmission Control Protocol (TCP)", draft-ietf-tcpm-tcp-security-02 (work in progress), January 2011. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 9] Internet-Draft Timestamp Negotiation March 2011 Appendix A. Optional Capability Flags Certain hosts may want to negotiate a certain optimal Timestamp Clock Rate for various purposes. For example, the balance between PAWS ([RFC1323]) and the timestamp clock resolution should be more towards one or the other. Also, if certain algorithms want to have identical timestamp clock rates both at the sender and receiver, negotiating the clock rate would be preferrable. However, without a full three way handshake, full negotiation of the timestamp clock rate is not possible. For this purpose, the following extension of this proposal is suggested. Kind: 8 Length: 10 bytes +-------+-------+---------------------+---------------------+ |Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)| +-------+-------+---------------------+---------------------+ 1 1 4 | 4 | / | .-----------------------------------' | / \ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |E|R|R| | | | | | |X|E|N| MASK | EXP12lo | FRAC12lo | EXP12hi | FRAC12hi | |O|S|G| | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ timestamp option flags The following additonal fields are defined: RNG - Range negotiation Indicated that the sender is capable of adjusting the timestamp clock rate within the bounds of the two 12 bit fields (see Appendix A.1). This flag is only valid in the initial SYN segment, and invalid when the ACK bit is set. EXP12lo and EXP12hi - binary12 Exponent The exponent component of a truncated, 12 bit floating point number indicating the possible timestamp clock ranges. The exponent bias is also 28, and no special numbers (infinity, NaN) Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 10] Internet-Draft Timestamp Negotiation March 2011 are allowed. The exponent value 31 is treated like any other exponent value. Only the host initiating a TCP session MAY offer a timestamp clock range, while the receiver SHOULD select a timestamp clock within these bounds. If the receiver can not adjust it's timestamp clock to match the range, it MAY use a timestamp clock rate outside these bounds. If the receiver indicated a timestamp clock rate within the indicated bounds, the sender MUST set it's timestamp clock rate to the negotiated rate. If the receiver uses a timestamp clock rate outside the indicated bounds, the sender MUST set the local timestamp clock rate to the value indicated at the closer bound. FRAC12lo and FRAC12hi - binary12 Fraction The fraction component of a 12 bit floating point number. Subnormal numbers are allowed (Exponent value 0). This allows a range between 7.45 ns and 15.99 s with full resolution (lower bound is 0.06 ns using subnormal values). A.1. Range Negotiation The following sequence would negotiate the timestamp clock rate for both sender and receiver, where both finally know the clock rate of the respective partner. SYN, TSopt=, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms SYN,ACK, TSopt=, TSecr=^EXO|MASK|16bit=10ms In this example, both hosts would run their respective timestamp clocks with a resolution of 10 ms. SYN, TSopt=, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms SYN,ACK, TSopt=, TSecr=^EXO|MASK|16bit=1000ms In this example, the sender would run the timestamp clocks with a resolution of 100 ms (closer to the receivers clock rate of 1 sec), while the receiver will have a timestamp clock rate running at 1 sec. SYN, TSopt=, TSecr=EXO|RNG|MASK|12bit-lo=1ms|12bit-hi=100ms SYN,ACK, TSopt=, TSecr=^EXO|MASK|16bit=100us In this example, the sender would run the timestamp clocks with a resolution of 10 ms (closer to the receivers clock rate of 0.1 ms), while the receiver will have a timestamp clock rate running at 0.1ms. Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 11] Internet-Draft Timestamp Negotiation March 2011 Appendix B. Revision history 00 ... initial draft, early submission to meet deadline 01 ... refined draft, focusing only on those options that have an immediate use case. Also excluding flags that can be substituted by other means (MIR - synergistic with SACK options only, RNG moved to appendix, BIA removed while the exponent bias is at a fixed value. Also extended other paragraphs. Authors' Addresses Richard Scheffenegger (editor) NetApp, Inc. Am Euro Platz 2 Vienna, 1120 Austria Phone: +43 1 3676811 3146 Email: rs@netapp.com Mirja Kuehlewind University of Stuttgart Pfaffenwaldring 47 Stuttgart 70569 Germany Email: mirja.kuehlewind@ikr.uni-stuttgart.de Scheffenegger & Kuehlewind Expires September 15, 2011 [Page 12]