Network Working Group
Internet draft
Mitchell Erblich
August 2006
Category: Experimental

Alteration of Karn's Algorithm for High Bandwidth / Delay Environments

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2006). All Rights Reserved.

ABSTRACT

Karn's algorithm specifies that acknowledgements resulting from segment retransmits should be ignored: they are not timed and do not contribute to the smoothed round-trip time (SRTT), because they are considered "ambiguous". Karn's paper also states that "If an acknowledgement arrives after the RTO has expired, it is highly likely to come very shortly afterwards." Since then, "fast retransmit" functionality has been added, so we are no longer solely dependent on the RTO for retransmits. Common sense dictates that acknowledgements received "very shortly afterwards" should not be considered "ambiguous". These non-ambiguous acknowledgements should be added to the SRTT and should trigger a return to our previous non-congestion behavior.

Table of Contents

1. Motivation
2. Introduction
3. Implementation
4. Conclusion
5. References
6. Security Considerations
7. Author's Address

1. Motivation

An ISP measured that, at certain times of day, the amount of data transmitted through a number of TCP connections far exceeded the amount of data thought to be generated by a specified set of applications. It was theorized that a large number of segments were being dropped, that some segments had a large RTT and were being retransmitted needlessly, or both. A major unseen item was the resulting drop in usable throughput of a TCP flow when congestion was not present. This was not believed to have been an issue, possibly because the TCP flows were application limited.

Out-of-order (ooo) segments are more likely to appear in high bandwidth / delay environments. It is these environments that can consume a receiver's buffer, such that the receiver can renege and discard data that has been selectively acknowledged (SACK). However, we are more concerned about ooo segments arriving at the receiver that later result in fast retransmits. These fast retransmits force us into congestion avoidance (CA), and it is the resulting recovery time and lost bandwidth that we are addressing here.

In addition, [RFC3522] requires the use of TCP timestamps. This document attempts to justify that the same results can be obtained in high bandwidth / delay environments when TCP timestamps are not used, and that the sender can recover quickly from the false CA.

2. Introduction

[KARN] specifies that segment retransmissions should not be timed, because it cannot be determined whether the ACK results from the original segment or from the later retransmitted segment. In Karn's original paper, ONLY coarse-grained RTO timeouts triggered retransmits. Thus the paper concentrated on the determination of a proper SRTT, given a set of segment RTTs.
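The sampling rule described above, and the relaxation this document argues for, can be sketched as follows. This is only an illustration under stated assumptions and is not the implementation evaluated in this document, whose specifics are not published here. The class name RttEstimator, the min_rtt bookkeeping, and the early_fraction threshold are hypothetical. The gains of 1/8 and 1/4 are the conventional SRTT and RTTVAR smoothing constants.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # conventional SRTT gain
BETA = 1 / 4   # conventional RTTVAR gain


@dataclass
class RttEstimator:
    srtt: Optional[float] = None     # smoothed RTT, seconds
    rttvar: Optional[float] = None   # RTT variation, seconds
    min_rtt: Optional[float] = None  # smallest accepted sample so far

    def _update(self, sample: float) -> None:
        """Fold one accepted RTT sample into SRTT and RTTVAR."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample
        self.min_rtt = sample if self.min_rtt is None else min(self.min_rtt, sample)

    def on_ack(self, first_sent: float, last_retransmit: Optional[float],
               ack_time: float, early_fraction: float = 0.5) -> bool:
        """Return True if the ACK was timed, False if it was skipped."""
        if last_retransmit is None:
            # Never retransmitted: the sample is unambiguous.
            self._update(ack_time - first_sent)
            return True
        # Karn's rule: an ACK for a retransmitted segment is ambiguous.
        # Assumed relaxation: an ACK that arrives sooner after the
        # retransmit than a fraction of the smallest RTT seen so far is
        # unlikely to answer the retransmit, so it is timed against the
        # original transmission.
        if (self.min_rtt is not None
                and ack_time - last_retransmit < early_fraction * self.min_rtt):
            self._update(ack_time - first_sent)
            return True
        return False


est = RttEstimator()
est.on_ack(first_sent=0.0, last_retransmit=None, ack_time=0.100)  # timed
est.on_ack(first_sent=1.0, last_retransmit=1.20, ack_time=1.35)   # skipped
est.on_ack(first_sent=2.0, last_retransmit=2.11, ack_time=2.12)   # timed
```

A larger early_fraction accepts more ACKs as unambiguous, at the cost of occasionally timing an ACK that really answered the retransmit.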
[RFC2525], Section 2.7, specifies that "when the initial RTO < RTT, it can take a long time for the TCP to correct the problem by adapting the RTT estimate, because of the use of the Karn's algorithm".

[PAX97] introduces the concept that a large number of environments can or do re-order more than a minimal number of segments.

Spurious retransmissions are segment retransmissions that are later determined to be unnecessary. These unnecessary retransmissions consume link bandwidth and decrease the actual application throughput.

[RFC3522] describes multiple events that can lead to false congestion avoidance, together with a detection algorithm. That RFC requires the use of the TCP Timestamps option. This document attempts to take extra steps to detect false congestion without the use of the Timestamps option, and suggests parameters that could be used to restore the pre-congestion throughput as soon as possible without creating a local congestion event.

2.1 Timestamp Issues

* Not all implementations enable the Timestamps option.
* A receiver may forge an echoed timestamp.
* The timestamp clock granularity chosen for a low delay receiver on a high bandwidth link may be too fine grained for a high delay receiver over the same link.

Only the third item is reviewed in this document. To resolve the granularity issue, we attempt to adjust the timestamp granularity based on the number of inflight segments on a per connection basis.

3. Implementation

An R&D environment with interfaces up to 10Gb Ethernet was highly instrumented for a number of days. The number of inflight segments per major aggregated flow periodically exceeded 100k. Because of the number of possible inflight segments, it was deemed that selective ACKs would only complicate the implementation. We later re-enabled the Timestamps option and SACK support to validate our results.
We added and used these parameters: the number of inflight segments, the average size of each inflight segment, the approximate number of inflight acks, the number of unambiguous acks, the number of ambiguous acks within each RTT interval, a set of pre-congestion metrics, the number of current duplicate acks, etc.

We used these values to identify the point at which an ACK was highly likely to be for the original segment. Otherwise, without timestamp support, we consider the ACK ambiguous.

If we identified that the ACK was for the original segment, the goal of this project was to implement a fast recovery to pre-congestion status, because the congestion event was false. Adjusting the number of duplicate acks required before a fast retransmit was not in the scope of this project.

This implementation resulted in the equivalent of a non-slow-start restart, but while we are in congestion avoidance.

The specifics of the TCP modifications and the testing environment were deemed Intellectual Property (IP) by the legal staff at the client's site. Thus, those specifics have been removed from this document.

4. Conclusion

Identifying whether an ACK is for an original or a retransmitted segment should be common sense, even without the SACK or Timestamps options, if the number of in-flight segments is large enough and the ACK comes shortly after the fast retransmit. However, given a large enough reduction in the number of in-flight ACKs, an implementation needs to support a segment burst methodology and needs the ability to determine that an ACK is still ambiguous.

It is highly likely that, if the implementation uses an aggressive method, some ACKs that really are ambiguous (retransmit ACKs) will be treated as non-ambiguous.

Significant bandwidth recovery, up to 50%, can result, depending on the now non-ambiguous ACKs. The amount of recovery depends partly on the aggressiveness of the segment burst method used. It also depends on the quantity of ooo segments and the amount of drift of those segments. Some experimental TCP RFCs have suggested methods to decrease the likelihood of generating localized congestion when restoring or generating a number of in-flight segments.

5. References

[KARN] Karn, P. and C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", Proceedings of the ACM SIGCOMM '87, August 1987.

[RFC2525] Paxson, V., Allman, M., et al., "Known TCP Implementation Problems", RFC 2525, March 1999.

[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm for TCP", RFC 3522, April 2003.

6. Security Considerations

This memo does not create any new security issues for the TCP protocol.

7. Author's Address

Mitchell Erblich
erblichs@earthlink.net

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.