Network Working Group                                   Mitchell Erblich
Internet-Draft                                               August 2006
Category: Experimental

  		
   	Alteration of Karn's Algorithm for 
	High Bandwidth / Delay Environments
	<draft-erblich-tcp-no-karn-alg-00.txt>


   Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Copyright Notice

        Copyright (C) The Internet Society (2006). All Rights Reserved.


	ABSTRACT	

	Karn's algorithm specifies that acknowledgements resulting
	from segment retransmits should be ignored: they are not timed
	and do not contribute to the smoothed round-trip time (SRTT),
	because they are considered "ambiguous".  Karn's paper also
	states that "If an acknowledgement arrives after the RTO has
	expired, it is highly likely to come very shortly afterwards."
	Since then, the "fast retransmit" functionality has been added,
	so TCP is no longer solely dependent on the RTO for
	retransmits.  We argue that an acknowledgement received "very
	shortly afterwards" should not be considered "ambiguous".  Such
	non-ambiguous acknowledgements should contribute to the SRTT
	and trigger a return to the previous non-congestion behavior.

	Table of Contents

	1. Motivation
	2. Introduction
	3. Implementation
	4. Conclusion
	5. References
	6. Security Considerations
	7. Author's Address


	1. Motivation

	An ISP measured that, at certain times of day, the amount of
	data transmitted over a number of TCP connections far exceeded
	the amount of data thought to be generated by a specified set of
	applications. It was theorized that a large number of segments
	were being dropped, that some segments had a large RTT and were
	being retransmitted needlessly, or both.

	A major unseen effect was the resulting drop in usable throughput
	of a TCP flow when congestion was not present. This was not
	previously believed to be an issue, possibly because the TCP
	flows were application limited.

	Out-of-order (ooo) segments are more likely to appear in high
	bandwidth / delay environments. These environments can consume
	a receiver's buffer, such that the receiver can renege and
	discard data that it has selectively acknowledged (SACK).

	However, we are more concerned about ooo segments arriving
	at the receiver that later result in fast retransmits. These
	fast retransmits force the sender into congestion avoidance (CA),
	and it is the resulting recovery time and lost bandwidth that we
	are addressing here.

	In addition, [RFC3522] requires the use of TCP timestamps. This
	document attempts to show that the same results can be obtained
	without TCP timestamps in high-bandwidth / delay environments,
	allowing a quick recovery from the false CA.


	2. Introduction

	[KARN] specifies that retransmitted segments should not be
	timed, because it cannot be determined whether the ACK results
	from the original or the later retransmitted segment.
	
	In Karn's original paper, only coarse-grained RTO timeouts
	triggered retransmits. Thus the paper concentrated on
	determining a proper SRTT, given a set of segment RTTs.
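
	As an illustration only, the rule above can be sketched in
	Python. The class and method names are hypothetical, and the
	1/8 and 1/4 smoothing gains are assumed here because they are
	the values conventionally used for TCP; this document does not
	specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    """SRTT/RTTVAR state using the assumed 1/8 and 1/4 gains."""
    srtt: Optional[float] = None   # smoothed RTT, seconds
    rttvar: float = 0.0            # RTT variation, seconds

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> bool:
        """Feed one ACK; return True if the sample was used."""
        if was_retransmitted:
            # Karn's rule: the ACK may match either transmission,
            # so the sample is ambiguous and is discarded.
            return False
        if self.srtt is None:
            # First usable sample initializes the estimator.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_sample)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample
        return True


est = RttEstimator()
est.on_ack(0.100, was_retransmitted=False)   # initializes SRTT
est.on_ack(0.900, was_retransmitted=True)    # ignored under Karn's rule
est.on_ack(0.200, was_retransmitted=False)   # SRTT moves 1/8 of the way
```

	In this sketch the 0.900 second sample never reaches the
	SRTT, which is the behavior this document proposes to relax
	for ACKs that can be shown to be non-ambiguous.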

	[RFC2525] specifies that "when the initial RTO < RTT, it can
	take a long time for the TCP to correct the problem by
	adapting the RTT estimate, because of the use of the Karn's
	algorithm". This is from section 2.7.

	[PAX97] introduces the concept that a large number of
	environments can or do re-order more than a minimal number
	of segments.

	Spurious retransmissions are segment retransmissions that are
	later determined to have been unnecessary. These unnecessary
	retransmissions consume link bandwidth and decrease the actual
	application throughput.
	
	[RFC3522] describes multiple events that can lead to false
	congestion avoidance, and defines a detection algorithm. That
	RFC requires the use of the TCP Timestamps option.

	This document attempts to take extra steps to detect false
	congestion without the use of the Timestamps option, and
	suggests parameters that could be used to restore the
	pre-congestion throughput as soon as possible without creating
	a local congestion event.
	

	2.1 Timestamp Issues

	* Not all implementations enable the Timestamps option.
	* A receiver may forge an echoed Timestamp.
	* A timestamp clock granularity that suits a low-delay
	  receiver on a high-bandwidth link may be too fine-grained
	  for a high-delay receiver over the same link.

	Only the third item is reviewed in this document. To address
	the granularity issue, we attempt to adjust the timestamp
	granularity on a per-connection basis, based on the number of
	inflight segments.


	3. Implementation

	An R&D environment with interfaces up to 10Gb Ethernet was highly
	instrumented for a number of days. The number of inflight segments
	per major aggregated flow periodically exceeded 100k. Because of
	the number of possible inflight segments, it was deemed that
	Selective ACKs would only complicate the implementation. We
	later re-enabled the Timestamps option and SACK support to
	validate our results.

	We added and used these parameters: the number of inflight
	segments, the average size of each inflight segment, the
	approximate number of inflight ACKs, the number of unambiguous
	ACKs, the number of ambiguous ACKs within each RTT interval, a
	set of pre-congestion metrics, the number of current duplicate
	ACKs, etc.  We used these values to identify the point at which
	an ACK was highly likely to be for the original segment.
	Otherwise, without timestamp support, we consider the ACK an
	ambiguous ACK.
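
	The specifics of our test are not given in this document, so
	the following Python sketch is only one hypothetical way to
	express such a test. It assumes that an ACK triggered by the
	retransmission cannot arrive much sooner than one minimum RTT
	after the retransmission was sent. The function name, the
	min_inflight threshold and the early_fraction factor are all
	assumptions made for illustration.

```python
def ack_is_for_original(retransmit_time: float,
                        ack_time: float,
                        min_rtt: float,
                        inflight_segments: int,
                        min_inflight: int = 1000,
                        early_fraction: float = 0.5) -> bool:
    """Return True if the ACK is highly likely for the original segment.

    All thresholds are hypothetical. Times are in seconds.
    """
    if inflight_segments < min_inflight:
        # Too few segments in flight to trust the heuristic;
        # treat the ACK as ambiguous.
        return False
    elapsed = ack_time - retransmit_time
    # An ACK arriving well inside one minimum RTT of the retransmit
    # is assumed to have been triggered by the original segment.
    return 0.0 <= elapsed < early_fraction * min_rtt


early = ack_is_for_original(10.000, 10.010, min_rtt=0.100,
                            inflight_segments=50_000)
late = ack_is_for_original(10.000, 10.120, min_rtt=0.100,
                           inflight_segments=50_000)
```

	Here the ACK seen 10 ms after the retransmit is classified as
	being for the original segment, while the ACK seen 120 ms
	after it remains ambiguous.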

	If we identified that the ACK was for the original segment, then
	the goal of this project was to implement a fast recovery to
	pre-congestion status, since the congestion event was false.
	Adjusting the number of duplicate ACKs required before a
	fast retransmit was not in the scope of this project.
	This implementation resulted in the equivalent of a non-slow-start
	restart, performed while in congestion avoidance.
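
	A minimal Python sketch of such a restore step follows. It is
	not our implementation, whose specifics are omitted from this
	document. The class, its fields, and the halving on fast
	retransmit are assumptions chosen to show the idea of saving
	pre-congestion metrics and restoring them on a non-ambiguous
	ACK.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CongestionState:
    cwnd: int                                # congestion window, segments
    ssthresh: int                            # slow-start threshold, segments
    saved: Optional[Tuple[int, int]] = None  # pre-congestion (cwnd, ssthresh)

    def on_fast_retransmit(self) -> None:
        """Save the pre-congestion metrics, then halve the window."""
        self.saved = (self.cwnd, self.ssthresh)
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_ack_after_retransmit(self, for_original: bool) -> None:
        """Undo the reduction only when the ACK is non-ambiguous."""
        if for_original and self.saved is not None:
            self.cwnd, self.ssthresh = self.saved
        # Either way the saved metrics are now stale.
        self.saved = None


state = CongestionState(cwnd=100_000, ssthresh=200_000)
state.on_fast_retransmit()                        # cwnd drops to 50000
state.on_ack_after_retransmit(for_original=True)  # cwnd back to 100000
```

	An ambiguous ACK leaves the reduced window in place, so the
	sketch falls back to ordinary congestion avoidance whenever
	the false-congestion test does not succeed.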

	The specifics of the TCP modifications and the testing environment
	were deemed Intellectual Property (IP) by the legal staff at
	the client's site. Thus, those specifics have been removed from
	this document.


	4. Conclusion

	Identifying whether an ACK is for an original or a
	retransmitted segment should be a matter of common sense,
	without SACK or the Timestamps option, if the number of
	in-flight segments is large enough and the ACK comes shortly
	after the fast retransmit.

	However, given a large enough reduction in the number of
	in-flight ACKs, an implementation needs to support a segment
	burst methodology and the ability to determine that an ACK is
	still ambiguous. If the implementation uses an aggressive
	method, it is highly likely that some ACKs that really are
	ambiguous (retransmit ACKs) will be treated as non-ambiguous.

	Significant bandwidth recovery, up to 50%, can result, depending
	on the now non-ambiguous ACKs. The amount of recovery depends
	partially on the aggressiveness of the segment burst method
	used. It also depends on the quantity of ooo segments and the
	amount of drift of those segments.  Some experimental TCP RFCs
	have suggested methods to decrease the likelihood of generating
	localized congestion when restoring or generating a number of
	in-flight segments.
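
	The trade-off between aggressiveness and localized congestion
	can be illustrated with a simple per-ACK burst limit. The
	Python sketch below is not the burst method we used; the
	function names and the max_burst parameter are hypothetical,
	and it assumes each ACK acknowledges exactly one segment.

```python
def segments_to_send(cwnd: int, inflight: int, max_burst: int) -> int:
    """Segments that may be sent in response to one ACK.

    max_burst is a hypothetical aggressiveness knob: larger values
    refill the window faster but risk a local congestion event.
    """
    window_room = max(cwnd - inflight, 0)
    return min(window_room, max_burst)


def acks_to_refill(cwnd: int, inflight: int, max_burst: int) -> int:
    """Count the ACKs needed to refill the window to cwnd."""
    if max_burst < 2:
        # With one segment acked and at most one sent per ACK,
        # the number of in-flight segments would never grow.
        raise ValueError("max_burst must be at least 2")
    acks = 0
    while inflight < cwnd:
        inflight -= 1   # this ACK removes one segment from flight
        inflight += segments_to_send(cwnd, inflight, max_burst)
        acks += 1
    return acks


cautious = acks_to_refill(cwnd=100, inflight=50, max_burst=2)
aggressive = acks_to_refill(cwnd=100, inflight=50, max_burst=4)
```

	With these inputs the cautious setting needs 50 ACKs to
	refill the window and the aggressive setting needs 17, at the
	cost of sending up to four segments back to back.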


	5. References

	[KARN]    Karn, P. and C. Partridge, "Improving Round-Trip Time
	          Estimates in Reliable Transport Protocols",
	          Proceedings of the ACM SIGCOMM '87, August 1987.

	[RFC2525] Paxson, V., Allman, M., et al., "Known TCP
	          Implementation Problems", RFC 2525, March 1999.

	[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection
	          Algorithm for TCP", RFC 3522, April 2003.
	

	6. Security Considerations

	This memo does not create any new security issues for the
	TCP protocol.	

	7. Author's Address

	Mitchell Erblich	erblichs@earthlink.net


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.