Network Working Group                                         Y. Nishida
Internet-Draft                                              WIDE Project
Intended status: Standards Track                          April 15, 2011
Expires: October 17, 2011


      Rescue Retransmission for SACK-based Loss Recovery Algorithm
              draft-nishida-tcpm-rescue-retransmission-00

Abstract

   This memo describes an issue in the loss recovery algorithm of
   RFC3517 and proposes a simple modification that avoids unnecessary
   retransmission timeouts and thereby improves performance.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 17, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Nishida                 Expires October 17, 2011                [Page 1]

Internet-Draft   Rescue Retransmission for SACK Recovery      April 2011

Table of Contents

   1.  Introduction
   2.  Conventions and Terminology
   3.  Problem Description
   4.  Possible Scenario
   5.  Proposed Fix
   6.  Discussion
   7.  Acknowledgements
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   RFC3517 [RFC3517] defines a conservative loss recovery algorithm
   based on the use of the selective acknowledgment (SACK) TCP option
   [RFC2018].  It is designed to follow the guidelines set in RFC2581
   [RFC2581] so that it can be used safely in TCP implementations.

   However, in some situations, the loss recovery algorithm in RFC3517
   fails to retransmit segments even though there is available pipe
   size for the connection.  This failure to retransmit can cause
   unnecessary timeouts, which can lead to performance degradation.

   This document describes the issue and proposes a simple modification
   to solve it.  The proposed solution allows SACK-based TCP to attain
   the same performance as NewReno [RFC3782].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].
3.  Problem Description

   In RFC3517, when a sender receives DupThresh duplicate ACKs, it
   enters the loss recovery phase.  In the loss recovery phase,
   whenever the sender receives an ACK segment, it recalculates the
   pipe size by calling Update() and SetPipe(), and determines which
   segments should be sent by calling NextSeg().  However, there are
   some situations where NextSeg() returns no segment although there is
   still room in the pipe (that is, cwnd - pipe is at least 1 SMSS).

   This behavior results from the following logic in NextSeg().  When
   NextSeg() tries to find segments to be retransmitted, it uses
   IsLost(), which reports whether the data at a given sequence number
   is most likely lost.  To increase accuracy, IsLost() determines that
   the packet with 'SeqNum' is lost only when DupThresh discontiguous
   SACKed sequences have arrived above 'SeqNum' or (DupThresh * SMSS)
   bytes with sequence numbers greater than 'SeqNum' have been SACKed.
   If IsLost() identifies no lost segment, NextSeg() uses new segments
   for the next transmission.

   In this logic, a problem can arise when a sender does not have new
   segments to send.  In this case, if IsLost() identifies no lost
   segment, NextSeg() cannot find a packet for the next transmission,
   and packet transmissions will be delayed until one of the following
   events happens.

   o  ACKs arrive and IsLost() identifies new lost segments

   o  The application feeds data to TCP

   o  The retransmission timer expires

   However, in some situations, such as when the window size is small,
   the number of arriving ACKs might not be enough to identify lost
   segments.  In addition, applications might feed data intermittently
   or might have no more data to feed.  In this case, TCP has to wait
   for the retransmission timer to expire before it retransmits
   segments, even though there is enough available pipe size to send a
   packet.

4.  Possible Scenario

   This section describes a possible scenario in which the issue
   described in this document occurs.  The following is a hypothetical
   tcpdump log.

    1 10:41:00.000001 A > B: . 1000:2000(1000) ack 1 win 32768
    2 10:41:00.001001 A > B: . 2000:3000(1000) ack 1 win 32768
    3 10:41:00.002001 A > B: . 3000:4000(1000) ack 1 win 32768
    4 10:41:00.003001 A > B: . 4000:5000(1000) ack 1 win 32768
    5 10:41:00.004001 A > B: . 5000:6000(1000) ack 1 win 32768
    6 10:41:00.010001 B > A: . ack 1000 win 16384 < sack {2000:3000} >
    7 10:41:00.011001 B > A: . ack 1000 win 16384 < sack {2000:4000} >
    8 10:41:00.012001 B > A: . ack 1000 win 16384 < sack {2000:5000} >
    9 10:41:00.015001 A > B: . 1000:2000(1000) ack 1 win 32768
   10 10:41:00.018001 B > A: . ack 5000 win 16384

   In this example, A sends data segments to B.  At the beginning of
   the log, the cwnd of A is 5 SMSS (SMSS = 1000 octets), hence A sends
   5 segments to B (lines 1-5).  Here, if the segments sent in line 1
   (segment 1000:2000) and line 5 (segment 5000:6000) are lost, B sends
   3 duplicate ACKs (lines 6-8) to request retransmission of the
   segment 1000:2000.  At line 8, A has received DupThresh duplicate
   ACKs and retransmits the lost segment (line 9).  At this time, A
   enters the loss recovery phase and sets cwnd to 2.5 SMSS (half of
   FlightSize).  At line 10, A receives the ACK triggered by the
   arrival of the retransmitted segment 1000:2000.

   Upon reception of the ACK at line 10, A performs the following steps
   to determine whether any segments can be sent.

   1.  Update the pipe size by calling Update() and SetPipe().  Since
       HighACK = 5000, HighData = 6000, and IsLost(5000) returns false,
       the value of pipe is set to 1000.

   2.  Because cwnd - pipe >= 1 SMSS, A decides to send one or more
       segments.

   3.  Call NextSeg() to determine which segments are to be sent.

   Now, if A has no unsent data, the only segment that can be sent is
   segment 5000:6000.  NextSeg() checks whether this segment can be
   sent by applying the following rules; however, none of them applies.

   1.  Rule (1) cannot be applied to this segment, because (1.b) and
       (1.c) return false.

   2.  Rule (2) cannot be applied, since there is no available unsent
       data.

   3.  Rule (3) cannot be applied to this segment, because (1.b)
       returns false.

   Hence NextSeg() returns no segment in this case, which means TCP has
   no segment to send until a timeout occurs.  Whenever there are
   multiple packet losses in a window and TCP has no data to send at
   the moment, TCP can fall into this situation.

5.  Proposed Fix

   To solve the problem described above, we propose to introduce one
   variable, RescueRxt, at the TCP sender and to add the following
   logic as the fourth rule of NextSeg().

   (4)  If the conditions for rules (1), (2) and (3) fail, but there
        exists unSACKed data, one segment of up to SMSS octets MAY be
        returned if RescueRxt is not set.  The returned segment MUST
        include the highest unSACKed sequence number.  When a segment
        is returned by this rule, RescueRxt MUST be set to the highest
        octet of the segment.  Also, HighRxt MUST NOT be updated.

   In addition to this rule, the TCP sender MUST reset RescueRxt when
   it receives a cumulative ACK for a sequence number greater than
   RescueRxt.

6.  Discussion

   The simple approach to address this issue is to send unSACKed data
   whenever the conditions for rules (1), (2) and (3) fail, as long as
   there is available pipe size.  A similar approach is also proposed
   in [I-D.scheffenegger-tcpm-sack-loss-recovery].  However, this
   approach can cause many unnecessary retransmissions when segments
   are reordered but not lost.
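
   As a non-normative illustration, the NextSeg() rules of RFC3517
   together with rule (4) from Section 5 might be sketched in Python as
   follows.  The sketch is a simplification under stated assumptions
   and is not part of the proposed specification: the Sender class and
   the names is_lost, next_seg, and on_cumulative_ack are hypothetical;
   sequence ranges are half-open as in the log of Section 4; the
   scoreboard holds merged, segment-aligned SACK ranges; and the
   receiver's advertised window is ignored.  It replays the state of A
   after line 10 of the scenario in Section 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SMSS = 1000        # sender maximum segment size, as in Section 4
DUP_THRESH = 3     # DupThresh

Range = Tuple[int, int]   # half-open [start, end), tcpdump style


@dataclass
class Sender:
    """Hypothetical, simplified sender state (not taken from RFC3517)."""
    high_ack: int    # next octet expected by the receiver (HighACK)
    high_data: int   # one past the highest octet sent (HighData)
    high_rxt: int    # one past the highest octet retransmitted (HighRxt)
    sacked: List[Range] = field(default_factory=list)  # merged SACK ranges
    unsent: int = 0                    # octets of unsent application data
    rescue_rxt: Optional[int] = None   # RescueRxt; None means "not set"


def is_sacked(s: Sender, seq: int) -> bool:
    return any(a <= seq < b for a, b in s.sacked)


def is_lost(s: Sender, seq: int) -> bool:
    """IsLost(): DupThresh SACKed ranges, or DupThresh * SMSS SACKed
    octets, above seq."""
    above = [(max(a, seq + 1), b) for a, b in s.sacked if b > seq + 1]
    octets = sum(b - a for a, b in above)
    return len(above) >= DUP_THRESH or octets >= DUP_THRESH * SMSS


def next_seg(s: Sender, rescue: bool = True) -> Optional[Range]:
    """Return the next segment to send, or None if no rule applies."""
    # One past the highest SACKed octet; HighACK if nothing is SACKed.
    top = max((b for _, b in s.sacked), default=s.high_ack)
    # Assumption: holes are SMSS-aligned relative to HighACK.
    holes = [q for q in range(s.high_ack, s.high_data, SMSS)
             if not is_sacked(s, q)]

    def seg(q: int) -> Range:
        return (q, min(q + SMSS, s.high_data))

    for q in holes:                      # rule (1): (1.a), (1.b), (1.c)
        if q >= s.high_rxt and q < top and is_lost(s, q):
            return seg(q)
    if s.unsent > 0:                     # rule (2): new data
        return (s.high_data, s.high_data + min(SMSS, s.unsent))
    for q in holes:                      # rule (3): (1.a) and (1.b) only
        if q >= s.high_rxt and q < top:
            return seg(q)
    if rescue and holes and s.rescue_rxt is None:   # proposed rule (4)
        start, end = seg(holes[-1])      # covers highest unSACKed octet
        s.rescue_rxt = end - 1           # highest octet of the segment
        return (start, end)              # high_rxt is left untouched
    return None


def on_cumulative_ack(s: Sender, ack: int) -> None:
    """Advance HighACK, trim the scoreboard, and reset RescueRxt."""
    s.high_ack = max(s.high_ack, ack)
    s.sacked = [(max(a, ack), b) for a, b in s.sacked if b > ack]
    if s.rescue_rxt is not None and ack > s.rescue_rxt:
        s.rescue_rxt = None


# State of A after line 10 in Section 4: 1000:5000 is ACKed, 5000:6000
# is outstanding, nothing is SACKed, and there is no unsent data.
a = Sender(high_ack=5000, high_data=6000, high_rxt=2000)
print(next_seg(a, rescue=False))   # None: rules (1)-(3) all fail
print(next_seg(a))                 # (5000, 6000): rescue retransmission
print(next_seg(a))                 # None: RescueRxt is already set
on_cumulative_ack(a, 6000)
print(a.rescue_rxt)                # None: reset by the cumulative ACK
```

   Removing the RescueRxt check from rule (4) in this sketch would
   approximate the simple approach described above, which retransmits
   unSACKed data whenever pipe space is available.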

   The fix proposed in this document allows TCP to retransmit one
   segment per RTT when all the data TCP has outstanding is unSACKed
   and it is not certain whether that data is lost.  Since the
   objective of this algorithm is to avoid a retransmission timeout and
   maintain ACK clocking, not to utilize the unused pipe, sending one
   segment per RTT is enough for this purpose.  By sending this one
   packet, the sending TCP has a good chance of receiving additional
   ACKs from the receiver, which can trigger further retransmissions in
   the next RTT.  The variable RescueRxt ensures that a retransmission
   by this algorithm happens only once per RTT.  This logic can
   drastically reduce the number of unnecessary retransmissions in the
   case of reordering.

7.  Acknowledgements

   The author gratefully acknowledges Richard Scheffenegger, who
   originally identified the issue described in this document and gave
   insightful comments.  The author would also like to thank Mark
   Allman and Ethan Blanton for their careful review of the initial
   idea of the logic and for their valuable feedback.

8.  Security Considerations

   This document proposes only a simple modification to RFC3517.  There
   are no known additional security concerns for this algorithm.

9.  IANA Considerations

   This document does not create any new registries or modify the rules
   for any existing registries managed by IANA.

10.  References

10.1.  Normative References

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC3517]  Blanton, E., Allman, M., Fall, K., and L. Wang, "A
              Conservative Selective Acknowledgment (SACK)-based Loss
              Recovery Algorithm for TCP", RFC 3517, April 2003.

   [RFC3782]  Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
              Modification to TCP's Fast Recovery Algorithm", RFC 3782,
              April 2004.

10.2.  Informative References

   [I-D.scheffenegger-tcpm-sack-loss-recovery]
              Scheffenegger, R., "Improving SACK-based loss recovery
              for TCP", draft-scheffenegger-tcpm-sack-loss-recovery-00
              (work in progress), November 2010.

Author's Address

   Yoshifumi Nishida
   WIDE Project
   Endo 5322
   Fujisawa, Kanagawa  252-8520
   Japan

   Email: nishida@wide.ad.jp