Internet Draft                                               Ian Heavens
Expires December 15, 1996                                   Fore Systems
                                                               June 1996

                        RSTs Considered Harmful
                  draft-heavens-problems-rsts-02.txt

Status of this Memo

   This memo is being distributed to members of the Internet community
   in order to solicit their reactions to the proposals contained in
   it.

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ds.internic.net (US East Coast),
   nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim).

Abstract

   This memo argues that the danger of accepting segments from old TCP
   connections arises for connections terminated by RST segments, as
   well as for those terminated by an exchange of FIN segments.  In
   addition, TIME-WAIT state alone does not provide complete
   protection.  The likelihood of data corruption is significant, in
   that it exceeds the probability of corruption after FIN exchange,
   the case for which TIME-WAIT state was designed.

Table of Contents

   1. Introduction
      1.1 Overview
      1.2 Background
      1.3 RST-Terminated Connections
   2. Old Segment Acceptance from RST-Terminated Connections
      2.1 RST-Terminated Connections from Established State
      2.2 RST-Terminated Connections during Closedown
      2.3 Proof by Demonstration
      2.4 Other Hazards
      2.5 Relative Probabilities
   3. TIME-WAIT after RST Transmission
      3.1 User Abort with TIME-WAIT
      3.2 RST Loss and Data Retransmission
      3.3 RST Loss and Idle Connections
   Appendix A: A Different Interpretation of RFC-1122
   Appendix B: Relative Probabilities of Hazards
   Appendix C: Traffic Statistics for TCP Connections

Glossary

   o  FIN-Terminated Connection

      A synchronised TCP connection which terminates by the 3-way
      closing handshake, involving the exchange and reliable
      acknowledgement of FIN segments.

   o  RST-Terminated Connection

      A synchronised TCP connection which terminates by transmission
      or reception of a RST.

   o  MSL

      Maximum Segment Lifetime

1. Introduction

1.1 Overview

   Chapter 1 describes mechanisms for closing TCP connections, and the
   significance of the TIME-WAIT state.  Chapter 2 identifies a series
   of connection terminations involving RSTs that may lead to data
   corruption.  Chapter 3 shows how the use of TIME-WAIT state alone
   can provide some protection against this, and identifies scenarios
   where this solution is insufficient.

1.2 Background

   FINs, RSTs, Timers and ICMP Messages

   There are four mechanisms available in [RFC-793] to close a TCP
   connection: FINs, RSTs, Timeouts and ICMP messages.

   FINs may be used to close down a connection in an orderly fashion,
   guaranteeing reliable delivery of all data segments transmitted
   before the FIN in each direction.  The requirement to reliably
   acknowledge FINs in both directions leads to a number of half-closed
   states: FIN-WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK and
   TIME-WAIT.

   A RST closes a connection abruptly, immediately removing connection
   state on transmission or reception.  There are no interim states;
   transition is to CLOSED on transmission or reception of a RST.

   Timeouts also close a connection abruptly; a connection that times
   out may transmit a RST, or may simply assume that the peer has
   disappeared.
   Timeouts also cause an immediate transition to CLOSED.

   ICMP messages do not usually terminate a synchronised connection,
   but it is possible.  As with connections terminated by RST or
   timeout, there is an immediate state transition to CLOSED.

   This memo restricts its attention to connections closed by FINs and
   RSTs.

   TIME-WAIT

   The TIME-WAIT state has two functions in the TCP protocol.  The
   first is asymmetric: to ensure the reliable acknowledgement of FINs
   transmitted in CLOSE-WAIT state, and so the completion of the 3-way
   closing handshake.  The second is symmetric: to ensure that all TCP
   segments, generated in either direction during the lifetime of the
   connection, have drained from the network before initiation of a new
   incarnation of the connection.  The clock-based initial sequence
   number (ISN) protects slow connections against this threat
   [RFC-793].  For fast connections, which can consume sequence space
   faster than the ISN clock advances, this is no longer true.  In this
   case, TIME-WAIT prevents the acceptance of old duplicate segments by
   a new incarnation utilising identical port numbers.  The relative
   threats are explained in the Appendix of [RFC-1185], and in section
   1.2 of [RFC-1323].  The problem is summarised in [RFC-1337], in
   relation to the danger of premature termination of TIME-WAIT state
   by RST reception (TIME-WAIT assassination).

   No equivalent mechanism to TIME-WAIT exists for connections
   terminated by transmission of a RST segment.  Although RST
   transmission is omitted from the TCP Connection State Diagram, the
   text of [RFC-793] clearly states that where the transmission of a
   RST results in a state change, it is to CLOSED state.  Similarly,
   reception of a RST causes a state change to CLOSED.

1.3 RST-Terminated Connections

   There are several ways in which previously synchronised connections
   are terminated by RST transmission.  These include User Abort
   [RFC-793] and reception of data after half-duplex close [RFC-1122].
   However, not all RSTs result in connection termination.  Reception
   of a SYN segment addressed to a port for which there is no listening
   socket results in transmission of a RST.  This RST is associated
   with no connection and is equivalent to an ICMP Port Unreachable.
   The originator of the SYN changes state from SYN-SENT to CLOSED on
   reception of the RST, and the connection is never synchronised.
   Other connections in non-synchronised states respond to an
   unacceptable ACK, or to a security or precedence mismatch, by
   transmitting a RST.  In all these cases, no connection has been
   synchronised and no data has been sent, so there is no danger of old
   data segments being accepted by subsequent incarnations of the
   connection.

   This memo distinguishes those synchronised connections which
   terminate by transmission or reception of a RST by referring to them
   as "RST-terminated connections".

2. Old Segment Acceptance from RST-Terminated Connections

   Several scenarios result in the spurious acceptance of old segments
   from RST-terminated connections.  Two types of examples are given
   here: connections aborted in Established state, and connections
   aborted during the 3-way closing handshake.

2.1 RST-Terminated Connections from Established State

   There are two instances of RST-terminated connections from
   Established state which involve the hazard of old data acceptance by
   a subsequent incarnation of the connection.  The first is a User
   Abort issued in Established state; the second is a half-duplex close
   with unread data [RFC-1122, p.88].  The sequence of events in both
   cases is identical: a RST is sent by the socket from Established
   state, as a result of an abort, or of a close with pending unread
   data.

   In the worst failure mode, the socket issuing the abort is acting as
   a data sink.  In this case a window of data segments may be in
   transit when the RST is received at the data source.
   Any of these segments, none of which are duplicates, may corrupt a
   subsequent incarnation of the connection.

        TCP A                                       TCP B

   1.   ESTABL.         -->                 -->     ESTABL.
   2.   ...                                 <--     ESTABL.
                                                    (User Abort)
   3.   ...                                 <--     CLOSED
   4.   ESTABL.         -->                         ...
   5.   ESTABL.         <--                         ...
   6.   ESTABL.         -->                         ...
   7.   ESTABL.         -->                         ...
   8.   CLOSED          <--                         ...

               Figure 1. Connection closed by User Abort

   This is shown in Figure 1.  TCP A is the data source and TCP B is
   the data sink.  Line 1 shows a normal data segment from TCP A.  An
   ACK segment is transmitted by TCP B on line 2.  The TCP B user
   issues an abort, and TCP B transmits a RST and enters CLOSED state
   on line 3, as specified in [RFC-793].  Normal data continues to be
   transmitted by TCP A on line 4.  Line 5 shows the arrival at TCP A
   of the ACK generated on line 2.  This may open the window and elicit
   further segments from TCP A on lines 6 and 7, until the arrival of
   the RST at TCP A on line 8.  At this point TCP A enters CLOSED
   state, and three data segments from TCP A are in transit to TCP B.

   The connection is reopened by the 3-way SYN handshake.  Assume that
   the clock-based ISN chosen by TCP A for the new connection has been
   overrun by the sequence number consumption in the previous
   incarnation of the connection.  The sequence numbers occupied by the
   last three segments transmitted by TCP A during the previous
   incarnation may overlap the window offered by TCP B in the current
   incarnation of the connection.

        TCP A                                       TCP B

   1.   ESTABL.         -->                 -->     ESTABL.
   2.   ESTABL.         <--                 <--     ESTABL.
   3.   (old segment)...                    -->     ESTABL.
   4.   ESTABL.         <--                 <--     ESTABL.
   5.   ESTABL.         -->                 -->     ESTABL.
   6.   ...                                 <--     ESTABL.
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   7a.  ESTABL.         -->                 -->     ESTABL.
   8a.  ESTABL.         <--                         ...
   9a.  ESTABL.         -->                 -->     ESTABL.
   10a. ESTABL.         <--                 <--     ESTABL.

                 Figure 2: Accepting One Old Segment

   Figure 2 shows the spurious acceptance of part of a segment from the
   previous incarnation of the connection.
   Line 1 shows a normal data segment from TCP A after the SYN
   handshake has been completed.  Line 2 shows the ACK of this segment,
   and line 3 shows the arrival of an old segment from the previous
   connection.  It falls within TCP B's current window and is queued in
   the TCP reassembly queue, as its sequence number exceeds the next
   expected sequence number.  Since there is a missing segment, the
   next ACK in line 4 acknowledges only the previous bona fide segment,
   and TCP A does not detect acknowledgement of unsent data.  The next
   segment from the current connection arrives at TCP B in line 5.  At
   this point, part or all of the old segment is delivered to the user
   of TCP B, depending upon the implementation of the reassembly
   algorithm.  This behaviour is described in [RFC-1337].

   TCP B transmits the acknowledgement of the two previous segments in
   line 6.  TCP A transmits another segment on line 7a before the
   arrival of the acknowledgement in line 8a, and assumes that it is a
   partial acknowledgement of this segment.  Segment transmission and
   acknowledgement continue as usual on lines 9a and 10a.  Neither TCP
   A nor TCP B is aware of the spurious acceptance of old data by
   TCP B.

   To underscore the possibility of the erroneous acceptance of several
   old segments, Figure 3 shows the acceptance of two such segments.
   The exchange is identical to Figure 2 until 7a, when a second old
   segment from TCP A arrives at TCP B.  Since TCP B has queued the
   first old segment from TCP A, it delivers the entire second old
   segment to the user.  TCP B transmits the acknowledgement on line
   7b.  Line 8a and subsequent lines show the arrival of the
   acknowledgements of spurious segments and the transmission of
   further segments by TCP A.  The acknowledgements are accepted as
   valid, since TCP A has already transmitted past the sequence number
   acknowledged in the last ACK from TCP B.
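
   The window check that admits the old segment on line 3 of Figure 2
   can be illustrated with a short Python sketch.  It follows the
   segment acceptability test of [RFC-793]; the function names and the
   sample sequence numbers are assumptions chosen for illustration and
   are not taken from any particular implementation.  The point of the
   sketch is that the test examines sequence numbers only, so it cannot
   distinguish a segment of the previous incarnation from a segment of
   the current one.

```python
# Illustrative sketch only: names and sample numbers are assumptions,
# not taken from the memo or from any particular TCP implementation.

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    """Return True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Segment acceptability test in the style of RFC-793.

    Only sequence numbers are examined, so the test cannot tell which
    incarnation of the connection produced the segment.
    """
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    last = (seg_seq + seg_len - 1) % SEQ_SPACE
    # A data segment is acceptable if its first or last octet is in window.
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)


def disposition(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Classify a data segment (seg_len > 0) as rejected, delivered or queued."""
    if seg_len <= 0:
        raise ValueError("disposition() expects a data-bearing segment")
    if not segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return "rejected"
    # The segment is deliverable at once only if it covers the next
    # expected octet; otherwise it waits in the reassembly queue until
    # the gap is filled (line 5 of Figure 2).
    covers_next = (rcv_nxt - seg_seq) % SEQ_SPACE < seg_len
    return "delivered" if covers_next else "queued"


if __name__ == "__main__":
    rcv_nxt, rcv_wnd = 5000, 32768  # assumed state of TCP B, new incarnation
    print(disposition(5000, 536, rcv_nxt, rcv_wnd))   # current segment: delivered
    print(disposition(6072, 536, rcv_nxt, rcv_wnd))   # old segment in window: queued
    print(disposition(45000, 536, rcv_nxt, rcv_wnd))  # old segment outside: rejected
```

   In this sketch the second call corresponds to the old segment of
   Figure 2: it is accepted and queued although it belongs to the
   previous incarnation.
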

        TCP A                                       TCP B

   1.   ESTABL.         -->                 -->     ESTABL.
   2.   ESTABL.         <--                 <--     ESTABL.
   3.   (old segment)...                    -->     ESTABL.
   4.   ESTABL.         <--                 <--     ESTABL.
   5.   ESTABL.         -->                 -->     ESTABL.
   6.   ...                                 <--     ESTABL.
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   7a.  (old segment)...                    -->     ESTABL.
   7b.  ...                                 <--     ESTABL.
   7c.  ESTABL.         -->                 -->     ESTABL.
   7d.  ...                                 <--     ESTABL.
   8a.  ESTABL.         <--                         ...
   9a.  ESTABL.         -->                 -->     ESTABL.
   9b.  ESTABL.         <--                         ...
   9c.  ESTABL.         <--                         ...
   10a. ESTABL.         <--                 <--     ESTABL.

                 Figure 3: Accepting Two Old Segments

   These examples may be generalised to illustrate the arrival and
   acceptance of a window of old segments at TCP B.

   It is also possible for old segments to persist in the case where a
   user abort is issued on the socket acting as a data source.  This
   happens when the ensuing RST arrives before one or more of the data
   segments previously transmitted.  This is shown in Figure 4.

        TCP A                                       TCP B

   1.   ESTABL.         -->                 -->     ESTABL.
   2.   ESTABL.         <--                 <--     ESTABL.
   3.   ESTABL.         -->                         ...
   4.   ESTABL.         -->                         ...
   5.   ESTABL.         -->                         ...
        (User Abort)
   6.   CLOSED          -->
   7.   ...                                 -->     CLOSED
   8.                                       -->
   9.                                       -->
   10.                                      -->

               Figure 4. User Abort and RST Reordering

   The acceptance of old segments in transit on lines 8, 9 and 10
   occurs in an identical fashion to the previous example, as shown in
   Figures 2 and 3.

2.2 RST-Terminated Connections during Closedown

   RST-terminated connections also occur from states other than
   Established, during the 3-way closing handshake.  Two examples are
   User Abort [RFC-793] and Half Duplex Close [RFC-1122].

   User Abort during Closedown

   A user abort issued in FIN-WAIT-1, FIN-WAIT-2, CLOSING or CLOSE-WAIT
   states results in the transmission of a RST, and the socket enters
   CLOSED state [RFC-793].
   The consequences of user abort in FIN-WAIT-1, FIN-WAIT-2 and
   CLOSE-WAIT are similar to those in the previous section; an entire
   window may be in transit when the RST is transmitted, if there is
   data in transfer in the opposite direction to that followed by the
   FIN.  In CLOSING state, the FIN, and all data segments, have been
   received by the peer before it transmits the RST, and no
   non-duplicate data segments are in the network.  In this case the
   danger reduces to that of old duplicate segments, as in a
   conventionally closed TCP connection.

   Data received after Half Duplex Close

   A host may implement a half-duplex TCP close, where an application
   that has called CLOSE cannot continue to read data from the
   connection [RFC-1122].  Subsequent arrival of data elicits a RST.
   RFC-1122 does not explicitly state whether the connection enters
   CLOSED state.  In this section the assumption is made that it does.
   Appendix A shows the results if this assumption is invalid.  The
   danger of acceptance of old segments still exists in the latter
   case.

   It is straightforward to demonstrate this scenario.  Berkeley UNIX
   implementations of FTP [RFC-959] abort transfers in this fashion
   when the receiver cannot write out the file to disk, because the
   disk is full or because the file is too large.

   Figure 5 shows this scenario.  TCP A is an 80386 running Interactive
   UNIX with SpiderTCP, and TCP B is a Sparcstation running SunOS
   4.1.3.  An FTP client is started from TCP A and the 'get' command
   used to download a file from TCP B.  TCP A aborts the connection
   because the file limit is reached.  The FTP control connection is
   closed first and then the data connection.  Further data arrives
   from TCP B.  Since this arrives in FIN-WAIT-2, and BSD TCP/IP
   implements half duplex close, it elicits a RST from TCP A
   [RFC-1122], and TIME-WAIT state is bypassed.  Note that Figure 5
   shows only the FTP data connection, not the control connection.

        TCP A                                       TCP B

   1.   ESTABL.         <--                 <--     ESTABL.
   2.   ESTABL.         -->                 -->     ESTABL.
        (File Too Large: Close)
   3.   FIN-WAIT-1      -->                 -->     CLOSE-WAIT
   4.   FIN-WAIT-2      <--                 <--     CLOSE-WAIT
   5.   FIN-WAIT-2      <--                 <--     CLOSE-WAIT
   6.   CLOSED          -->                 -->     CLOSED

           Figure 5. Data Received after Half Duplex Close

   If the ACK in line 4 is delayed or lost, TCP A is still in
   FIN-WAIT-1 in line 5, when the data arrives.  A RST is transmitted
   and there is a state transition to CLOSED, as above.  For both these
   scenarios, the danger of acceptance by a subsequent incarnation of
   the connection occurs in identical fashion to Figure 2.

2.3 Proof by Demonstration

   The hazards described in this memo could be shown with the testbed
   used to demonstrate the hazards of TIME-WAIT assassination in
   [RFC-1337].  This might involve a client application acting as a
   data source, and a server which, on receipt of the first data
   segment, transmits a RST and closes the connection.  Repetition of
   this over a long period should cause the server to accept an old
   segment from a previous incarnation, as described in Figure 2 above.
   No duplication of segments is required within the testbed, unlike
   the demonstration of TIME-WAIT assassination.

2.4 Other Hazards

   Two other hazards exist as a result of RST-terminated connections: a
   de-synchronised connection as a result of an old ACK that is
   acceptable but acknowledges something not yet sent, and connection
   failure, also as a result of receiving an old ACK.  The ACKs, like
   data, need not be duplicate segments.  [RFC-1337] shows how these
   two hazards, referred to as H2 and H3, occur; this memo concentrates
   on examples of the hazard, referred to as H1 in [RFC-1337], of
   erroneous acceptance of old segments containing data.

2.5 Relative Probabilities

   Although RSTs are less common than FINs as a means of closing
   connections, the likelihood of data arriving after closedown is
   higher.
   Appendix B derives a ratio of probability based on observed traffic
   statistics.  Though an informal analysis, it implies that there is a
   significant risk in using RSTs to close connections.

3. TIME-WAIT after RST Transmission

   One solution to the dangers presented in the previous section
   involves the extension of the TIME-WAIT state to RST-terminated
   connections.  This turns out to offer only partial protection
   against data corruption.

   TIME-WAIT state must be entered by the TCP endpoint that sends the
   RST; if instead only the receiver of the RST were to enter
   TIME-WAIT, loss of the RST would mean that neither end enters
   TIME-WAIT, and the risk of data corruption would remain.

   Under this proposal, a connection in any of the SYN-RECEIVED,
   ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSING and CLOSE-WAIT states
   enters TIME-WAIT state on transmission of a RST, rather than CLOSED.
   Reception of a RST causes a transition to CLOSED as in [RFC-793].
   Minor modifications to the semantics of TIME-WAIT are required: if
   TIME-WAIT is entered after RST transmission, reception of any
   further valid non-RST segment elicits a RST, rather than an ACK, and
   the TIME-WAIT timer is restarted.  Received RSTs are ignored in
   TIME-WAIT, as proposed by fix F1 in [RFC-1337].

3.1 User Abort with TIME-WAIT

   This solution is shown in Figure 6 for the case of User Abort in
   ESTABLISHED state.  The hazards outlined in Figures 2 and 3 are less
   likely to occur.

        TCP A                                       TCP B

   1.   ESTABL.         -->                 -->     ESTABL.
   2.   ...                                 <--     ESTABL.
                                                    (User Abort)
   3.   ...                                 <--     TIME-WAIT
   4.   ESTABL.         -->                         ...
   5.   ESTABL.         <--                         ...
   6.   ESTABL.         -->                         ...
   7.   ESTABL.         -->                         ...
   8.   CLOSED          <--                         ...
   9.   ...                                 -->     TIME-WAIT
   10.  CLOSED          <--                 <--     TIME-WAIT
   11.  ...                                 -->     TIME-WAIT
   12.  CLOSED          <--                 <--     TIME-WAIT
   13.  ...                                 -->     TIME-WAIT
   14.  CLOSED          <--                 <--     TIME-WAIT
   15.                                      (2 MSL) CLOSED

               Figure 6. Connection Closed by User Abort

   The solution outlined above offers partial protection against data
   corruption hazards arising from RST-terminated connections.
   However, delay or loss of a RST gives rise to a potential hazard.

   For TIME-WAIT state to provide full protection, it must commence
   after both ends of a connection have stopped transmitting data.
   This is guaranteed for the peer that enters TIME-WAIT, since it has
   transmitted a RST and no data can follow this.  The transition to
   TIME-WAIT must also take place after the other peer has ceased data
   transmission.  The 3-way closing handshake enforces this for
   conventionally closed connections; TIME-WAIT state is always entered
   after the CLOSE-WAIT to LAST-ACK transition at the last peer to
   transmit data.

   The lack of an equivalent mechanism for RST-terminated connections
   leads to situations where the effective TIME-WAIT state is truncated
   or vanishes completely.

3.2 RST Loss and Data Retransmission

   Figure 7 shows a scenario where TCP A is retransmitting data
   segments that were lost because of network congestion.  Owing to
   exponential backoff, as described in [RFC-1122], the interval
   between successive retransmissions has reached the 60 second limit
   common to many TCP implementations.  TCP B gives up and aborts the
   connection, entering TIME-WAIT state as mandated by the partial
   solution in chapter 3.  The ensuing RST is lost, as the network is
   still congested.  TCP A continues to retransmit.  At some point
   network congestion eases, and a retransmitted data segment reaches
   TCP B.  A new incarnation of the connection may be in existence, and
   the data segment may be erroneously accepted.

        TCP A                                       TCP B

   1.   ESTABL.         -->                         ESTABL.
                        (lost)
                                                    (User Abort)
   2.   ...                                 <--     TIME-WAIT
                        (lost)
        (RTX after 60 seconds)
   3.   ESTABL.         -->                         TIME-WAIT
                        (lost)
        (RTX after 60 seconds)
   4.   ESTABL.         -->                         TIME-WAIT
                        (lost)
        (RTX after 60 seconds)
   5.   ESTABL.         -->                         TIME-WAIT
                        (lost)
        (RTX after 60 seconds)
   6.   ESTABL.         -->                         TIME-WAIT
                        (lost)
                                                    (2 MSL)
   7.                                               CLOSED
        (RTX after 60 seconds)
   8.   ESTABL.         -->                         ...

              Figure 7. RST Loss and Data Retransmission

3.3 RST Loss and Idle Connections

   It is not necessary for data transmission to be in progress for the
   above hazard to occur.  Consider the case where the user aborts an
   idle connection, as shown in Figure 8.  TCP B issues the abort, and
   enters TIME-WAIT.  The RST is lost, so that TCP A remains in
   ESTABLISHED state.  No activity occurs until TCP A tries to transmit
   data, after an interval that is unbounded and so may exceed twice
   the MSL.  The data segment may be erroneously accepted at TCP B by a
   subsequent incarnation of the connection.

        TCP A                                       TCP B

   1.   ESTABL.                                     ESTABL.
                                                    (User Abort)
   2.   ...                                 <--     TIME-WAIT
                        (lost)
                                                    (2 MSL)
   3.                                               CLOSED
        (Interval > 2MSL)
   4.   ESTABL.         -->                         ...

               Figure 8. RST Loss and Idle Connections

Security Considerations

   Security issues are not discussed in this memo.

References

   [Congestion]
      Jacobson, V., "Congestion Avoidance and Control", ACM SIGCOMM-88,
      August 1988.

   [RFC-792]
      Postel, J., "Internet Control Message Protocol", RFC-792,
      USC/Information Sciences Institute, September 1981.

   [RFC-793]
      Postel, J., "Transmission Control Protocol", RFC-793,
      USC/Information Sciences Institute, September 1981.

   [RFC-959]
      Postel, J. and J. Reynolds, "File Transfer Protocol", RFC-959,
      ISI, October 1985.

   [RFC-1122]
      Braden, R., "Requirements for Internet Hosts - Communication
      Layers", RFC-1122, October 1989.

   [RFC-1185]
      Jacobson, V., Braden, R., and L. Zhang, "TCP Extension for
      High-Speed Paths", RFC-1185, Lawrence Berkeley Labs,
      USC/Information Sciences Institute, and Xerox Palo Alto Research
      Center, October 1990.

   [RFC-1191]
      Mogul, J. and S. Deering, "Path MTU Discovery", RFC-1191,
      November 1990.

   [RFC-1323]
      Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for High
      Performance", RFC-1323, Lawrence Berkeley Labs, USC/Information
      Sciences Institute, and Cray Research, May 1992.

   [RFC-1337]
      Braden, R., "TIME-WAIT Assassination Hazards in TCP", RFC-1337,
      ISI, May 1992.

   [TCP/IP-Illustrated]
      Wright, G. and R. Stevens, "TCP/IP Illustrated, Volume 2",
      Addison-Wesley, 1995.

Acknowledgements

   Thanks to Alan Cox and Jon Crowcroft for their comments on previous
   expanded versions of this memo, and to Bob Braden for [RFC-1337],
   which stimulated ideas leading to it.

Author's Address:

   Ian Heavens
   Fore Systems Inc.
   2475 The Crescent, Solihull Parkway
   Birmingham Business Park B37 7YE
   United Kingdom

   Phone: +44 (0)121 717 4444
   Fax:   +44 (0)121 717 4455
   Email: iheavens@fore.co.uk

4. Appendix A: A Different Interpretation of RFC-1122

   Problems arise if [RFC-1122] is interpreted to mean that the arrival
   of data after half duplex close elicits a RST with no state change.
   The connection hangs if data arrives at TCP A in FIN-WAIT-2, as
   Figure 9 shows.

        TCP A                                       TCP B

   1.   ESTABLISHED                                 ESTABLISHED
        (Close)
   2.   FIN-WAIT-1      -->                 -->     CLOSE-WAIT
   3.   FIN-WAIT-2      <--                 <--     CLOSE-WAIT
   4.   FIN-WAIT-2      <--                 <--     CLOSE-WAIT
        (user data after half duplex close)
   5.   FIN-WAIT-2      -->                 -->     CLOSED

     Figure 9. Data Received in FIN-WAIT-2 after Half Duplex Close

   If the ACK of the FIN is lost or delayed, and data arrives in
   FIN-WAIT-1, the connection terminates without entering TIME-WAIT
   state.  This is shown in Figure 10.

        TCP A                                       TCP B

   1.   ESTABLISHED                                 ESTABLISHED
        (Close)
   2.   FIN-WAIT-1      -->                 -->     CLOSE-WAIT
   3.   (lost) ...                          <--     CLOSE-WAIT
   4.   FIN-WAIT-1      <--                 <--     CLOSE-WAIT
        (user data after half duplex close)
   5.   FIN-WAIT-1      -->                 -->     CLOSED
   6.   FIN-WAIT-1      -->                 -->     CLOSED
   7.   CLOSED          <--                 <--     CLOSED

     Figure 10. Data Received in FIN-WAIT-1 after Half Duplex Close

5. Appendix B: Relative Probabilities of Hazards

5.1 Introduction

   This section contains a less than rigorous analysis of the relative
   probabilities of the various data corruption hazards.  Note that
   these probabilities are zero for TCP connections operating below 250
   kbytes/second; the initial sequence number selection protects
   against data corruption hazards, regardless of the mechanism for
   closing the connection.

5.2 FIN, RST, Timer and ICMP Related Hazards

   It is useful to compare the relative probabilities of hazards
   arising from FIN-, RST-, Timer- and ICMP-terminated TCP connections.

   The probability of each hazard is proportional to the amount of data
   received after transition to CLOSED.  Complete protection requires
   that this be guaranteed to be zero.  Data received after connection
   closure does not cause data corruption, unless it falls within the
   current window of a new incarnation of the connection.

   It is assumed that the connection peer displaying the hazard is
   acting as a data sink, maximising the data received and the
   probability of failure.  If the proportion of TCP connections acting
   as data sinks or data sources is the same regardless of how the
   connection terminates, the relative probabilities remain the same.

   To simplify the arithmetic, higher order effects are ignored; for
   instance, those arising from the loss of more than one TCP segment
   in the period considered.

   The three hazards considered are data corruption arising from the
   following:

   o  Hazard 1: A FIN-terminated TCP connection with TIME-WAIT state
      omitted.

   o  Hazard 2: A TCP connection aborted from Established state, with
      neither TIME-WAIT nor LAST-ACK states.

   o  Hazard 3: A TCP connection aborted from Established state, with
      TIME-WAIT but without LAST-ACK state.

   Other hazards, such as connections aborted during closedown, by
   timeouts, or by ICMP messages, are ignored.  These are much less
   likely than Hazard 2.  The duration of closedown is typically much
   shorter than that of Established state.  Timeouts require multiple
   loss of
Timeouts require multiple loss of Heavens [Page 21] Internet Draft RSTs Considered Harmful June 1996 segments in the network and represent higher order effects, with correspondingly lower probabilities. ICMP termination of synchron- ised connections is very rare. Nomenclature P1 - Relative probability of Hazard 1 P2 - Relative probability of Hazard 2 PL - Probability of loss of a TCP segment in the network PR - Probability that a TCP connection terminates by RST PT - Probability that a TCP connection terminates by timeout PI - Probability that a TCP connection terminates by ICMP message MSS - Maximum Segment Size W - Maximum offered TCP window o Hazard 1 Duplicate segments received after FIN-terminated connections usu- ally arise because of the loss of an ACK, triggering an unneces- sary retransmission. Slow start [Congestion] implies that only one segment will be retransmitted without acknowledgement. The relative probability of H1 is the segment size multiplied by the probability of segment loss and the probability of termination by FIN handshake: P1 = MSS * PL * (1 - PR - PT - PI) = MSS * PL ignoring higher order effects. o Hazard 2 For a data sink, transmission of a RST in Established state and transition to CLOSED state is followed by reception of up to a window of data, all of which may be received during a subsequent incarnation of the connection. The relative probability of H2 is the window size multiplied by the probability of termination by RST: P2 = W * PR o Hazard 3 In this case, a RST is lost. Any data received in TIME-WAIT causes the TIME-WAIT timer to restart, so the hazard only occurs if the gap between reception of segments exceeds the duration of Heavens [Page 22] Internet Draft RSTs Considered Harmful June 1996 TIME-WAIT state. This occurs if several retransmitted segments are lost, which is a higher order effect with low probability, or if an application spontaneously transmits data after this time, which is also unlikely. 
This hazard can be ignored. 5.3 Relative Probabilities of FIN- and RST-related Hazards The ratio of probabilities of hazard H2 and H1 is P2/P1 = W/MSS * PR/PL Example Calculation If Path MTU Discovery [RFC-1191] is supported, the segment size is the Maximum Segment Size indicated by the lowest physical packet size on the connection path, unless negotiated to be lower during connec- tion establishment. Implementation of [RFC-1191] is not yet widespread, so the default figure is assumed [RFC-1122, 3.3]. TCP segment size = 576 - size of TCP and IP headers = 536 Assume a window size of 32K. Appendix C summarises statistics about TCP connections, derived from a variety of connections. Taking the average percentage values of PR=1.1 and PL=1.2 derived from Appendix C: P2/P1 = W/MSS * PR/PL = 32768/536 * 1.1/1.2 = 56. For TCP connections on the same physical network, or where Path MTU Discovery is supported, the default segment size is larger and rela- tive probability smaller. The lowest ratio consistent with the data in Appendix C can be calcu- lated from the highest value of PL (2.9) and the lowest value of PR (0.8): P2/P1 = 17. It can be concluded that erroneous acceptance of data from expired connections is significantly more likely to occur as a result of RST-terminated connections than the equivalent hazard after FIN- terminated connections. Heavens [Page 23] Internet Draft RSTs Considered Harmful June 1996 6. Appendix C:Traffic Statistics for TCP Connections Statistics were measured using the netstat program on six machines: [1] A home workstation (VMS) used for telecommuting via a 56Kb Frame Relay link to the Internet. [2] A DNS and mail gateway (VMS) at the University of Tucson, Arizona. [3] A personal workstation (SunOS 4.1.3) on Spider Systems' (now Shiva Corporation) corporate LAN. [4] The BSD development system (BSD4.4-Lite) at the Computer Science department, Berkeley, California (taken from [TCP/IP-Illustrated], p.799). 
   [5]  A file server (SunOS 4.1.3) on Spider Systems' corporate LAN.
   [6]  An application gateway (SunOS 4.1.3) between Spider Systems'
        corporate LAN and the Internet.

   The columns show statistics collected by the BSD netstat utility or
   its VMS equivalent, with the exception of machine uptime.  The
   derivation of the statistics from the BSD TCP/IP "tcpstat" structure
   is shown in parentheses.

   o  machine (M)

   o  time in days that the machine has been up (U)

   o  number of TCP connections established (tcpstat.tcps_connects).

   o  number of TCP connections aborted by RST transmission, expressed
      as a sum of the total aborted excluding those aborted by
      reception of data after half duplex close, and those aborted
      after half duplex close ((tcpstat.tcps_drops -
      tcpstat.tcps_rcvafterclose) + tcpstat.tcps_rcvafterclose).

   o  number of TCP connections timed out, expressed as a sum of the
      number timed out by retransmissions and keepalives
      (tcpstat.tcps_timeoutdrop + tcpstat.tcps_keeptimeo).

   o  total number of TCP data segments transmitted, excluding
      retransmissions (tcpstat.tcps_sndpack -
      tcpstat.tcps_sndrexmitpack).

   o  total number of TCP data segments retransmitted
      (tcpstat.tcps_sndrexmitpack).

   M   U    Establ.  Dropped      Timed Out   TXed Segs  RTXed Segs
   1   2        408  4+1          263+1          135168         250
   2   5      46632  456+102      7338+551       317523        4756
   3   ?     138682  13349+3686   79+2345      22761633      104440
   4   30    126820  44+1017      86+3219       8920528      257295
   5   20     13557  198+205      43+28         1559505        1675
   6   14     48226  3943+1396    11+190       11505576       67401

   Percentage values for aborted and timed out connections, and for
   segment loss, are as follows.

   Machine   Dropped (%)   Timed Out (%)   Retransmissions (%)
      1          1.2           64.7               0.18
      2          1.2           16.9               1.50
      3         12.3            1.75              0.46
      4          0.8            2.60              2.88
      5          3.0            0.52              0.11
      6         11.1            0.42              0.59

   Machines 3 and 5 are internal to a LAN and mostly handle NFS
   traffic, so may be expected to have different patterns of connection
   establishment and segment losses.
   Dropped connections for machine 6 are such a high proportion that
   some pathological system or application problem can be suspected.
   These machines are excluded from calculations.

   Aborted connections yield more consistent percentages than timeouts
   and segment loss rates; this may be because the latter are more
   susceptible to the characteristics of nearby networks, whereas
   aborts are a function of application or system behaviour.  For
   instance, an excessive proportion of machine 1's TCP connections
   expire because of retransmission timeouts; this may be due to an
   unreliable link.

   For machines 1, 2 and 4, the average percentage drop rate is 1.1%.
   The average retransmission rate is 1.2%.  The lowest percentage drop
   rate is 0.8%, and the highest retransmission rate is 2.9%.
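
   The arithmetic of Appendices B and C can be reproduced with a short
   Python sketch.  The per-machine counts are copied from the table
   above and the ratio is the P2/P1 formula of section 5.3; the
   function and variable names are assumptions introduced for
   illustration, and the 40 octets of TCP and IP headers are an
   assumption consistent with the segment size of 536 used in
   section 5.3.

```python
# Illustrative sketch only: function and variable names are assumptions.
# The counts are copied from the table in Appendix C.

MACHINES = {
    # machine: (established, dropped, timed_out, txed_segs, rtxed_segs)
    1: (408, 4 + 1, 263 + 1, 135168, 250),
    2: (46632, 456 + 102, 7338 + 551, 317523, 4756),
    3: (138682, 13349 + 3686, 79 + 2345, 22761633, 104440),
    4: (126820, 44 + 1017, 86 + 3219, 8920528, 257295),
    5: (13557, 198 + 205, 43 + 28, 1559505, 1675),
    6: (48226, 3943 + 1396, 11 + 190, 11505576, 67401),
}


def percentages(machine):
    """Return (dropped %, timed out %, retransmitted %) for one machine."""
    established, dropped, timed_out, txed, rtxed = MACHINES[machine]
    return (100.0 * dropped / established,
            100.0 * timed_out / established,
            100.0 * rtxed / txed)


def hazard_ratio(window, mss, pr, pl):
    """P2/P1 = W/MSS * PR/PL, as in section 5.3."""
    return (window / mss) * (pr / pl)


if __name__ == "__main__":
    for m in sorted(MACHINES):
        dropped, timed_out, rtx = percentages(m)
        print(f"{m}  {dropped:6.2f}  {timed_out:6.2f}  {rtx:6.2f}")

    mss = 576 - 40  # default datagram size less assumed TCP and IP headers
    print(round(hazard_ratio(32768, mss, 1.1, 1.2)))  # 56
    print(round(hazard_ratio(32768, mss, 0.8, 2.9)))  # 17
```

   Because PR and PL appear only as a quotient, the percentage values
   can be passed to hazard_ratio() directly without conversion to
   fractions.
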