Internet Draft                                               Ian Heavens
Expires Jan 15, 1996                                      Spider Systems
                                                               July 1995

      Problems with TCP Connections Terminated by RSTs or Timers

                  draft-heavens-problems-rsts-00.txt

Status of this Memo

This memo is being distributed to members of the Internet community in order to solicit their reactions to the proposals contained in it.

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Abstract

This memo argues that the danger of segments from old TCP connections occurs for connections terminated by RST segments, timers, or ICMP messages, as well as those terminated by FIN segments. To avoid this danger, RST terminated connections require a 2 way closing handshake, with the recipient of the RST entering TIME-WAIT and acknowledging the RST. LAST-ACK is used as the interim state between transmission of a RST and receiving its acknowledgement, at which point CLOSED is entered. This solution provides protection even when interoperating with a non-conformant implementation. The probability of data corruption is shown to exceed that of the equivalent danger in FIN terminated connections. To maintain backwards compatibility, a TCP host or connection may be configured to revert to [RFC-793] behaviour.

Heavens                                                         [Page 1]

Internet Draft      Problems with RSTs and Timers              July 1995

Table of Contents

1. Introduction
   1.1 Overview
   1.2 Background
   1.3 RST-Terminated Connections
2. Old Segment Acceptance from RST-Terminated Connections
   2.1 RST-Terminated Connections from Established State
   2.2 RST-Terminated Connections during Closedown
   2.3 Proof by Demonstration
   2.4 Other Hazards
3. A Partial Solution: TIME-WAIT after RST transmission
   3.1 User Abort
   3.2 Half Duplex Close
   3.3 Crossing RSTs
4. RST Loss Hazards
   4.1 RST Loss and Data Retransmission
   4.2 RST Loss and Idle Connections
5. A Complete Solution: the 2-way Closing Handshake
   5.1 Discussion
   5.2 Interim State
   5.3 Changes to RFC-793 State Machine
   5.4 Interoperability with [RFC-793] Implementations
   5.5 Timer-Terminated Connections
   5.6 Connections Terminated by ICMP Messages
   5.8 TP4
6. Relative Probabilities of Hazards
   6.1 FIN, RST, Timer and ICMP Related Hazards
   6.2 Relative Probabilities of FIN- and RST-related Hazards
7. Implications for Related TCP Standards
   7.1 TIME-WAIT Assassination
   7.2 High Performance Extensions
   7.3 Extensions for Transactions
8. Issues of Backwards Compatibility
   8.1 Introduction
   8.2 Nomenclature
   8.3 Resources
   8.4 Resource Starvation
   8.5 Approaches to Configuration
   8.6 API Semantics
   8.7 Interoperability
   8.8 Simplicity
9. Solution with Backwards Compatibility
   9.1 Introduction
   9.2 Solution
   9.3 Configuration
   9.4 API Semantics
   9.5 Interoperability with RFC-793
Appendix A: TCP Connection State Diagram (Partial Solution)
Appendix B: TCP Connection State Diagram (Full Solution)
Appendix C: A Different Interpretation of RFC-1122
Appendix D: Modifications to RFC-793
Appendix E: Modifications to RFC-1122
Appendix F: Modifications to RFC-1213
Appendix G: Traffic Statistics for TCP Connections

Glossary

o  API
   Application Programming Interface

o  MSL
   Maximum Segment Lifetime

o  FIN-Terminated Connection
   A synchronised TCP connection which terminates by the 3-way handshake, involving the exchange and reliable acknowledgement of FIN segments.

o  RST-Terminated Connection
   A synchronised TCP connection which terminates by transmission or reception of a RST.

o  Timer-Terminated Connection
   A synchronised TCP connection which terminates by a timeout.

o  Hard Abort
   A RST-terminated connection where the peer that transmits a RST enters CLOSED state.

o  Soft Abort
   A RST-terminated connection where the peer that transmits a RST enters LAST-ACK state, and the peer enters TIME-WAIT state when it receives the RST.

o  Hard Timeout
   A timer-terminated connection where the peer that times out immediately enters CLOSED state, possibly transmitting a RST.

o  Soft Timeout
   A timer-terminated connection where the peer that times out transmits a RST, enters LAST-ACK state, and the peer enters TIME-WAIT state if it receives the RST.

1. Introduction

1.1 Overview

Chapter 1 describes mechanisms for closing TCP connections, and the significance of the TIME-WAIT state. Chapter 2 identifies a series of connection terminations involving RSTs that may lead to data corruption. Chapter 3 shows how the use of TIME-WAIT state alone can provide some protection against this.
Chapter 4 identifies scenarios where this solution is insufficient. Chapter 5 describes a complete solution involving a 2-way closing handshake. Chapter 6 examines the relative probabilities of data corruption hazards after FIN- and RST-terminated connections. Chapter 7 looks at the implications for related TCP standards such as high speed extensions. Chapter 8 discusses backwards compatibility. Chapter 9 proposes a solution that guarantees backwards compatibility.

1.2 Background

FINs, RSTs, Timers and ICMP Messages

There are four mechanisms available in [RFC-793] to close a TCP connection: FINs, RSTs, Timers and ICMP messages.

FINs may be used to close down a connection in an orderly fashion, guaranteeing reliable delivery of all data segments transmitted before the FIN in both directions. The requirement to reliably acknowledge FINs in both directions leads to a number of half-closed states: FIN-WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK and TIME-WAIT.

A RST closes a connection abruptly, immediately removing connection state on transmission or reception. There are no interim states; transition is to CLOSED on transmission or reception of a RST.

Timeouts also close a connection abruptly; a connection that times out may optionally transmit a RST, or it may assume that the peer has disappeared by virtue of the lack of response over a significant period of time. Timeouts also cause an immediate transition to CLOSED.

ICMP messages do not usually terminate a synchronised connection, but there are circumstances in which it occurs. One ICMP message, Fragmentation Required but DF set [RFC-792 and RFC-1122], always terminates synchronised connections. In the same way as connections terminated by RST or timeout, there is an immediate state transition to CLOSED.

TIME-WAIT

The TIME-WAIT state has two functions in the TCP protocol.
The first is asymmetric: to ensure the reliable acknowledgement of FINs transmitted in CLOSE-WAIT state and so the completion of the 3-way closing handshake. The second is symmetric: to ensure that all TCP segments, generated in either direction during the lifetime of the connection, have drained from the network before initiation of a new incarnation of the connection. The clock based ISN protects slow connections against this threat [RFC-793]. For fast connections, TIME-WAIT prevents the acceptance of old duplicate segments by a new incarnation utilising identical port numbers. The relative threats are explained in the Appendix of [RFC-1185], and in section 1.2 of [RFC-1323]. The problem is summarised in relation to the associated danger of premature termination of TIME-WAIT state (TIME-WAIT assassination) in [RFC-1337].

No equivalent mechanism to TIME-WAIT exists for connections terminated by transmission of a RST segment. Although RST transmission is omitted from the TCP Connection State Diagram, the text of [RFC-793] clearly states that where the transmission of a RST results in a state change, it is to CLOSED state. Similarly, reception of a RST causes a state change to CLOSED.

1.3 RST-Terminated Connections

There are several ways in which previously synchronised connections are terminated by RST transmission. These include User Abort [RFC-793] and reception of data after half-duplex close [RFC-1122]. However, not all RSTs result in connection termination. Reception of a SYN segment addressed to a port for which there is no listening socket results in transmission of a RST. This is associated with no connection and is equivalent to an ICMP Port Unreachable. The originator of the SYN changes state from SYN-SENT to CLOSED on reception of the RST, and the connection is never synchronised.
Other connections in non-synchronised states respond to an unacceptable ACK, security or precedence mismatch by transmitting a RST. In all these cases, no connection has been synchronised nor data sent, so that there is no danger of old data segments being accepted by subsequent incarnations of the connection.

This memo distinguishes those synchronised connections which terminate by transmission or reception of a RST by referring to them as "RST-terminated connections".

2. Old Segment Acceptance from RST-Terminated Connections

Several scenarios result in the spurious acceptance of old segments from RST-terminated connections. Two types of examples are given here: connections aborted in Established state, and connections aborted during the 3-way closing handshake.

2.1 RST-Terminated Connections from Established State

There are two instances of RST-terminated connections from Established state which involve the hazard of old data acceptance by a subsequent incarnation of the connection. The first is a User Abort issued in Established state; the second a half-duplex close with unread data [RFC-1122, p.88].

The sequence of events in both cases is identical: a RST is sent by the socket from Established state, as a result of an abort, or a close with pending unread data.

In the worst failure mode, the socket issuing the abort is acting as a data sink. In this case a window full of data segments may be in transit when the RST is received at the data source. Any of these segments - which are not duplicates - may corrupt a subsequent incarnation of the connection.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ...                          <--            ESTABL.
                                                    (User Abort)
    3.  ...                          <--            CLOSED
    4.  ESTABL.        -->                          ...
    5.  ESTABL.        <--                          ...
    6.  ESTABL.        -->                          ...
    7.  ESTABL.        -->                          ...
    8.  CLOSED         <--                          ...

              Figure 1. Connection closed by User Abort

This is shown in Figure 1.
TCP A is the data source and TCP B is the data sink. Line 1 shows a normal data segment from TCP A. An ACK segment is transmitted by TCP B on line 2. TCP B user issues an abort, transmits a RST, and enters CLOSED state on line 3, as specified in [RFC-793]. Normal data continues to be transmitted by TCP A on line 4. Line 5 shows the arrival at TCP A of the ACK generated on line 2. This may open the window and elicit further segments from TCP A on lines 6 and 7, until the arrival of the RST at TCP A on line 8. At this point TCP A enters CLOSED state, and three data segments from TCP A are in transit to TCP B.

The connection is reopened by the 3-way SYN handshake. Assume that the clock based ISN chosen by TCP A for the new connection has been overrun by the sequence number consumption in the previous incarnation of the connection. The sequence numbers occupied by the last three segments transmitted by TCP A during the previous incarnation may overlap the window offered by TCP B in the current incarnation of the connection.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ESTABL.        <--           <--            ESTABL.
    3.  (old segment)...             -->            ESTABL.
    4.  ESTABL.        <--           <--            ESTABL.
    5.  ESTABL.        -->           -->            ESTABL.
    6.  ...                          <--            ESTABL.
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   7a.  ESTABL.        -->           -->            ESTABL.
   8a.  ESTABL.        <--                          ...
   9a.  ESTABL.        -->           -->            ESTABL.
  10a.  ESTABL.        <--           <--            ESTABL.

                Figure 2: Accepting One Old Segment

Figure 2 shows the spurious acceptance of part of a segment from the previous incarnation of the connection. Line 1 shows a normal data segment from TCP A after the SYN handshake has been completed. Line 2 shows the ACK of this segment, and line 3 shows the arrival of an old segment from the previous connection.
It falls within TCP B's current window and is queued in the TCP reassembly queue, as its sequence number exceeds the next expected sequence number. Since there is a missing segment, the next ACK in line 4 acknowledges the previous bona fide segment, and TCP A does not detect acknowledgement of unsent data. The next bona fide segment from the current connection arrives at TCP B in line 5. At this point, part or all of the old segment is delivered to the user of TCP B, depending upon the implementation of the reassembly algorithm. This behaviour is described in [RFC-1337].

TCP B transmits the acknowledgement of the two previous segments in line 6. TCP A transmits another segment on line 7a before the arrival of the acknowledgement in line 8a, and assumes that it is a partial acknowledgement of this segment. Segment transmission and acknowledgement continue as usual on lines 9a and 10a. Neither TCP A nor TCP B is aware of the spurious acceptance of old data by TCP B.

To underscore the possibility of the erroneous acceptance of several old segments, Figure 3 shows the acceptance of two such segments. The exchange is identical to Figure 2 until 7a, when a second old segment from TCP A arrives at TCP B. Since TCP B has queued the first old segment from TCP A, it delivers the entire second old segment to the user. TCP B transmits the acknowledgement on line 7b. Line 8a and subsequent lines show the arrival of the acknowledgements of spurious segments and the transmission of further segments by TCP A. The acknowledgements are accepted as valid, since TCP A has already transmitted past the sequence number acknowledged in the last ACK from TCP B.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ESTABL.        <--           <--            ESTABL.
    3.  (old segment)...             -->            ESTABL.
    4.  ESTABL.        <--           <--            ESTABL.
    5.  ESTABL.        -->           -->            ESTABL.
    6.  ...                          <--            ESTABL.
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   7a.  (old segment)...             -->            ESTABL.
   7b.  ...                          <--            ESTABL.
   7c.  ESTABL.        -->           -->            ESTABL.
   7d.  ...                          <--            ESTABL.
   8a.  ESTABL.        <--                          ...
   9a.  ESTABL.        -->           -->            ESTABL.
   9b.  ESTABL.        <--                          ...
   9c.  ESTABL.        <--                          ...
  10a.  ESTABL.        <--           <--            ESTABL.

                Figure 3: Accepting Two Old Segments

These examples may be generalised to illustrate the arrival and acceptance of a window of old segments at TCP B.

It is also possible for old segments to persist in the case where a user abort is issued on the socket acting as a data source. This happens in the scenario where the ensuing RST arrives before one or more of the data segments previously transmitted. This is shown in Figure 4.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ESTABL.        <--           <--            ESTABL.
    3.  ESTABL.        -->                          ...
    4.  ESTABL.        -->                          ...
    5.  ESTABL.        -->                          ...
        (User Abort)
    6.  CLOSED         -->
    7.  ...                          -->            CLOSED
    8.                               -->
    9.                               -->
   10.                               -->

              Figure 4. User Abort and RST Reordering

The acceptance of old segments in transit on lines 8, 9 and 10 occurs in an identical fashion to the previous example, as shown in Figures 2 and 3.

2.2 RST-Terminated Connections during Closedown

RST-terminated connections also occur from states other than Established, during the 3-way closing handshake. Two examples are User Abort [RFC-793] and Half Duplex Close [RFC-1122].

User Abort during Closedown

A user abort issued in FIN-WAIT-1, FIN-WAIT-2, CLOSING and CLOSE-WAIT states results in the transmission of a RST, and the socket enters CLOSED state [RFC-793]. The consequences of user abort in FIN-WAIT-1 and FIN-WAIT-2 are similar to the previous section; an entire window may be in transit when the RST is transmitted. In CLOSING and CLOSE-WAIT, the FIN, and all data segments, have been received by the peer before it transmits the RST, and no non-duplicate data segments are in the network.
In this case the danger reduces to that of old duplicate segments, as in a conventionally closed TCP connection.

Data received after Half Duplex Close

A host may implement a half-duplex TCP close, where an application that has called CLOSE cannot continue to read data from the connection [RFC-1122]. Subsequent arrival of data elicits a RST. RFC-1122 does not explicitly state whether the connection enters CLOSED state. In this section the assumption is made that it does. Appendix C shows the results if this assumption is invalid. The danger of acceptance of old segments still exists in the latter case.

It is straightforward to demonstrate this scenario. Berkeley UNIX implementations of FTP [RFC-959] abort transfers in this fashion when the receiver cannot write out the file to disk, because the disk is full or because the file is too large. Figure 5 shows this scenario. TCP B is sending the file to TCP A, which closes prematurely because the disk is full. TCP B transmits a data segment with the contents "You could at least say goodbye!". Since this arrives in FIN-WAIT-2, and BSD TCP/IP implements half duplex close, it elicits a RST [RFC-1122] from TCP A.

        TCP A                                       TCP B

    1.  ESTABL.        <--           <--            ESTABL.
    2.  ESTABL.        -->           -->            ESTABL.
        (Disk Full: Close)
    3.  FIN-WAIT-1     -->           -->            CLOSE-WAIT
    4.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
    5.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
               "You could at least say goodbye!"
    6.  CLOSED         -->           -->            CLOSED

           Figure 5. Data Received after Half Duplex Close

If the ACK in line 4 is delayed or lost, TCP A is still in FIN-WAIT-1 in line 5, when the data arrives. A RST is transmitted and there is a state transition to CLOSED, as above. The danger of acceptance by a subsequent incarnation of the connection occurs in identical fashion to Figure 2.

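The reason an old segment can slip into a new incarnation is that the receiver judges a segment only by its sequence number. A minimal sketch of that check follows, assuming the receive-window acceptability test of [RFC-793] for a segment of non-zero length and a non-zero window. The function names and all sequence numbers are hypothetical and are not taken from this memo.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """RFC-793 style test for a data segment (seg_len > 0, rcv_wnd > 0).

    The segment is acceptable if its first or its last octet falls
    inside the receive window. Nothing in the test identifies which
    incarnation of the connection produced the segment.
    """
    first = seg_seq % SEQ_MOD
    last = (seg_seq + seg_len - 1) % SEQ_MOD
    return (in_window(first, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))


# Hypothetical numbers: the new incarnation at TCP B expects octet 5000
# and offers a window of 4096 octets.
rcv_nxt, rcv_wnd = 5000, 4096

# A segment left in the network by the previous incarnation happens to
# start at 6000, which is inside the new window.
old_segment_accepted = segment_acceptable(6000, 512, rcv_nxt, rcv_wnd)

# The same segment would be rejected had it started beyond the window.
far_segment_accepted = segment_acceptable(20000, 512, rcv_nxt, rcv_wnd)

print(old_segment_accepted, far_segment_accepted)
```

In the first case the old segment starts above the next expected sequence number, so it is queued for reassembly rather than discarded, which is the situation on line 3 of Figure 2.
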
2.3 Proof by Demonstration

Proof by demonstration could be carried out with the testbed used to demonstrate the hazards of TIME-WAIT assassination in [RFC-1337]. This might involve a client application acting as a data source, and a server which, on receipt of the first data segment, transmits a RST and closes the connection. Repetition of this over a long period should cause the server to accept an old segment from a previous incarnation as described in Figure 2 above. No duplication of segments is required within the testbed, unlike demonstration of TIME-WAIT Assassination.

2.4 Other Hazards

Two other hazards exist as a result of RST-terminated connections: a de-synchronised connection as a result of an old ACK that is acceptable but acknowledges something not yet sent, and connection failure, also as a result of receiving an old ACK. The ACKs, like data, need not be duplicate segments. [RFC-1337] shows how these two hazards, referred to as H2 and H3, occur; this memo concentrates on examples of the hazard, referred to as H1 in [RFC-1337], of erroneous acceptance of old segments containing data.

3. A Partial Solution: TIME-WAIT after RST transmission

One solution to the dangers presented in the previous section involves the extension of the TIME-WAIT state to RST-terminated connections. This turns out to offer only partial protection against data corruption.

A connection in any of SYN-RECVD, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSING and CLOSE-WAIT states enters TIME-WAIT state on transmission of a RST, rather than CLOSED. Reception of a RST causes a transition to CLOSED as in [RFC-793]. Minor modifications to the semantics of TIME-WAIT are required: if entered after RST transmission, reception of all further valid non-RST segments elicits a RST, rather than an ACK, and the TIME-WAIT timer is restarted.
Received RSTs are ignored in TIME-WAIT, as proposed by fix F1 in [RFC-1337].

Appendix A shows extensions to the Connection State Diagram in [RFC-793] to show state changes on RST transmission.

3.1 User Abort

This solution is shown in Figure 6 for the case of User Abort in ESTABLISHED state. The hazards outlined in Figures 2 and 3 are less likely to occur, though not impossible, as the next chapter indicates.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ...                          <--            ESTABL.
                                                    (User Abort)
    3.  ...                          <--            TIME-WAIT
    4.  ESTABL.        -->                          ...
    5.  ESTABL.        <--                          ...
    6.  ESTABL.        -->                          ...
    7.  ESTABL.        -->                          ...
    8.  CLOSED         <--                          ...
    9.  ...                          -->            TIME-WAIT
   10.  CLOSED         <--           <--            TIME-WAIT
   11.  ...                          -->            TIME-WAIT
   12.  CLOSED         <--           <--            TIME-WAIT
   13.  ...                          -->            TIME-WAIT
   14.  CLOSED         <--           <--            TIME-WAIT
   15.                               (2 MSL)        CLOSED

              Figure 6. Connection Closed by User Abort

3.2 Half Duplex Close

Figures 7 and 8 show modifications to include TIME-WAIT for data received after half duplex close.

        TCP A                                       TCP B

    1.  ESTABLISHED                                 ESTABLISHED
        (Close)
    2.  FIN-WAIT-1     -->           -->            CLOSE-WAIT
    3.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
    4.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
    5.  TIME-WAIT      -->           -->            CLOSED
        (2 MSL)
    6.  CLOSED

           Figure 7. Data Received after Half Duplex Close

In Figure 7, TCP B does not transmit a FIN and the state transition is from CLOSE-WAIT to CLOSED on RST reception.

        TCP A                                       TCP B

    1.  ESTABLISHED                                 ESTABLISHED
        (Close)
    2.  FIN-WAIT-1     -->           -->            CLOSE-WAIT
    3.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
    4.  FIN-WAIT-2     <--           <--            CLOSE-WAIT
    5.  TIME-WAIT      -->                          ...
                                                    (Close)
    6.  ...                          <--            LAST-ACK
    7.                               --->           CLOSED
    8.  TIME-WAIT      <--
    9.  TIME-WAIT      -->           --->           CLOSED
        (2 MSL)
   10.  CLOSED

           Figure 8. Data Received after Half Duplex Close

In Figure 8, TCP B issues a CLOSE call and transmits a FIN before the arrival of the RST transmitted by TCP A on line 5, so that the RST arrives in LAST-ACK state.

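The state changes of the partial solution, as set out at the start of this chapter, can be sketched as a small model of one end of a connection. This is an illustration only: the class and method names are hypothetical, the 2 MSL timer is reduced to a counter of restarts, and segment validity is assumed to have been checked by the caller.

```python
# States from which RST transmission leads to TIME-WAIT in the
# partial solution, as listed in the text.
RST_TX_STATES = {
    "SYN-RECVD", "ESTABLISHED", "FIN-WAIT-1",
    "FIN-WAIT-2", "CLOSING", "CLOSE-WAIT",
}


class PartialSolutionConn:
    """Toy model of one end of a connection under the partial solution."""

    def __init__(self, state="ESTABLISHED"):
        self.state = state
        self.entered_by_rst = False  # TIME-WAIT entered after sending a RST
        self.timer_restarts = 0      # stand-in for the 2 MSL timer
        self.sent = []               # control segments emitted, in order

    def send_rst(self):
        """User abort, or data arriving after a half duplex close."""
        if self.state in RST_TX_STATES:
            self.sent.append("RST")
            self.state = "TIME-WAIT"  # rather than CLOSED
            self.entered_by_rst = True

    def receive(self, kind):
        """Handle a valid incoming segment: 'RST' or any other kind."""
        if self.state == "TIME-WAIT":
            if kind == "RST":
                return  # ignored, as in fix F1 of RFC-1337
            if self.entered_by_rst:
                self.sent.append("RST")  # a RST instead of an ACK
                self.timer_restarts += 1
            return  # TIME-WAIT entered by FIN close is not modelled
        if kind == "RST" and self.state != "CLOSED":
            self.state = "CLOSED"  # RST reception as in RFC-793

    def time_wait_expired(self):
        if self.state == "TIME-WAIT":
            self.state = "CLOSED"


# Figure 6 in miniature: TCP B aborts, the RST closes TCP A, and three
# data segments already in flight from TCP A each elicit a further RST.
b = PartialSolutionConn()
a = PartialSolutionConn()
b.send_rst()
a.receive("RST")
for _ in range(3):
    b.receive("DATA")
b.time_wait_expired()
print(a.state, b.state, b.sent, b.timer_restarts)
```

The model keeps an end that has already sent a RST in TIME-WAIT when a RST arrives from the peer, which is the behaviour the solution needs when RSTs cross.
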
3.3 Crossing RSTs

The solution must be stable in the situation where both sockets transmit a RST. In this case, both ends enter TIME-WAIT state. Both RSTs are received in TIME-WAIT and ignored [RFC-1337, fix F1]. Note that conformance to [RFC-1337] is a requirement; otherwise both TIME-WAIT states will be assassinated and data corruption may occur.

4. RST Loss Hazards

The solution outlined above offers partial protection against data corruption hazards arising from RST-terminated connections. However, delay or loss of a RST gives rise to a potential hazard.

For TIME-WAIT state to provide full protection, it must commence after both ends of a connection have stopped transmitting data. This is guaranteed for the peer that enters TIME-WAIT, since it has transmitted a RST and no data can follow this. In addition, the transition to TIME-WAIT must take place after the other peer has ceased data transmission. The 3-way closing handshake enforces this for conventionally closed connections; TIME-WAIT state is always entered after the CLOSE-WAIT to LAST-ACK transition at the connection peer. The lack of an equivalent mechanism for RST-terminated connections leads to situations where the effective TIME-WAIT state is truncated or vanishes completely.

4.1 RST Loss and Data Retransmission

Figure 9 shows a scenario where TCP A is retransmitting data segments, lost because of network congestion. Because of exponential backoff, as described in [RFC-1122], the interval between successive retransmissions is now the 60 second limit common to many TCP implementations. TCP B gives up and aborts the connection, entering TIME-WAIT state as mandated by the partial solution in chapter 3. The ensuing RST is lost, as the network is still congested. TCP A continues to retransmit. At some point network congestion eases, and a retransmitted data segment reaches TCP B.
A new incarnation of the connection may be in existence, and the data segment may be erroneously accepted.

        TCP A                                       TCP B

    1.  ESTABL.        -->                          ESTABL.
                       (lost)
                                                    (User Abort)
    2.  ...                          <--            TIME-WAIT
                                     (lost)
        (RTX after 60 seconds)
    3.  ESTABL.        -->                          TIME-WAIT
                       (lost)
        (RTX after 60 seconds)
    4.  ESTABL.        -->                          TIME-WAIT
                       (lost)
        (RTX after 60 seconds)
    5.  ESTABL.        -->                          TIME-WAIT
                       (lost)
        (RTX after 60 seconds)
    6.  ESTABL.        -->                          TIME-WAIT
                       (lost)
                                                    (2 MSL)
    7.                                              CLOSED
        (RTX after 60 seconds)
    8.  ESTABL.        -->                          ...

             Figure 9. RST Loss and Data Retransmission

4.2 RST Loss and Idle Connections

It is not necessary for retransmission to be in progress for this hazard to occur. Consider the case where the user aborts an idle connection, as shown in Figure 10. TCP B issues the abort, and enters TIME-WAIT. The RST is lost, so that TCP A remains in ESTABLISHED state. No activity occurs until TCP A tries to transmit data, an interval that is unbounded, and so may exceed twice the MSL. The data segment may be erroneously accepted at TCP B by a subsequent incarnation of the connection.

        TCP A                                       TCP B

    1.  ESTABL.                                     ESTABL.
                                                    (User Abort)
    2.  ...                          <--            TIME-WAIT
                                     (lost)
                                                    (2 MSL)
    3.                                              CLOSED
        (Interval > 2MSL)
    4.  ESTABL.        -->                          ...

              Figure 10. RST Loss and Idle Connections

5. A Complete Solution: the 2-Way Closing Handshake

5.1 Discussion

The hazards caused by RST loss can only be avoided by a 2-way closing handshake for RST-terminated connections. This is required to ensure reliable delivery of the RST. A 3-way handshake is not required, since data received at the peer generating the RST need not be reliably acknowledged.

TIME-WAIT state must be entered after both sides have stopped transmitting data, i.e. after the RST has been reliably delivered. In addition, the RST must be reliably acknowledged.
There is the potential for the acknowledgement to be lost; TIME-WAIT must also fulfill the function of ensuring that retransmitted RSTs are acknowledged. This is analogous to the requirement that closing FINs are reliably acknowledged in FIN-terminated connections.

This means that RST reception must cause a transition to TIME-WAIT, in contrast to the partial solution in the previous chapter. Crossing RSTs can be handled as follows: both ends of the connection change state to TIME-WAIT on RST reception.

Reliable delivery of a RST requires

o  an interim state, between RST transmission and receipt of the acknowledgement.

o  alteration of the semantics of RST reception so that RSTs from current connections are acknowledged, while RSTs sent from CLOSED state continue to be ignored.

o  that a RST consume sequence number space, in a similar fashion to SYN and FIN segments, so that it may be acknowledged.

o  a timeout, since in the case where the peer has crashed, the interim state between RST transmission and acknowledgement may endure forever. In addition, non-conformant TCP implementations will not acknowledge a RST, and so there must always be a timeout to terminate the interim state.

5.2 Interim State

The identity of the interim state between transmission and acknowledgement of a RST deserves consideration. If a new protocol were being designed, a separate state would be chosen. A new state for the TCP protocol has serious implications for backwards compatibility; unless all statistics gathering applications are modified, it is not possible to report it. Since a new state will probably always be invisible outside TCP, a current state is preferable, if an appropriate state exists.

The LAST-ACK state has similarities to the interim state; though used once a FIN has been received and acknowledged, it corresponds to the last state between transmission of a FIN and its acknowledgement.
Extension of LAST-ACK to RSTs as well as FINs fits neatly with the current specification of LAST-ACK; see Appendix D.

All further reference to LAST-ACK in this chapter is qualified by the fact that LAST-ACK is entered by RST transmission. There is no change to the behaviour of LAST-ACK if entered by FIN transmission, except that it follows the behaviour of other synchronised states on RST reception, i.e. transition to TIME-WAIT state.

5.3 Changes to RFC-793 State Machine

The solution involves additional state transitions on RST transmission, to LAST-ACK, and on RST reception, to TIME-WAIT. LAST-ACK changes state to CLOSED when the RST is acknowledged, in a similar fashion to [RFC-793].

See Appendix B for the TCP Connection State Diagrams for RST transmission and reception. Appendices D and E describe modifications to [RFC-793] and [RFC-1122] respectively.

LAST-ACK state after RST transmission

On RST transmission by synchronised TCP connections, there is a state transition from the current state to LAST-ACK. The RST is retransmitted from LAST-ACK state, like FIN segments in LAST-ACK state, until the ACK of the RST is received, when CLOSED state is entered. RSTs with non-zero sequence numbers are acknowledged in all states except LISTEN and CLOSED. FIN segments elicit a RST in LAST-ACK; segments other than FIN, RST and ACKs of previously transmitted RSTs are ignored in LAST-ACK. If no acknowledgement is received after the retransmission timeout, LAST-ACK enters CLOSED state.

TIME-WAIT state after RST reception

Reception of a valid RST by a synchronised TCP connection, that causes a state transition, results in ACK transmission and transition to TIME-WAIT state.

If the peer generating the RST is in CLOSED state, the RST is not acknowledged, since the acknowledgement would elicit a further RST.
This can be detected by only acknowledging RSTs containing a non-zero sequence number [RFC-793, p.36]. RSTs received in SYN-SENT state as a result of a SYN sent to a non-existent port are thus not acknowledged. RSTs received in TIME-WAIT are acknowledged; other segments are ignored.

The 2-way closing handshake is shown in Figure 11. The connections need not be in ESTABLISHED state; consult Appendix B for other states from which the closing handshake is initiated.

        ESTABL.                                     ESTABL.

                 ----------> snd RST -------------->

        LAST-ACK <----------- snd ACK --------------

        CLOSED                                      TIME-WAIT
                                                    (2MSL)
                                                    CLOSED

                 Figure 11. 2-way Closing Handshake

Figure 12 shows the modification to Figure 6 (User Abort) as a result of the addition of the 2-way closing handshake. Note that the ACK of the RST and the last data segment arrive out of order, to show the effects of data segments arriving in CLOSED state; they elicit a RST which is received in TIME-WAIT by TCP A, and ignored, as it has a zero sequence number.

        TCP A                                       TCP B

    1.  ESTABL.        -->           -->            ESTABL.
    2.  ...                          <--            ESTABL.
                                                    (User Abort)
    3.  ...                          <--            LAST-ACK
    4.  ESTABL.        -->                          ...
    5.  ESTABL.        <--                          ...
    6.  ESTABL.        -->                          ...
    7.  ESTABL.        -->                          ...
    8.  ESTABL.        <--                          ...
    9.  TIME-WAIT      -->                          ...
   10.  ...                          -->            LAST-ACK
   11.  ...                          -->            LAST-ACK
   12.  ...                          -->            CLOSED
   13.  ...                          -->            CLOSED
   14.  TIME-WAIT      <--                          CLOSED
        (2 MSL)
   15.  CLOSED

             Figure 12. Connection Closed by User Abort

Figure 13 shows the modifications to Figure 9 (RST Loss and Data Retransmission). TIME-WAIT state is entered after TCP B has sent the RST and entered LAST-ACK state. Thus the duration of TIME-WAIT exceeds the lifetime of any segments from TCP B and TCP A still in the network.

        TCP A                                       TCP B

    1.  ESTABL.        -->                          ESTABL.
                                                    (User Abort)
    2.  ...                          <--            LAST-ACK
                                     (lost)
                                                    (RTX Interval)
    3.  ESTABL.        <--           <--            LAST-ACK
    4.  TIME-WAIT      -->           -->            CLOSED
        (2 MSL)
    5.  CLOSED

             Figure 13. RST Loss and Data Retransmission

FINs and RSTs

The RST is considered to consume the same sequence number space as a FIN. A segment containing both a FIN and a RST is treated as a RST [RFC-793, p70].

Crossing RSTs

The case where both sockets transmit a RST is shown in Figure 14. The RSTs both arrive in LAST-ACK state and cause transitions to TIME-WAIT state. Both ends spend 2MSL in TIME-WAIT state. Note that to cope with the case where one of the RSTs is lost, RSTs must be acknowledged in TIME-WAIT, otherwise one end continues to transmit RSTs in LAST-ACK state, until it times out.

        TCP A                                       TCP B

    1.  ESTABL.                                     ESTABL.
                                                    (User Abort)
    2.  ...                          <--            LAST-ACK
        (User Abort)
    3.  LAST-ACK       -->                          ...
    4.  TIME-WAIT      <--
    5.  TIME-WAIT      -->                          ...
    6.  ...                          -->            TIME-WAIT
    7.  ...                          <--            TIME-WAIT
    8.  ...                          -->            TIME-WAIT
    9.  TIME-WAIT      <--                          ...
        (2 MSL)                                     (2MSL)
   10.  CLOSED                                      CLOSED

                      Figure 14. Crossing RSTs

Crossing RSTs and FINs

If LAST-ACK is entered by RST transmission, it is advisable to send a further RST if a FIN is received, as the first RST may have been lost; this ensures that the peer receives the RST and changes state to TIME-WAIT (from FIN-WAIT-1 or CLOSE-WAIT) as quickly as possible.

5.4 Interoperability with RFC-793 Implementations

Implementations of [RFC-793] do not wait for acknowledgement of a RST, nor do they acknowledge a RST. An implementation of this memo enters LAST-ACK on transmission of a RST. This is not acknowledged if the peer conforms to [RFC-793]. The LAST-ACK state times out. A RST from a [RFC-793] implementation triggers a state change to TIME-WAIT, and an acknowledgement.

Thus, the solution presented in this memo offers protection even when interoperating with non-conformant implementations, by a LAST-ACK or TIME-WAIT state. There is still the smaller risk of loss of RSTs, but the major risk, of data corruption because of immediate
There is still the smaller risk of loss of RSTs, but the major risk, of data corruption because of immediate transition to CLOSED, is avoided. Chapter 9 considers interoperability issues in greater detail.

5.5 Timer-Terminated Connections

There are similar hazards to those outlined above for timer-terminated connections. The timers concerned are User Timeouts, Retransmission Timeouts [RFC-793], and the commonly implemented Keepalive Timeouts. Some TCP implementations also time out the FIN-WAIT-2 state. They all terminate a connection by entering CLOSED state. Some implementations transmit a RST, but there is no guarantee of its arrival. Network congestion causes the timeouts by the loss of retransmitted or keepalive segments. Congestion may ease within the Maximum Segment Lifetime and the peer may transmit segments, causing potential data corruption.

Some protection is gained if connections wishing to time out first transmit a RST, enter LAST-ACK and await an ACK. Should congestion ease, the RST will be received and acknowledged, and the peer enters TIME-WAIT and times out after twice the MSL. In the majority of cases congestion will not ease, the RST will not be acknowledged, and the LAST-ACK state will time out. The effect is to increase the lifetime of the connection after the decision to time out by the duration of the LAST-ACK state.

Extension of the solution to timer-terminated connections implements the principle that all connections should include a TIME-WAIT state. However, data integrity also relies on a closing handshake, and cannot be ensured in a bounded time if the connection peer disappears. At some point, it has to be assumed that the peer is dead.

5.6 Connections Terminated by ICMP Messages

Certain ICMP messages terminate a TCP connection.
These include Destination Unreachable with codes Port Unreachable, Protocol Unreachable or Fragmentation Required but DF set, and ICMP messages prohibiting access for administrative reasons.

Usually, these are sent as a result of reception of a SYN segment, and are received in SYN-SENT state. Since the connection is unsynchronised, there is no danger of segments remaining in the network to corrupt a future connection. However, this is not true for Fragmentation Required and DF bit set; this is only sent as a result of an intermediate router receiving a data segment, so that the connection must already be synchronised. Messages indicating unreachability cannot occur for a synchronised connection, but it is possible to construct scenarios where the path used by a synchronised connection changes to include a router which prohibits access for administrative reasons, terminating the connection.

These scenarios carry very low probabilities, but for completeness and safety, synchronised TCP connections should enter TIME-WAIT state on reception of ICMP messages. Appendix E describes alterations to [RFC-1122] to implement this.

5.7 TP4

The OSI equivalent to TCP, TP4 [TP-Spec], has no mechanism for orderly release of a connection. Connections are closed by sending a Disconnect Request TPDU, causing a state transition to CLOSING state [TP-Spec, section 6.7]. Subsequent reception of a Disconnect Confirm TPDU, or another Disconnect Request TPDU, causes a state transition to REFWAIT state [TP-Spec, Annex A, Table 11], during which the connection is 'frozen' [TP-Spec, section 6.18] until this times out.

The TP4 CLOSING state is analogous to the use of LAST-ACK state after RST transmission, and REFWAIT is analogous to TIME-WAIT state. The RST is analogous to a Disconnect Request TPDU, and the ACK to a Disconnect Confirm TPDU.
The other TCP states during closedown (FIN-WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK after FIN transmission) have no analogues in TP4.

TP4 connections that time out by retransmission close by sending a Disconnect Request TPDU and enter CLOSING. This is analogous to the suggested behaviour for timeouts in this memo. The TP4 CLOSING state times out and enters REFWAIT if the Disconnect Request TPDU is not acknowledged. In a similar fashion, this memo times out LAST-ACK if the RST is unacknowledged.

6. Relative Probabilities of Hazards

6.1 FIN, RST, Timer and ICMP Related Hazards

It is useful to compare the relative probabilities of hazards arising from FIN-, RST-, Timer- and ICMP-terminated TCP connections. The probability of each hazard is proportional to the amount of data received after transition to CLOSED. Complete protection requires that this be guaranteed to be zero. Data received after connection closure does not cause data corruption, unless it falls within the current window of a new incarnation of the connection.

It is assumed that the connection peer displaying the hazard is acting as a data sink, maximising the data received and the probability of failure. To simplify the arithmetic, higher order effects are ignored; for instance, those arising from the loss of more than one TCP segment in the period considered.

The three hazards considered are data corruption arising from the following:

   o Hazard 1: A FIN-terminated TCP connection with TIME-WAIT state omitted.

   o Hazard 2: A TCP connection aborted from Established state, with neither TIME-WAIT nor LAST-ACK states.

   o Hazard 3: A TCP connection aborted from Established state, with TIME-WAIT but without LAST-ACK state.

Other hazards, such as connections aborted during closedown, by timeout or ICMP messages, are ignored. These are much less likely than Hazard 2.
The duration of closedown is typically much shorter than that of Established state. Timeouts require multiple loss of segments in the network and represent higher order effects, with correspondingly lower probabilities. ICMP termination of synchronised connections requires use of the DF bit combined with fragmentation, which is uncommon.

Nomenclature

   P1  - Relative probability of Hazard 1
   P2  - Relative probability of Hazard 2
   PL  - Probability of loss of a TCP segment in the network
   PR  - Probability that a TCP connection terminates by RST
   PT  - Probability that a TCP connection terminates by timeout
   PI  - Probability that a TCP connection terminates by ICMP message
   MSS - Maximum Segment Size
   W   - Maximum offered TCP window

o Hazard 1

Duplicate segments received after FIN-terminated connections usually arise because of the loss of an ACK, triggering an unnecessary retransmission. Slow start [Congestion] implies that only one segment will be retransmitted without acknowledgement. The relative probability of H1 is the segment size multiplied by the probability of segment loss and the probability of termination by FIN handshake:

   P1 = MSS * PL * (1 - PR - PT - PI)
      = MSS * PL

ignoring higher order effects.

o Hazard 2

For a data sink, transmission of a RST in Established state and transition to CLOSED state is followed by reception of a window of data, all of which may be received during a subsequent incarnation of the connection. The relative probability of H2 is the window size multiplied by the probability of termination by RST:

   P2 = W * PR

o Hazard 3

In this case, a RST is lost. Any data received in TIME-WAIT causes the TIME-WAIT timer to restart, so the hazard only occurs if the gap between reception of segments exceeds the duration of TIME-WAIT state.
This occurs if several retransmitted segments are lost, which is a higher order effect with low probability, or if an application spontaneously transmits data after this time, which is also unlikely. This hazard can be ignored.

6.2 Relative Probabilities of FIN- and RST-related Hazards

The ratio of probabilities of hazard H2 and H1 is

   P2/P1 = W/MSS * PR/PL

Example Calculation

If Path MTU Discovery [RFC-1191] is supported, the segment size is the Maximum Segment Size indicated by the lowest physical packet size on the connection path, unless negotiated to be lower during connection establishment. Implementation of [RFC-1191] is not yet widespread, so the default figure is assumed [RFC-1122, 3.3].

   TCP segment size = 576 - size of TCP and IP headers
                    = 536

Assume a window size of 32K. Appendix G summarises statistics about TCP connections, derived from a variety of connections. Taking the average percentage values of PR=1.1 and PL=1.2 derived from Appendix G:

   P2/P1 = W/MSS * PR/PL
         = 32768/536 * 1.1/1.2
         = 56.

For TCP connections on the same physical network, or where Path MTU Discovery is supported, the segment size is larger than the default, so P1 is larger and the ratio P2/P1 correspondingly smaller.

The lowest ratio consistent with the data in Appendix G can be calculated from the highest value of PL (2.9) and the lowest value of PR (0.8):

   P2/P1 = 17.

It can be concluded that erroneous acceptance of data from expired connections is significantly more likely to occur as a result of RST-terminated connections than the equivalent hazard after FIN-terminated connections.

7. Implications for Related TCP Standards

Extensions to the TCP standard [RFC-793] have been made in three areas which overlap the issues raised in this memo: to cope with TIME-WAIT Assassination [RFC-1337], to better handle high performance networks [RFC-1323], and to permit more efficient use of TCP for transactions [RFC-1379 and RFC-1644].

Alternatives to TIME-WAIT have been explored, but none has been adopted. If TIME-WAIT continues to be required for FIN-terminated connections to avoid acceptance of segments from expired connections, it will also be required for RST-terminated connections.

7.1 TIME-WAIT Assassination

[RFC-1337] describes several hazards caused by premature termination (or 'assassination') of TIME-WAIT state in FIN-terminated connections, caused by RST reception. This memo discusses these hazards in relation to RST-terminated connections, and explores issues adumbrated in [RFC-1337].

7.2 High Performance Extensions

[RFC-1072], [RFC-1106], [RFC-1185], and [RFC-1323] present TCP extensions to improve performance over large bandwidth*delay product paths and to provide reliable operation over very high speed paths. Although truncation or replacement of TIME-WAIT is discussed, it is not specified.

Section 4.3 and Appendix B of [RFC-1323] show that PAWS does not allow relaxation of the MSL requirement, but this is possible if timestamps exist across connections, as a per-host timestamp cache, and tick at least once in a period equal to the combined duration of TIME-WAIT and the round trip time. A different timeout for reliable acknowledgement of closing FINs in FIN-terminated connections is discussed in Appendix B.

7.3 Extensions for Transactions

Extensions to TCP to provide efficient support for Transaction Processing [RFC-1379 and RFC-1644] shorten but do not eliminate TIME-WAIT [RFC-1379, p. 34 and RFC-1644, p. 12].
The effects of RST transmission during transaction processing and any extensions to the T/TCP state machine are left for further study.

8. Issues of Backwards Compatibility

8.1 Introduction

There is a very large global installed base of TCP implementations and applications utilising the protocol. Therefore, the extensions described in this memo cannot be lightly undertaken. General issues of backwards compatibility are summarised in [RFC-1263], section 2.2. It may be necessary to provide options to permit maintenance of current services and semantics. Incompatibility between conformant and non-conformant implementations must be avoided. There is the risk of overcomplicating the solution to ensure backwards compatibility.

The following issues affect backwards compatibility: resources and resource starvation, API semantics, interoperability, and simplicity.

8.2 Nomenclature

Hard and Soft Aborts

To distinguish [RFC-793] behaviour on RST transmission from that referred to in this memo, the former is characterised as a Hard Abort. The connection state is removed and the connection enters CLOSED state. A Soft Abort implements the behaviour described in this memo; a RST is transmitted and the connection enters LAST-ACK state. Appendix D includes modifications to the ABORT call in [RFC-793] to allow both Hard and Soft Aborts.

Hard and Soft Timeouts

In a similar fashion, a Hard Timeout occurs when a connection enters CLOSED state on timeout, with or without transmitting a RST. A Soft Timeout occurs when a connection transmits a RST and enters LAST-ACK state on timeout.

8.3 Resources

TCP connections consume a variety of resources, such as memory in the form of per-connection state and address space defined by IP addresses and TCP port numbers. The number of TCP connections may be limited by the size of static tables.
Addresses are consumed at initiation of a TCP connection and released at connection termination. Memory usage varies over the lifetime of a TCP connection and depends on the state of the connection and the amount of buffered user data.

Addresses

There is competition for address resources because of the requirement to run simultaneous instantiations of a particular application between two communicating hosts. The quadruples defined by

   (local IP address, local TCP port, foreign IP address, foreign TCP port)

must be unique at any point in time. These comprise a 96 bit address space that identifies all TCP connections currently in existence. For two communicating hosts and a particular client-server application, the two 32 bit IP addresses are fixed, as is the 16 bit port defining the service. The client port number space contains 65535 non-zero port numbers, but 1023 of these are reserved for service ports, leaving 64512. New client processes requiring a service from the same host must be allocated an unused port. Thus, there can be no more than 64512 simultaneous instantiations of an application between two hosts.

Memory

Unlike address space, which is a hard limit, memory usage and the maximum number of TCP connections depend on the TCP implementation and the available resources on the host. The protocol requires significant state to be maintained for most of the lifetime of a TCP connection, although memory usage in TIME-WAIT and LAST-ACK states may be minimised by keeping only necessary state: little more than addresses and ports for demultiplexing, and sequence numbers for acknowledgement. However, memory requirements or other limits on the numbers of TCP connections may reduce the maximum number of simultaneous connections below the limit imposed by 64512 unique client port numbers.
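
The address arithmetic and the reduced per-connection state described above might be sketched in C as follows. This is an illustration only, assuming 32 bit IPv4 addresses: the names CLIENT_PORTS, conn_quad, min_conn_state and quad_equal are hypothetical and are not taken from [RFC-793] or from any particular implementation.

```c
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not taken from any TCP stack. */

/* 65535 non-zero ports, of which 1023 are reserved for service ports. */
#define CLIENT_PORTS (65535u - 1023u)

/* The quadruple that must be unique at any point in time (96 bits). */
struct conn_quad {
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t foreign_addr;
    uint16_t foreign_port;
};

/* Reduced state assumed sufficient in TIME-WAIT or LAST-ACK: the
 * quadruple for demultiplexing and sequence numbers for acknowledgement. */
struct min_conn_state {
    struct conn_quad quad;
    uint32_t snd_nxt;
    uint32_t rcv_nxt;
};

/* Returns 1 if both quadruples name the same connection, otherwise 0. */
static int quad_equal(const struct conn_quad *a, const struct conn_quad *b)
{
    return a->local_addr == b->local_addr &&
           a->local_port == b->local_port &&
           a->foreign_addr == b->foreign_addr &&
           a->foreign_port == b->foreign_port;
}
```

With both IP addresses and the service port fixed, two quadruples can differ only in the client port, which is why CLIENT_PORTS bounds the number of simultaneous instantiations between two hosts.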
Connection Termination

Resources are released at connection termination, which occurs as a result of the following events. Note that there is a termination event at each TCP peer; the later event defines the point at which global resource, i.e. port numbers, can be reused.

   o TIME-WAIT timeout for FIN-terminated connections (after ACK of FIN at peer)

   o SYN reception in TIME-WAIT satisfying the conditions of [RFC-1122]

   o RST reception for Hard Aborts (after RST transmission by peer)

   o Hard Timeout

   o TIME-WAIT timeout for Soft Aborts (after ACK reception by peer)

   o LAST-ACK timeout for Soft Aborts, against [RFC-793] hosts

   o TIME-WAIT timeout for Soft Timeouts (after ACK reception by peer)

   o LAST-ACK timeout for Soft Timeouts, if peer has disappeared

   o LAST-ACK timeout for Soft Timeouts, against [RFC-793] hosts

Behaviour for the extensions in this memo is identical to [RFC-793] except for the last five cases: soft aborts and soft timeouts prolong connection termination and thus connection duration. However, timer-terminated connections are never short lived, so increase in connection duration is unlikely to result in problems. Connections terminated by Soft Aborts are thus the only ones vulnerable to resource starvation.

8.4 Resource Starvation

TIME-WAIT has a marked effect on resource usage, as it significantly prolongs connection duration. This is also true for LAST-ACK if the RST is not acknowledged. In this case, any conclusions derived in this section for TIME-WAIT state apply equally to LAST-ACK.

Note that since TIME-WAIT ties up the (address, port) quadruple, the foreign peer cannot re-establish the connection even though it terminates before TIME-WAIT. Normal operation for client-server applications involves the client application being allocated the next unused TCP port every time it establishes a new connection.
As long as it does not cycle through all 64512 available TCP client ports before TIME-WAIT expires, the (address, port) quadruple will be available for reuse when needed.

If the lifetime of a TCP connection executing a specific application is L seconds, TCP ports will be exhausted if new connections are opened at a rate exceeding 64512/L per second. FIN-terminated connections last for at least 2MSL or 240 seconds, the duration of TIME-WAIT, so that the rate at which new connections are opened must be less than 64512/240 or 268 per second (see [RFC-1379], page 5). RST-terminated connections following [RFC-793] have no TIME-WAIT state, but those following this memo do, with a possibly unacceptable increase in resource usage.

The types of TCP connection which lead to resource starvation are short lived and generated at a high rate, such as remote procedure calls and other transaction processing applications. [RFC-1263], section 3.2 summarises the problem, which is "caused by short port numbers, long MSLs, and the misuse of TCP as a request-reply protocol". Processing power, I/O bandwidth and network bandwidth have increased to the point that a TCP connection comprising a simple request-reply may take several orders of magnitude less than a second, excluding the TIME-WAIT state.

The issue of backwards compatibility arises with short-lived applications which always transmit a RST to close a connection, rather than as a result of an unusual condition, such as a full disk (see Figure 5). They exploit the loophole identified by this memo to avoid TIME-WAIT and so truncate the lifetime of a TCP connection, thereby reducing resource consumption and permitting much higher rates of TCP connection establishment than 268 per second.
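
The port exhaustion bound above can be checked with a few lines of C. This is a minimal sketch: the function name max_open_rate and its parameters are hypothetical, and integer division is assumed to be acceptable because only whole connections per second are of interest.

```c
/* Hypothetical helper, not part of any TCP implementation.
 *
 * Returns the highest whole number of new connections per second that
 * can be sustained without exhausting `ports` client ports, when each
 * connection ties up its port for `lifetime_secs` seconds (including
 * TIME-WAIT). Returns 0 if `lifetime_secs` is 0, which is treated as
 * invalid input rather than as an unbounded rate. */
static unsigned long max_open_rate(unsigned long ports,
                                   unsigned long lifetime_secs)
{
    if (lifetime_secs == 0)
        return 0;
    return ports / lifetime_secs;
}
```

For example, max_open_rate(64512, 240) evaluates to 268, the figure quoted for FIN-terminated connections with a TIME-WAIT duration of 240 seconds.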
TCP port resource usage cannot be decreased, so the port address space must be increased if short lived RST-terminated applications are to continue working and include the TIME-WAIT state. This requires either modification to the TCP header or, equivalently, the use of TCP options to extend the port address space. This problem is the same for FIN-terminated connections and is outside the scope of this memo; port exhaustion will only be handled here by reverting to [RFC-793] behaviour.

8.5 Approaches to Configuration

Configuration is only necessary if it is impossible to find an elegant solution that maintains backwards compatibility. Configuration of behaviour between conformance and nonconformance is a simple mechanism to maintain backwards compatibility.

Global Configuration versus Per-Connection Configuration

Configuration may be global, affecting all TCP connections from that host, or per-connection. Per-connection behaviour may be configured statically through the API, or negotiated with the connection peer. If both global and per-connection configuration exist, the global option dictates behaviour for all connections that do not utilise per-connection configuration; where there is per-connection configuration, it overrides the value of the global configuration.

Per-connection configuration allows more flexibility in that individual applications that cause problems may be configured to maintain backwards compatibility, with others conforming to this memo. A mechanism must be added to the API to allow per-connection configuration, normally in the form of an option; it also relies on access to the source code of the application at both client and server peers.

Static Configuration versus Negotiated Configuration

Behaviour can be autoconfigured via negotiation with the peer TCP connection.
Typically this uses TCP options, although there is the potential for carrying data with RST segments. Use of TCP options complicates the implementation and may cause interoperability problems with current implementations, unless negotiation occurs at connection initiation; this approach does not permit different behaviour according to the type of abort, since this is unknown at connection initiation. In addition, this is an implicitly pessimistic approach in that conformance to this memo only occurs between two hosts that support the extensions. Its advantage is that no manual configuration is necessary.

Static configuration is simpler, but requires that both peers are independently configured to ensure backwards compatibility, since resource starvation occurs if either the initiator or the receiver of the RST conforms to this memo.

Optimistic versus Pessimistic Configuration Defaults

Static configuration defaults may be optimistic, where they must be changed to maintain backwards compatibility, or pessimistic, maintaining backwards compatibility unless configured otherwise. The choice depends on the relative frequency of problems caused by the deficiency which this memo fixes, and those caused by lack of backwards compatibility.

Granularity of Configuration

Not all RST-terminated connections need be implemented as Hard Aborts to preserve backwards compatibility. There are several types of RST- and Timer-terminated connections: User Abort, MIB II Abort, Half Duplex Close, Precedence Mismatch, Retransmission, User and Keepalive Timeout. It is possible to configure conformance for each of these, but at the expense of simplicity. A single configuration variable simplifies implementation and administration, at the expense of the flexibility provided by a finer granularity of configuration.
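
The override rule described in this section, where a per-connection setting takes precedence and the global flag applies otherwise, might be sketched in C as follows. The enum abort_setting and the function effective_soft_abort are hypothetical names introduced for illustration; the sketch assumes static configuration with a single flag and does not model negotiation with the peer.

```c
/* Hypothetical types and names; a sketch of the override rule only. */
enum abort_setting {
    ABORT_UNSET = 0,  /* connection does not use per-connection config */
    ABORT_HARD,       /* [RFC-793] behaviour (Hard Abort) */
    ABORT_SOFT        /* behaviour described in this memo (Soft Abort) */
};

/* Returns 1 if the connection should perform Soft Aborts, otherwise 0.
 * `global_soft` is the host-wide flag; `per_conn` overrides it when it
 * has been set explicitly for the connection. */
static int effective_soft_abort(int global_soft, enum abort_setting per_conn)
{
    switch (per_conn) {
    case ABORT_HARD:
        return 0;
    case ABORT_SOFT:
        return 1;
    default:
        return global_soft != 0;
    }
}
```

An optimistic default corresponds to initialising the global flag to a non-zero value, so that only the applications that need [RFC-793] behaviour have to set ABORT_HARD.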
8.6 API Semantics

It is important to maintain the semantics of the API used to open, use and close TCP connections. If the API supports closure through RST transmission, Soft Aborts should appear to be identical to Hard Aborts from the point of view of the application.

The same issues apply for handling new SYNs in TIME-WAIT as for FIN-terminated connections [RFC-1122]. If the implementation follows [RFC-1122] in this respect for FIN-terminated connections, it should do the same for RST-terminated and Timer-terminated connections. This corresponds to the requirement that the result of an open call be successful for remote peers in TIME-WAIT states that conform to [RFC-1122], however TIME-WAIT state is entered.

8.7 Interoperability

Interoperability between conformant and non-conformant implementations must also be shown, since conformant implementations will be in the minority until they diffuse throughout the Internet, and there will always be non-conformant implementations in existence.

8.8 Simplicity

Requirements of backwards compatibility are notorious for overcomplicating an otherwise elegant solution [RFC-1263]. Excessive configurability also makes implementation and interoperability testing more difficult; the combinations that require interoperability testing scale as the square of the number of configurable options.

9. Solution with Backwards Compatibility

9.1 Introduction

In this section a solution is sought which satisfies the requirements of the previous chapter. The solution should permit maintenance of current resource usage and API semantics, interoperate with current TCP implementations, and be as simple as possible. Configuration mechanisms and defaults must also be considered.
9.2 Solution

Applications that generate large numbers of short lived connections which terminate by RST are rare; there are very few that generate large numbers of short lived connections and terminate by FIN. An optimistic approach is adopted whereby the default is to conform to this memo. Negotiation with the TCP peer is thus rejected, both for this reason and on the grounds of complexity. In addition, a coarse grained approach is taken to configuration, since rarely will the default behaviour need to be changed; all non-FIN terminated connections follow the same behaviour, dictated by a single configuration flag.

An exception may be made in that connections aborted by administrative action via SNMP may always be a Hard Abort. Ideally, the option of both Hard and Soft Aborts would be provided; this is outside the scope of this memo, but plausible extensions to [RFC-1213] are described in Appendix F.

To allow flexibility, both a global flag and a per-connection flag are provided; most APIs have a mechanism to configure optional behaviour for each connection.

9.3 Configuration

A Soft_Abort flag controls behaviour. There is a global flag per host, and a per-connection flag which can be used to override the value of the global flag.

Soft_Abort

If true, all RST- and Timer-terminated connections follow the behaviour of this memo, in transmitting RSTs, receiving RSTs, and acknowledging RSTs. If false, all RST- and Timer-terminated connections follow [RFC-793]. Default is Soft_Abort=True.

9.4 API Semantics

API semantics are implementation dependent, but the socket API is very popular. This section ensures backwards compatibility for applications and implementations utilising the socket API. This is done by maintaining current semantics and implementing a new socket option to change behaviour.

Close

Socket close is synchronous for all FIN-terminated connections.
The close call does not return until the 3-way handshake is complete, unless the SO_LINGER option is set with a nonzero linger field. In the latter case the socket attempts to deliver data until a timeout equal to the value of the linger field (in clock ticks or seconds depending on BSD implementations). If the SO_LINGER option is set with a zero linger field, a RST is sent and the close call returns immediately. A close call executing a Soft Abort must return immediately to maintain backwards compatibil- ity; RST-terminated connections always terminate asynchronously. The connection state does not disappear, until the 2-way handshake com- pletes or there is a timeout. New SYNs in TIME-WAIT New SYNs may be accepted in TIME-WAIT resulting from a RST-terminated connection if the conditions of [RFC-1122] apply, as described in Appendix B. New SYNs may not be accepted in LAST-ACK, since this implies that it is possible to transmit a segment successfully, so the reliable acknowledgement of the closing RST can succeed. Socket Option A new socket option that can be set by the setsockopt call would allow behaviour to be altered from the global configuration value. The socket API could be extended to do this by defining a new option, SO_ABORT, with the following fields. #define SOFT_ABORT 1 typedef so_abort { unsigned long Abort_Flags; } If the lowest bit in Abort_flags is set, the TCP connection conforms to this memo; if it is clear, it conforms to [RFC-793]. The other 31 bits are reserved and must be zero. If the option is not used, the global Soft_Abort flag dictates behaviour. Heavens [Page 41] Internet Draft Problems with RSTs and Timers July 1995 9.5 Interoperability with RFC-793 There are two cases where interoperability needs to be shown. The first is shown in Figure 15; here an implementation of this memo transmits a RST to an implementation of [RFC-793]. TCP A is the implementation of [RFC-793] and TCP B is an implementation of this memo. TCP A TCP B 1. ESTABL. 
--> --> ESTABL. 2. ... <-- ESTABL. (User Abort) 3. ... <-- LAST-ACK 4. ESTABL. --> --> LAST-ACK 5. ESTABL. <-- ... 6. ESTABL. --> --> LAST-ACK 7. CLOSED <-- ... (RTX Timeout) 8. CLOSED Figure 15. Interoperability between this memo and RFC-793 Heavens [Page 42] Internet Draft Problems with RSTs and Timers July 1995 Figure 16 shows an implementation of [RFC-793] (TCP B) which transmits a RST to an implementation of this memo (TCP A). TCP A TCP B 1. ESTABL. --> --> ESTABL. 2. ... <-- ESTABL. (User Abort) 3. ... <-- CLOSED 4. ESTABL. --> ... 5. ESTABL. <-- ... 6. ESTABL. --> ... 7. ESTABL. <-- ... 8. TIME-WAIT --> ... 9. ... --> CLOSED 10. TIME-WAIT <-- <-- CLOSED 11. ... --> CLOSED 12. TIME-WAIT <-- <-- CLOSED 13. ... --> CLOSED 14. TIME-WAIT <-- <-- CLOSED (2MSL) 15. CLOSED Figure 16. Interoperability between RFC-793 and this memo In both cases, assuming RSTs are not lost, one end of the connection stays in TIME-WAIT or LAST-ACK for 2MSL or the retransmission timeout respectively. Heavens [Page 43] Internet Draft Problems with RSTs and Timers July 1995 Security Considerations Security issues are not discussed in this memo. References [Congestion] V. Jacobson, "Congestion Avoidance and Control," ACM SIGCOMM-88, August 1988. [RFC-792] J. Postel, "Internet Control Message Protocol", RFC-792, USC/Information Sciences Institute, September 1981. [RFC-793] Postel, J., "Transmission Control Protocol", RFC-793, USC/Information Sciences Institute, September 1981. [RFC-959] J. Postel, J. Reynolds, "File Transfer Protocol", RFC-959, ISI, October 1985. [RFC-1157] M. Schoffstall, M. Fedor, J. Davin, J. Case, "A Simple Network Management Protocol (SNMP)", RFC-1157, October 1990. [RFC-1185] Jacobson, V., Braden, R., and Zhang, L., "TCP Extension for High- Speed Paths", RFC-1185, Lawrence Berkeley Labs, USC/Information Sciences Institute, and Xerox Palo Alto Research Center, October 1990. [RFC-1191] J. Mogul, S. Deering, "Path MTU Discovery", RFC-1191, November 1990. [RFC-1213] K. 
McCloghrie, M. Rose, "Management Information Base for Network
Management of TCP/IP-based internets: MIB-II", RFC-1213, March 1991.

[RFC-1263] L. Peterson, S. O'Malley, "TCP Extensions Considered
Harmful", RFC-1263, October 1991.

[RFC-1323] Jacobson, V., Braden, R. and D. Borman, "TCP Extensions for
High Performance", RFC-1323, Lawrence Berkeley Labs, USC/Information
Sciences Institute, and Cray Research, May 1992.

[RFC-1337] R. Braden, "TIME-WAIT Assassination Hazards in TCP",
RFC-1337, ISI, May 1992.

[RFC-1379] R. Braden, "Extending TCP for Transactions -- Concepts",
RFC-1379, November 1992.

[RFC-1644] R. Braden, "T/TCP -- TCP Extensions for Transactions
Functional Specification", RFC-1644, July 1994.

[TCP/IP-Illustrated] Gary Wright & Richard Stevens, "TCP/IP
Illustrated, Volume 2", Addison-Wesley 1995.

[TP-Spec] Information processing systems - Open Systems
Interconnection - Connection oriented transport protocol
specification, ISO/IEC 8073.

Acknowledgements

Thanks to Alan Cox and Jon Crowcroft for their comments on this draft.
Thanks to George Wilkie for interpreting the TP4 specification in
English, and to Nick Felisiak for first confirming my suspicions that
something was wrong.

Author's Address:

     Ian Heavens
     Spider Systems Limited
     Stanwell Street
     Edinburgh EH6 5NG
     Scotland, UK

     Phone: (UK) 31 555 5166
     Fax:   (UK) 31 555 0664
     Email: ianh@spider.co.uk

10.
Appendix A: TCP Connection State Diagram (Partial Solution)

   TCP Connection State Diagram: RST Transmission

      From state                         Event           To state

      SYN-RCVD, ESTAB, FINWAIT-1,
      FIN-WAIT-2, CLOSE-WAIT, CLOSING    snd RST         TIME-WAIT
      TIME-WAIT                          Timeout=2MSL    CLOSED
      LAST-ACK                           snd RST         CLOSED
      SYN-SENT                           snd RST         CLOSED

11. Appendix B: TCP Connection State Diagram (Full Solution)

   TCP Connection State Diagram: RST Transmission

      From state                         Event               To state

      SYN-RCVD, ESTAB, FINWAIT-1,
      FIN-WAIT-2, CLOSE-WAIT, CLOSING    snd RST             LAST-ACK
      LAST-ACK                           receive ACK of RST  CLOSED
      SYN-SENT                           snd RST             CLOSED

   TCP Connection State Diagram: RST Reception

      From state                         Event               To state

      SYN-RCVD, ESTAB, FINWAIT-1,
      FIN-WAIT-2, CLOSE-WAIT, CLOSING    rcv RST, snd ACK    TIME-WAIT
      LAST-ACK                           rcv RST, snd ACK    TIME-WAIT
      TIME-WAIT                          Timeout=2MSL        CLOSED
      SYN-SENT                           rcv RST             CLOSED

12. Appendix C: A Different Interpretation of RFC-1122

There are problems with interpreting [RFC-1122] to respond to the
arrival of data after half duplex close with a RST and no state
change. The connection hangs if data arrives at TCP A in FIN-WAIT-2,
as Figure 17 shows.

       TCP A                                      TCP B

   1.  ESTABLISHED                                ESTABLISHED
       (Close)
   2.  FIN-WAIT-1   -->                    -->    CLOSE-WAIT
   3.  FIN-WAIT-2   <--                    <--    CLOSE-WAIT
   4.  FIN-WAIT-2   <--                    <--    CLOSE-WAIT
       (user data after half duplex close)
   5.  FIN-WAIT-2   -->                    -->    CLOSED

    Figure 17. Data Received in FIN-WAIT-2 after Half Duplex Close

If the ACK of the FIN is lost or delayed, and data arrives in
FIN-WAIT-1, the connection terminates without entering TIME-WAIT
state. This is shown in Figure 18.

       TCP A                                      TCP B

   1.  ESTABLISHED                                ESTABLISHED
       (Close)
   2.  FIN-WAIT-1   -->                    -->    CLOSE-WAIT
   3.  (lost)       ...                    <--    CLOSE-WAIT
   4.  FIN-WAIT-1   <--                    <--    CLOSE-WAIT
       (user data after half duplex close)
   5.  FIN-WAIT-1   -->                    -->    CLOSED
   6.  FIN-WAIT-1   -->                    -->    CLOSED
   7.  CLOSED       <--                    <--    CLOSED

    Figure 18. Data Received in FIN-WAIT-1 after Half Duplex Close

13. Appendix D: Modifications to RFC-793

The following modifications to [RFC-793] implement the solution in
this document, without regard to backwards compatibility.

o  TCP State Meanings: Section 3.2, page 21

   LAST-ACK should be changed to:

      LAST-ACK - represents waiting for an acknowledgement of the
      connection termination request (FIN or RST) previously sent to
      the remote TCP.
o  TCP Connection State Diagram: Section 3.2, page 23

   Note that this needs to be supplemented by the diagram in
   Appendix B.

   State transitions on RST transmission are as follows:

      In all states except LISTEN, CLOSED and SYN-SENT, enter
      LAST-ACK. RSTs cannot be transmitted in LISTEN; if transmitted
      in CLOSED or SYN-SENT, remain in or go to CLOSED state.

   State transitions on RST reception are as follows:

      Send an ACK and enter TIME-WAIT in all states except LISTEN,
      CLOSED and SYN-SENT; in LISTEN and CLOSED ignore the RST, in
      SYN-SENT enter CLOSED state. In TIME-WAIT, acknowledge the RST.

o  Sequence Numbers: Section 3.3, page 26

   The last paragraph is changed to the following:

      We have taken advantage of the numbering scheme to protect
      certain control information as well. This is achieved by
      implicitly including some control flags in the sequence space so
      they can be retransmitted and acknowledged without confusion
      (i.e., one and only one copy of the control will be acted upon).
      Control information is not physically carried in the segment
      data space. Consequently, we must adopt rules for implicitly
      assigning sequence numbers to control. The SYN, FIN and RST are
      the only controls requiring this protection, and these controls
      are used only at connection opening and closing. For sequence
      number purposes, the SYN is considered to occur before the first
      actual data octet of the segment in which it occurs, while the
      FIN or RST is considered to occur after the last actual data
      octet in a segment in which it occurs. The FIN and RST are
      considered to occupy the same sequence number space; when both
      are present in a segment, they both occupy the sequence number
      after the last data octet. The segment length (SEG.LEN) includes
      both data and sequence space occupying controls. When a SYN is
      present then SEG.SEQ is the sequence number of the SYN.
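The modified sequence-numbering rule above can be illustrated with a
short C sketch. It is only an illustration under stated assumptions:
the function names seg_len and close_ctl_seq are hypothetical, and the
TH_FIN, TH_SYN and TH_RST macros are defined locally with the
control-bit values of the TCP header. The sketch computes SEG.LEN with
FIN and RST sharing one sequence number, and the sequence number that
the closing control occupies.

```c
#include <stdint.h>

/* Control-bit values from the TCP header; macro names are local to
 * this sketch. */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04

/* SEG.LEN under the modified rule: data octets plus one for a SYN,
 * plus one for the closing control. FIN and RST share a sequence
 * number, so a segment carrying both still adds only one. */
uint32_t seg_len(uint32_t data_octets, uint8_t flags)
{
    uint32_t len = data_octets;

    if (flags & TH_SYN)
        len += 1;
    if (flags & (TH_FIN | TH_RST))
        len += 1;
    return len;
}

/* Sequence number occupied by the FIN or RST: the number after the
 * last data octet. When a SYN is present, SEG.SEQ is the SYN's own
 * number, so the data starts at SEG.SEQ + 1. Unsigned arithmetic
 * wraps modulo 2^32, as TCP sequence numbers do. */
uint32_t close_ctl_seq(uint32_t seg_seq, uint32_t data_octets, uint8_t flags)
{
    uint32_t first_data = seg_seq + ((flags & TH_SYN) ? 1u : 0u);

    return first_data + data_octets;
}
```

For example, a RST carried with 10 data octets at SEG.SEQ 100 has
SEG.LEN 11 and occupies sequence number 110, so the acknowledgement of
that RST would carry 111.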
o  ABORT Call: Section 3.8, page 50

   This should be changed to the following:

      Abort

         Format: ABORT (local connection name, type)

         If the type is a Soft Abort, this command causes all pending
         SENDs and RECEIVEs to be aborted, LAST-ACK state to be
         entered, and a special RESET message to be sent to the TCP on
         the other side of the connection.

         If the type is a Hard Abort, this command causes all pending
         SENDs and RECEIVEs to be aborted, the TCB to be removed,
         CLOSED state to be entered, and a special RESET message to be
         sent to the TCP on the other side of the connection.

         In both cases, depending on the implementation, users may
         receive abort indications for each outstanding SEND or
         RECEIVE, or may simply receive an ABORT-acknowledgement.

o  ABORT Call: Section 3.9, page 62

   Relevant sections should be replaced by the following:

      ABORT Call

         ....

         SYN-RECEIVED STATE
         ESTABLISHED STATE
         FIN-WAIT-1 STATE
         FIN-WAIT-2 STATE
         CLOSE-WAIT STATE

            Send a reset segment:

            All queued SENDs and RECEIVEs should be given "connection
            reset" notification; all segments queued for transmission
            (except for the RST formed above) or retransmission should
            be flushed.

            If the abort is a Soft Abort, enter LAST-ACK state, and
            return. If the abort is a Hard Abort, delete the TCB,
            enter CLOSED state, and return.

         CLOSING STATE

            If the abort is a Soft Abort, send a RST, enter LAST-ACK
            state, and return. If the abort is a Hard Abort, delete
            the TCB, enter CLOSED state, and return.

         LAST-ACK STATE

            If the abort is a Soft Abort, send a RST. If the abort is
            a Hard Abort, delete the TCB, enter CLOSED state, and
            return.

         TIME-WAIT STATE

            If the abort is a Soft Abort, send a RST. If the abort is
            a Hard Abort, delete the TCB, enter CLOSED state, and
            return.
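As a rough illustration, the per-state ABORT processing above might be
sketched in C as follows. This is a sketch under stated assumptions,
not part of the proposed text: the names tcp_state, abort_kind,
abort_result and on_abort are hypothetical, and the states that the
text above elides with "...." are reported as not covered instead of
being guessed at.

```c
/* Hypothetical state names for this sketch. */
typedef enum {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RECEIVED, ST_ESTABLISHED,
    ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSE_WAIT, ST_CLOSING,
    ST_LAST_ACK, ST_TIME_WAIT
} tcp_state;

typedef enum { ABORT_SOFT, ABORT_HARD } abort_kind;

typedef struct {
    tcp_state next;   /* state after the ABORT call */
    int send_rst;     /* 1 if the text calls for a RST to be sent */
    int delete_tcb;   /* 1 if the TCB is deleted */
    int covered;      /* 0 for states elided in the text above */
} abort_result;

abort_result on_abort(tcp_state s, abort_kind kind)
{
    abort_result r = { s, 0, 0, 1 };

    switch (s) {
    case ST_SYN_RECEIVED:
    case ST_ESTABLISHED:
    case ST_FIN_WAIT_1:
    case ST_FIN_WAIT_2:
    case ST_CLOSE_WAIT:
        /* A reset segment is sent for both abort types. */
        r.send_rst = 1;
        break;
    case ST_CLOSING:
    case ST_LAST_ACK:
    case ST_TIME_WAIT:
        /* The text mentions a RST only for the Soft Abort here. */
        r.send_rst = (kind == ABORT_SOFT);
        break;
    default:
        /* LISTEN, SYN-SENT and CLOSED are not shown in the text. */
        r.covered = 0;
        return r;
    }

    if (kind == ABORT_HARD) {
        r.delete_tcb = 1;
        r.next = ST_CLOSED;
    } else if (s != ST_LAST_ACK && s != ST_TIME_WAIT) {
        /* Soft Abort: LAST-ACK and TIME-WAIT keep their state. */
        r.next = ST_LAST_ACK;
    }
    return r;
}
```

A caller would act on the result by queueing the RST, giving queued
SENDs and RECEIVEs the "connection reset" notification, and then
either deleting the TCB or arming the retransmission timer for the
RST.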
o  SEGMENT ARRIVES: Section 3.9, pages 70-71

   second check the RST bit,

      SYN-RECEIVED STATE

         If the RST bit is set

            If this connection was initiated with a passive OPEN
            (i.e., came from the LISTEN state), then return this
            connection to LISTEN state and return. The user need not
            be informed. If this connection was initiated with an
            active OPEN (i.e., came from SYN-SENT state) then the
            connection was refused, signal the user "connection
            refused". In either case, all segments on the
            retransmission queue should be removed. And in the active
            OPEN case, enter the CLOSED state and delete the TCB, and
            return.

      ESTABLISHED
      FIN-WAIT-1
      FIN-WAIT-2
      CLOSE-WAIT

         If the RST bit is set then, any outstanding RECEIVEs and SEND
         should receive "reset" responses. All segment queues should
         be flushed. Users should also receive an unsolicited general
         "connection reset" signal.

         If the RST bit is set and the sequence field in the header is
         non-zero then transmit an ACK, enter TIME-WAIT state, and
         return.

      CLOSING STATE
      LAST-ACK STATE

         If the RST bit is set and the sequence field in the header is
         non-zero then transmit an ACK, enter the TIME-WAIT state, and
         return.

      TIME-WAIT

         If the RST bit is set and the sequence field in the header is
         non-zero then transmit an ACK and return.

   third check security and precedence

      SYN-RECEIVED

         If the security/compartment and precedence in the segment do
         not exactly match the security/compartment and precedence in
         the TCB then send a reset, and return.

      ESTABLISHED STATE

         If the security/compartment and precedence in the segment do
         not exactly match the security/compartment and precedence in
         the TCB then send a reset, any outstanding RECEIVEs and SEND
         should receive "reset" responses. All segment queues should
         be flushed. Users should also receive an unsolicited general
         "connection reset" signal. Enter the LAST-ACK state.
      Note this check is placed following the sequence check to
      prevent a segment from an old connection between these ports
      with a different security or precedence from causing an abort of
      the current connection.

   fourth, check the SYN bit,

      SYN-RECEIVED
      ESTABLISHED STATE
      FIN-WAIT-1 STATE
      FIN-WAIT-2 STATE
      CLOSE-WAIT STATE
      CLOSING STATE
      LAST-ACK STATE
      TIME-WAIT STATE

         If the SYN is in the window it is an error, send a reset, any
         outstanding RECEIVEs and SEND should receive "reset"
         responses, all segment queues should be flushed, the user
         should also receive an unsolicited general "connection reset"
         signal, enter the LAST-ACK state, and return.

         If the SYN is not in the window this step would not be
         reached and an ack would have been sent in the first step
         (sequence number check).

o  SEGMENT ARRIVES: Section 3.9, page 73

      LAST-ACK STATE

         If entered by FIN handshake, the only thing that can arrive
         in this state is an acknowledgement of our FIN. If our FIN is
         now acknowledged, delete the TCB, enter the CLOSED state, and
         return.

      TIME-WAIT STATE

         If entered by FIN handshake, the only thing that can arrive
         in this state is a retransmission of the remote FIN.
         Acknowledge it and restart the 2 MSL timeout.

         If entered by RST reception, acknowledge RSTs and restart the
         2 MSL timeout. Ignore other segments.

o  SEGMENT ARRIVES: Section 3.9, pages 75-76

      LAST-ACK STATE

         If entered by RST transmission, send a RST, restart the
         retransmission timeout and return. Otherwise, do nothing.

      TIME-WAIT STATE

         Remain in the TIME-WAIT state. Restart the 2 MSL time-wait
         timeout.
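The modified RST-bit check in the SEGMENT ARRIVES changes above
(pages 70-71) can be summarised by a small C sketch. It is a sketch
under stated assumptions and not part of the proposed text: the names
tcp_state, rst_result and on_rst are hypothetical, the handling of
LISTEN, CLOSED and SYN-SENT follows the state transition summary given
earlier in this appendix, and a RST whose sequence field is zero is
reported as unspecified because the modified text gives no rule for
it.

```c
#include <stdint.h>

/* Hypothetical state names for this sketch. */
typedef enum {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RECEIVED, ST_ESTABLISHED,
    ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_CLOSE_WAIT, ST_CLOSING,
    ST_LAST_ACK, ST_TIME_WAIT
} tcp_state;

typedef struct {
    tcp_state next;   /* state after processing the RST */
    int send_ack;     /* 1 if the RST is to be acknowledged */
    int specified;    /* 0 if the modified text gives no rule */
} rst_result;

/* passive_open matters only in SYN-RECEIVED; seg_seq is the sequence
 * field of the arriving RST segment. */
rst_result on_rst(tcp_state s, int passive_open, uint32_t seg_seq)
{
    rst_result r = { s, 0, 1 };

    switch (s) {
    case ST_LISTEN:
    case ST_CLOSED:
        return r;                 /* the RST is ignored */
    case ST_SYN_SENT:
        r.next = ST_CLOSED;
        return r;
    case ST_SYN_RECEIVED:
        /* Passive OPEN returns to LISTEN; active OPEN is refused. */
        r.next = passive_open ? ST_LISTEN : ST_CLOSED;
        return r;
    default:
        break;
    }

    /* ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING,
     * LAST-ACK and TIME-WAIT. */
    if (seg_seq == 0) {
        r.specified = 0;          /* no rule given for this case */
        return r;
    }
    r.send_ack = 1;
    r.next = ST_TIME_WAIT;        /* TIME-WAIT itself stays put */
    return r;
}
```

Timer handling (starting or restarting the 2 MSL timeout on entry to
TIME-WAIT) and user notification are left to the caller in this
sketch.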
o  USER TIMEOUT: Section 3.9, page 77

   The behaviour of USER TIMEOUT changes to the following, and the
   RETRANSMISSION TIMEOUT section includes the following note:

      USER TIMEOUT

         For any state, if the user timeout expires, or another timer
         that causes the connection to expire (such as a keepalive
         timer) expires, flush all queues, signal the user "error:
         connection aborted due to user timeout" in general and for
         any outstanding calls, send a RST, enter the LAST-ACK state
         and return.

         The above behaviour is also followed when the maximum number
         of retransmissions has been reached, except in LAST-ACK
         state. In this case there is a state transition to CLOSED.

      RETRANSMISSION TIMEOUT

         Note that SYNs, FINs and RSTs are retransmitted, like data
         segments.

14. Appendix E: Modifications to RFC-1122

Pages 87-89 of [RFC-1122], section 4.2.2.13, are modified as follows:

   4.2.2.13 Closing a Connection: RFC-793 Section 3.5

      A TCP connection may terminate in three ways: (1) the normal TCP
      close sequence using a FIN handshake, (2) a "Soft Abort", in
      which one or more RST segments are sent and the connection
      enters LAST-ACK, and (3) a "Hard Abort" in which one or more RST
      segments are sent, the TCB is removed and the connection enters
      CLOSED state. If a TCP connection is closed by the remote site,
      the local application MUST be informed whether it closed
      normally or was aborted.

      The normal TCP close sequence delivers buffered data reliably in
      both directions. Since the two directions of a TCP connection
      are closed independently, it is possible for a connection to be
      "half closed," i.e., closed in only one direction, and a host is
      permitted to continue sending data in the open direction on a
      half-closed connection.
      A host MAY implement a "half-duplex" TCP close sequence, so that
      an application that has called CLOSE cannot continue to read
      data from the connection. If such a host issues a CLOSE call
      while received data is still pending in TCP, or if new data is
      received after CLOSE is called, its TCP SHOULD send a RST to
      show that data was lost, and enter LAST-ACK.

      When a connection is closed actively, by transmitting a FIN or
      RST as a result of a Soft Abort, it MUST linger in TIME-WAIT
      state for a time 2xMSL (Maximum Segment Lifetime). However, it
      MAY accept a new SYN from the remote TCP to reopen the
      connection directly from TIME-WAIT state, if it:

      (1)  assigns its initial sequence number for the new connection
           to be larger than the largest sequence number it used on
           the previous connection incarnation, and

      (2)  returns to TIME-WAIT state if the SYN turns out to be an
           old duplicate.

Pages 103-104 of [RFC-1122], section 4.2.3.9, are modified as follows:

   4.2.3.9 ICMP Messages

      TCP MUST act on an ICMP error message passed up from the IP
      layer, directing it to the connection that created the error.
      The necessary demultiplexing information can be found in the IP
      header contained within the ICMP message.

      o  Source Quench

         TCP MUST react to a Source Quench by slowing transmission on
         the connection. The RECOMMENDED procedure is for a Source
         Quench to trigger a "slow start," as if a retransmission
         timeout had occurred.

      o  Destination Unreachable -- codes 0, 1, 5

         Since these Unreachable messages indicate soft error
         conditions, TCP MUST NOT abort the connection, and it SHOULD
         make the information available to the application.

         DISCUSSION: TCP could report the soft error condition
         directly to the application layer with an upcall to the
         ERROR_REPORT routine, or it could merely note the message and
         report it to the application only when and if the TCP
         connection times out.
      o  Destination Unreachable -- codes 2-4

         These are hard error conditions, so TCP SHOULD abort the
         connection. If the connection is in a synchronised state, it
         should enter TIME-WAIT.

      o  Time Exceeded -- codes 0, 1

         This should be handled the same way as Destination
         Unreachable codes 0, 1, 5 (see above).

      o  Parameter Problem

         This should be handled the same way as Destination
         Unreachable codes 0, 1, 5 (see above).

15. Appendix F: Modifications to RFC-1213

The following modifications to [RFC-1213], MIB-II, [Page 50], would
allow complete flexibility in aborting connections.

     tcpConnState OBJECT-TYPE
         SYNTAX  INTEGER {
                     closed(1),
                     listen(2),
                     synSent(3),
                     synReceived(4),
                     established(5),
                     finWait1(6),
                     finWait2(7),
                     closeWait(8),
                     lastAck(9),
                     closing(10),
                     timeWait(11),
                     hardAbort(12),
                     softAbort(13)
                 }
         ACCESS  read-write
         STATUS  mandatory
         DESCRIPTION
                 "The state of this TCP connection.

                 The only values which may be set by a management
                 station are hardAbort(12) and softAbort(13).
                 Accordingly, it is appropriate for an agent to return
                 a `badValue' response if a management station
                 attempts to set this object to any other value.

                 If a management station sets this object to the value
                 softAbort(13), then a RST is transmitted and the
                 connection enters LAST-ACK state.

                 If a management station sets this object to the value
                 hardAbort(12), then this has the effect of deleting
                 the TCB (as defined in RFC 793) of the corresponding
                 connection on the managed node, resulting in
                 immediate termination of the connection.

                 As an implementation-specific option, a RST segment
                 may be sent from the managed node to the other TCP
                 endpoint (note however that this RST segment is not
                 sent reliably)."
         ::= { tcpConnEntry 1 }

16.
Appendix G: Traffic Statistics for TCP Connections

Statistics were measured using the netstat program on six machines:

[1]  A home workstation (VMS) used for telecommuting via a 56Kb Frame
     Relay link to the Internet.

[2]  A DNS and mail gateway (VMS) at the University of Tucson,
     Arizona.

[3]  A personal workstation (SunOS 4.1.3) on Spider Systems' corporate
     LAN.

[4]  The BSD development system (BSD4.4-Lite) at the Computer Science
     department, Berkeley, California (taken from
     [TCP/IP-Illustrated], p.799).

[5]  A file server (SunOS 4.1.3) on Spider Systems' corporate LAN.

[6]  An application gateway (SunOS 4.1.3) between Spider Systems'
     corporate LAN and the Internet.

The columns show statistics collected by the BSD netstat utility or
its VMS equivalent, with the exception of machine uptime. The
derivation of the statistics from the BSD TCP/IP "tcpstat" structure
is shown in parentheses.

o  machine (M)

o  time in days that the machine has been up (U)

o  number of TCP connections established (tcpstat.tcps_connects).

o  number of TCP connections aborted by RST transmission, expressed as
   a sum of the total aborted excluding those aborted by reception of
   data after half duplex close, and those aborted after half duplex
   close ((tcpstat.tcps_drops - tcpstat.tcps_rcvafterclose) +
   tcpstat.tcps_rcvafterclose).

o  number of TCP connections timed out, expressed as a sum of the
   number timed out by retransmissions and by keepalives
   (tcpstat.tcps_timeoutdrop + tcpstat.tcps_keeptimeo).

o  total number of TCP data segments transmitted, excluding
   retransmissions (tcpstat.tcps_sndpack -
   tcpstat.tcps_sndrexmitpack).

o  total number of TCP data segments retransmitted
   (tcpstat.tcps_sndrexmitpack).

   M   U    Establ.   Dropped       Timed Out    TXed Segs   RTXed Segs

   1   2       408    4+1           263+1           135168          250
   2   5     46632    456+102       7338+551        317523         4756
   3   ?    138682    13349+3686    79+2345       22761633       104440
   4   30   126820    44+1017       86+3219        8920528       257295
   5   20    13557    198+205       43+28          1559505         1675
   6   14    48226    3943+1396     11+190        11505576        67401

Percentage values for aborted and timed out connections, and for
segment loss, are as follows.

   Machine   Dropped (%)   Timed Out (%)   Retransmissions (%)

   1          1.2          64.7            0.18
   2          1.2          16.9            1.50
   3         12.3           1.75           0.46
   4          0.8           2.60           2.88
   5          3.0           0.52           0.11
   6         11.1           0.42           0.59

Machines 3 and 5 are internal to a LAN and mostly handle NFS traffic,
so may be expected to have different connection and segment losses.
Dropped connections for machine 6 are such a high proportion that some
pathological system or application problem can be suspected. These
machines are excluded from calculations.

Aborted connections yield more consistent percentages than timeouts
and segment loss rates; this may be because the latter are more
susceptible to the characteristics of nearby networks, whereas aborts
are a function of application or system behaviour. For instance, an
excessive proportion of machine 1's TCP connections expire because of
retransmission timeouts; this may be due to an unreliable link.

For machines 1, 2 and 4, the average percentage drop rate is 1.1%. The
average retransmission rate is 1.2%. The lowest percentage drop rate
is 0.8%, and the highest retransmission rate is 2.9%.
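The per-machine percentages in the second table follow directly from
the raw counts in the first. The C sketch below shows the arithmetic,
assuming each percentage is the relevant count divided by the number
of connections established (or, for retransmissions, by the number of
data segments transmitted excluding retransmissions). The struct and
function names (host_stats, drop_pct, timeout_pct, rexmit_pct,
mean_drop_pct) are hypothetical and chosen for this illustration only.

```c
/* Raw counters for one machine, in the column order of the table. */
typedef struct {
    int  machine;
    long established;
    long drop_other, drop_after_close;       /* "Dropped" a+b   */
    long timeout_rexmit, timeout_keepalive;  /* "Timed Out" a+b */
    long tx_segs, rtx_segs;
} host_stats;

static const host_stats STATS[6] = {
    { 1,    408,     4,    1,  263,    1,   135168,    250 },
    { 2,  46632,   456,  102, 7338,  551,   317523,   4756 },
    { 3, 138682, 13349, 3686,   79, 2345, 22761633, 104440 },
    { 4, 126820,    44, 1017,   86, 3219,  8920528, 257295 },
    { 5,  13557,   198,  205,   43,   28,  1559505,   1675 },
    { 6,  48226,  3943, 1396,   11,  190, 11505576,  67401 },
};

/* Connections aborted by RST, as a percentage of those established. */
double drop_pct(const host_stats *h)
{
    return 100.0 * (h->drop_other + h->drop_after_close) / h->established;
}

/* Connections timed out, as a percentage of those established. */
double timeout_pct(const host_stats *h)
{
    return 100.0 * (h->timeout_rexmit + h->timeout_keepalive)
                 / h->established;
}

/* Retransmitted data segments, as a percentage of those transmitted. */
double rexmit_pct(const host_stats *h)
{
    return 100.0 * h->rtx_segs / h->tx_segs;
}

/* Unweighted mean drop percentage over a list of machine numbers
 * (1-based, as in the table). */
double mean_drop_pct(const int *machines, int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += drop_pct(&STATS[machines[i] - 1]);
    return n > 0 ? sum / n : 0.0;
}
```

Calling mean_drop_pct with machines 1, 2 and 4 gives the unweighted
mean of their drop percentages, which is how the sketch reads the 1.1%
figure quoted above.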