Network Working Group                                        J. Freniche
Category: Informational                                             CASA
                                                               July 1998

                       TCP Window Probe Deadlock

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Copyright Notice.

Copyright (C) The Internet Society (1998). All Rights Reserved.

Introduction.

In the course of developing and testing a TCP/IP stack for embedded computers, a situation that can be called "TCP window probe deadlock", with subsequent connection abort, has been observed.

The situation has been detected when a client host sends, using TCP (Ref. 1), a huge amount of data to a server host, which in turn processes that input and also returns a huge amount of data. If the sender does not appropriately mix send and receive requests to its underlying TCP, it is possible to enter a situation where both applications are blocked and the respective TCP layers exchange window probes forever (unless aborted by some sort of alarm).

Initially it was thought that the deadlock was a fault in the implementation of the TCP/IP stack. But it was immediately reproduced in several other configurations (FreeBSD <-> FreeBSD, FreeBSD <-> HP-UX, HP-UX <-> HP-UX, FreeBSD <-> Solaris and FreeBSD <-> AIX, all tested using Ethernet interfaces and also using the local interface). Given its nature, it is believed that it occurs in many, if not all, TCP implementations.

The next section gives indications on how to reproduce the deadlock, followed by a more detailed description and analysis. Conditions and factors having influence on the deadlock are examined, and a solution (at TCP and application level) is proposed, whose impact on current applications is analyzed. Appendices with traces are also included.

For the more curious: the board was a "bare machine" Motorola MC68040 single board computer with an AMD79C90 LANCE interface, the programming language was Ada, and the operating system was the bare Ada Run Time System.

Reproducing the Deadlock.

C and S are hosts communicating by TCP. C (the client) runs a client program that sends lots of data to S (the server) between receive requests. The server processes such data and also returns lots of data to the client. The interface between the applications and the underlying TCPs is blocking (which is the default behavior for Sockets). Note that C and S do not need to be directly connected, i.e., routers can exist between C and S.

A good example of such a server application is the echo service (Ref. 2). Specifically, the deadlock was detected by enabling the echo service in the server S, and then running the "tcpecho" client program (included in Appendix 1) in the client C.
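Stripped of details, the client-side pattern that provokes the problem fits in a few lines. The following is a minimal sketch (not the author's tcpecho.c, which appears in Appendix 1); the buffer sizes are illustrative, and s is assumed to be an already-connected blocking TCP socket:

   #include <string.h>
   #include <sys/socket.h>

   /* Deadlock-prone pattern: one huge blocking send () before any
      recv ().  If the server echoes a comparable volume back, both
      receive windows can fill up and the window probe deadlock
      described in this memo appears. */
   void deadlock_prone_client (int s)
   {
      static char big_buf [60000];  /* much larger than the TCP send
                                       buffer */
      char reply [8192];

      memset (big_buf, 'A', sizeof (big_buf));
      send (s, big_buf, sizeof (big_buf), 0); /* may block forever   */
      while (recv (s, reply, sizeof (reply), 0) > 0)
         ;                                    /* never reached while
                                                 the send is blocked */
   }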
In the following explanations, "echo" will be used as such a server; however, any client/server pair with the same characteristics must exhibit the deadlock.

In the server, enable the stream echo service (uncomment the echo stream service in /etc/inetd.conf and then restart "inetd"). In the client, compile the "tcpecho.c" program and execute it (as a normal user):

   client> tcpecho -n 1 -a 120 -m 60000 server A

where:

   -n 1      send just 1 buffer
   -a 120    set an alarm (for socket operations) for 120 seconds
   -m 60000  buffer size is 60000 bytes
   server    name or IP address of the server
   A         just one character, placed in all bytes of the payload

The payload (-m 60000) may need to be adjusted to provoke the deadlock, as it depends upon the receive buffer sizes and other connection parameters on both sides.

The communication between S and C is monitored using a "sniffer" (HP Advisor) and "tcpdump" (Ref. 3) on any machine attached to the same subnet as either of the two hosts (assuming the media is Ethernet); it can even be the client or the server host itself. Actually, "tcpdump" alone is sufficient to see the deadlock. A clearer trace is obtained by setting strong "tcpdump" filters for the socket pair (C, port number) and (S, port number). A sample trace with the deadlock is included in Appendix 2.

Description.

Once the TCP connection is established, segments are exchanged between C and S. After some amount of data is sent and received, the client continues sending segments (with data) but announcing that its receive window is 0. S then stops sending data to C, but continues receiving data segments (with window 0) from C. After a while, S also announces to C that its receive window is now 0.

An exchange of window probes now takes place, one after the other, by both hosts, increasingly spaced in time. No more data is effectively exchanged and processed; the client and server applications do not progress. Both hosts are now in window probe deadlock.

The deadlock is finally broken by exhausting (in hosts that implement a limit on retransmissions of window probes) the number of retransmissions of window probes (between 10 and 15 per host, which, given the back-off, means between 10 minutes and 1 hour) or by the alarm on the client side. The connection is aborted if such an alarm was implemented; otherwise the deadlock continues forever.

Appendix 2 contains a trace of a connection in "window probe deadlock" and its subsequent abort by an alarm.
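The alarm mentioned above is implemented in tcpecho with setitimer() and SIGALRM (see Appendix 1). Where the platform provides it, a socket send timeout achieves a similar bound; a minimal sketch follows, assuming SO_SNDTIMEO is available (as noted later in this memo, not all TCP implementations provide a send timeout):

   #include <sys/socket.h>
   #include <sys/time.h>

   /* Bound a blocking send with a socket-level timeout, where
      supported.  This does not remove the deadlock; it only turns an
      indefinite block into a send () that fails with EWOULDBLOCK
      after the timeout. */
   int bound_send_time (int s, long seconds)
   {
      struct timeval tv;

      tv.tv_sec  = seconds;      /* e.g. 120, mirroring tcpecho -a 120 */
      tv.tv_usec = 0;
      return setsockopt (s, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof (tv));
   }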
Explanation of the Deadlock.

To make the explanation easier, numeric parameters (representative of actual figures) are used for the connection in both hosts, as in this example:

   Client C:  application send buffer = 60000 bytes
              application recv buffer = 60000 bytes
              MSS is 1460 bytes
              TCP send buffer = 16384
              TCP recv buffer = same as send buffer

   Server S:  application send buffer = 8192 bytes
              application recv buffer = 8192 bytes
              MSS is 1460 bytes
              TCP send buffer = 16384
              TCP recv buffer = same as send buffer

The sequence in the communication is:

   1  Client C opens the connection with the server S.

   2  Client C issues a socket send with a buffer of 60000 bytes.

   3  TCP in client C copies 16384 bytes from the application send
      buffer to the TCP send buffer and starts to send several TCP
      segments of 1460 bytes. As there are bytes in the application
      send buffer pending to be accepted by TCP, the client
      application is blocked.

   4  TCP in server S receives the several segments, sends back the
      acknowledgments and delivers the data to the server application,
      in chunks with a maximum size of 8192 bytes. The server program
      processes the data (in the case of echo, the processing is just
      a copy) and issues a socket send with 8192 bytes. TCP in the
      server accepts such data and sends it to the client TCP.

   5  The client TCP receives the data sent by the server and keeps it
      in its receive window. The client application is still blocked.

   6  Steps 3, 4 and 5 continue. Evidently, as the client TCP
      continues sending and receiving data, the TCP receive buffer in
      C will fill up.

   7  Therefore, C starts to announce receive window 0. As the TCP
      send buffer in C still contains data, TCP C will continue
      sending data segments to S (with receive window 0). The client
      application is still blocked, as not all bytes in the send
      request were accepted by TCP C.

   8  Obviously S continues receiving data segments (with window 0)
      from C. Such data is passed to the server application, which
      processes it and sends it back to the client. But now such
      transmission is blocked by the server TCP (as the client TCP
      announced window 0). Eventually the TCP send buffer in S will be
      filled. The next send call issued by the server application will
      block.

   9  Finally, the TCP receive window in S will be filled up
      completely by the data segments received from C.

   10 At this moment, both applications are blocked in their socket
      send calls and both TCPs have their receive windows completely
      filled up. Therefore both TCPs will start to send window probes,
      increasingly spaced in time, until the end of retransmission
      attempts (if implemented) or alarm expiration. The connection is
      then aborted in this case, or else will remain forever in
      "window probe deadlock".

Conditions for the Deadlock.

Clearly the first condition is that the client tries to send a huge amount of data in one or several consecutive socket send calls. Huge is understood here in comparison with the local TCP send buffer size. This last size must also be sufficient to produce a number of segments that start to fill up the peer's TCP receive buffer.

To obtain the deadlock, server applications must follow the "echo" pattern: they must read data from their local TCP in not too large chunks (in comparison with the client's application send buffer size), process such data, and finally respond to the client with a large amount of data. The server's application send buffer must be of medium size, so that the TCP send buffer fills up completely and the socket send call blocks. The server's TCP send buffer size must also allow for sufficient segments to fill the client's TCP receive buffer.

As noted, such conditions on TCP send and receive buffer sizes are usually found in current TCP implementations. The same holds for the send and receive buffer sizes of server applications. It is also not so unusual that client and server exchange large amounts of data. The only non-usual condition is the client sending a huge amount of data, in one or several consecutive blocking send calls, before reading responses.
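Since these conditions are stated in terms of the TCP send and receive buffer sizes, it can be useful to check what sizes a given platform actually grants for a socket. The following minimal sketch uses the standard SO_SNDBUF and SO_RCVBUF socket options; the helper name is illustrative:

   #include <stdio.h>
   #include <sys/socket.h>

   /* Print the TCP send and receive buffer sizes of a socket, the
      two parameters the deadlock conditions above depend upon. */
   void print_tcp_buffers (int s)
   {
      int sndbuf, rcvbuf;
      socklen_t len = sizeof (int);

      if (getsockopt (s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
         printf ("TCP send buffer: %d bytes\n", sndbuf);
      len = sizeof (int);
      if (getsockopt (s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0)
         printf ("TCP receive buffer: %d bytes\n", rcvbuf);
   }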
Other Factors.

Given that the root of the phenomenon is a "mechanical" blocking among the send/receive buffers of the applications and TCPs, it is evident that transmission media characteristics such as MTU and speed do not contribute, positively or negatively.

However, TCP can be used on top of some transmission protocols (SMDS, ATM) that have a larger MTU, and TCP send and receive windows are then usually larger. This may mitigate or even avoid the deadlock. But on the other hand, such protocols are used to exchange large amounts of data. Again, the driving condition is sending a large amount of data, to be processed and returned, without intermixing receive requests.

For the same reason, host processing power is not a relevant factor.

The deadlock is also independent of slow start/congestion avoidance, as well as of sender/receiver silly window avoidance and the Nagle algorithm. Note that the latest FreeBSDs initially use a large send congestion window in local networks, while this feature is not implemented in HP-UX. However, the deadlock was obtained on both types of hosts.

The phenomenon is clearly dependent upon the send and receive buffer sizes of the client and server applications, as well as the send and receive buffer sizes of both TCPs. Modifying such parameters can solve a particular instance of the problem, but the possibility of deadlock and subsequent timeouts will still be there. This has been checked for several combinations of sizes: just by adjusting the -m payload conveniently, the deadlock is obtained again.

Proposed Solution.

Only a blocking interface between the application and the TCP level is considered in this section (i.e., blocking sockets).

If the conditions of the client/server are as described in the previous paragraphs, and blocking sockets are used, there is a potential for deadlock. Such a situation can be avoided by using other models in the client/server communication, such as non-blocking sockets, or even by using two connections, one for sending and the other for receiving data (see Ref. 4 for a detailed discussion and implementation).

However, there is a solution (at TCP code level in the client) that can prevent the deadlock, even with blocking sockets.

The application is blocked because there is no space in the TCP send buffer. But that buffer will remain completely filled up because the peer closed its receive window, and the peer application is also blocked in a send call for the same reason. One way to break the deadlock is to wake up the local application when:

   A    The application send buffer is completely copied to the TCP
        send buffer (this is the behavior already available),

or else when ALL of the following conditions hold:

   B-1  The TCP send buffer is full (this TCP cannot accept more data
        from the send call),

   B-2  The TCP send window is 0 (this TCP cannot send data to the
        peer, so the TCP send buffer will not be emptied),

   B-3  The TCP receive buffer is full (this TCP cannot accept more
        received data),

   B-4  The TCP has a pending blocking send request.

If the four B conditions hold on one side, that side of the connection is blocked. But the peer status is then as follows: the peer TCP receive buffer is full (by condition B-2); the peer TCP will not send any more data (by condition B-3). There is the potential that the peer application now issues a send call with more data than its local TCP send buffer can accommodate, therefore also blocking. Obviously, if the application is awakened on the side where the B conditions hold, the deadlock is avoided.
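Expressed in implementation terms, the four B conditions reduce to a simple predicate over the connection state. The sketch below is illustrative only: struct conn and its field names are hypothetical and do not correspond to any particular TCP implementation:

   /* Hypothetical connection state; field names are illustrative
      only and do not match any particular TCP stack. */
   struct conn {
      int snd_buf_len;    /* bytes queued in the TCP send buffer    */
      int snd_buf_size;   /* capacity of the TCP send buffer        */
      int snd_wnd;        /* send window advertised by the peer     */
      int rcv_buf_len;    /* bytes queued in the TCP receive buffer */
      int rcv_buf_size;   /* capacity of the TCP receive buffer     */
      int blocked_send;   /* non-zero if a blocking send is pending */
   };

   /* Conditions B-1 to B-4: when all hold, wake up the blocked
      application instead of (or before) sending a window probe. */
   static int must_wake_sender (const struct conn *c)
   {
      return c->snd_buf_len >= c->snd_buf_size /* B-1: send buf full */
          && c->snd_wnd == 0                   /* B-2: peer wnd 0    */
          && c->rcv_buf_len >= c->rcv_buf_size /* B-3: recv buf full */
          && c->blocked_send;                  /* B-4: blocked send  */
   }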
The solution implies at most three modifications to the TCP code that outputs a segment, to wake up the application when the conditions hold. The check must be done in at most three places:

   Place 1  When the retransmission timer expires and the TCP must
            transmit a window probe.

   Place 2  When a peer window probe is received and an acknowledgment
            must be transmitted.

   Place 3  When setting up the retransmission timer for window probe
            transmission.

Places 1 and 2 are reactive solutions, applied once the deadlock is present. The last place, instead, is proactive, acting immediately before the deadlock can occur. If the check succeeds, the application must be notified with the number of bytes accepted by the TCP, or with a specific error in case no bytes were accepted.

If the check is implemented only in Places 1 and 2 and only in the client TCP, the connection now runs to completion, but with unnecessary delays: if the connection falls into the "window probe deadlock", the client application will be awakened after the first window probe timeout, which is about 3 seconds.

If the check is implemented only in Places 1 and 2 and only in the server TCP, in addition to the previous unnecessary delay, the connection runs slowly after the first window probe deadlock is avoided. The reason is that the server application usually has a short to medium sized receive buffer (for example, 8192 for "inetd" built-in servers such as echo). Once forced to wake up, the server TCP announces a receive window of 8192, which is immediately filled by the client, and the deadlock occurs again until the server is awakened again by this modification (after 3 seconds), and so on.

Modifying the TCP code just in Place 3 avoids such delays and is clearly sufficient for all the cases.

Testing a TCP level modified according to this section (just in Place 3), it was seen that connections now run correctly and smoothly to completion when the modified host is the client, regardless of whether the server TCP has also implemented the modification. On the other hand, if the client TCP did not implement the solution, the deadlock may occur even if the server TCP has implemented it. This asymmetry is caused by the characteristics of the client and server: even if the server is notified, it has no choice other than to read (until its memory resources are exhausted) and respond, causing the deadlock anyway. The client, on the other hand, may now switch between sending and reading, avoiding the deadlock.

A sample trace for a modified client and an unmodified server is included in Appendix 3. The connection now runs to completion with no delays.

Impact on Current Applications.

The new behavior is compatible with current applications, as the socket send specification (Ref. 5)

   number_of_accepted_bytes = send (socket,
                                    application_send_buffer_address,
                                    number_of_bytes_to_send, flags)

already returned the number of bytes sent. If the socket is blocking, the caller will be blocked and only notified when the complete buffer has been accepted by the local TCP. In this case, it will return that number of bytes. The caller can also be notified when a send timeout expires (if such a socket option was set, but not all TCP implementations provide such an option). In this case it will return the effective number of bytes accepted (which can be 0; in this last case the error EWOULDBLOCK is set).

Therefore, the implications of the new behavior were already present in the interface provided by Sockets. Well-coded applications are already aware of such behavior, so the proposed TCP modification will have no impact on them. There follows an example of code aware of this possibility:
   if ((n = send (s, buf, strlen (buf), flags)) == strlen (buf)) {
      /* the complete buffer was sent */
      ...
   } else if (n == -1) {
      /* some error detected, check errno */
      if (errno == EWOULDBLOCK) {
         /* could not send, proceed accordingly */
         ... code for try again ...
      } else {
         /* serious error, proceed accordingly */
         ...
      }
   } else {
      /* the buffer was sent partially */
      ... code for sending the remaining part ...
   }

Note that it is the attempt to send more data when the buffer was not completely sent that can lead to the window probe deadlock described. If conditions are as described in the previous sections, the deadlock may occur. If the TCP code is not modified, the application will remain blocked in a send call until retransmissions expire (if implemented) or alarms expire (if used). No modification to the application will avoid the deadlock.

If the TCP code is modified as proposed, a client application notified of a send call with the buffer not completely sent must not simply try to send again. Instead, it must replace the code for sending the remaining part with code for reading the first responses to what was already sent. There follows pseudo-code for this (see also Appendix 1):

   adjust pointers to cover the whole buffer as a chunk to be sent
   loop until the whole buffer is sent
      send a chunk       /* the first time it is the whole buffer */
      check status: if -1 then error and exit, except when errno is
         EWOULDBLOCK, in which case continue
      read response from server
      adjust the pointers to the next chunk
   end loop
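A minimal C realization of this pseudo-code follows. The function name send_with_reads and the response buffer handling are illustrative only; error handling is reduced to the cases discussed above, and the caller is assumed to read the responses to the final chunk afterwards (as tcpecho does with its recv loop):

   #include <errno.h>
   #include <sys/types.h>
   #include <sys/socket.h>

   /* Send buf in as many chunks as the local TCP accepts, reading
      responses in between so both receive windows keep draining. */
   int send_with_reads (int s, const char *buf, size_t len)
   {
      size_t sent = 0;
      char response [4096];

      while (sent < len) {
         ssize_t n = send (s, buf + sent, len - sent, 0);
         if (n == -1) {
            if (errno != EWOULDBLOCK && errno != EAGAIN)
               return -1;        /* serious error                   */
         } else {
            sent += (size_t) n;  /* part (or all) of chunk accepted */
         }
         if (sent < len) {
            /* not done yet: read the first responses to what was
               already sent, freeing window space on both sides */
            ssize_t r = recv (s, response, sizeof (response), 0);
            if (r <= 0)
               return -1;        /* connection closed or error      */
            /* caller-specific processing of the response here */
         }
      }
      return 0;
   }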
Security Considerations.

The only security implication detected in this study is a "denial of service" attack on those hosts that do not implement a limit on the retransmission of window probes but that provide servers that send large amounts of data to clients in response to large amounts of data sent by such clients. A representative server is one implementing the echo protocol (Ref. 2). An attacker could then establish connections as described in this memo. As the server host will not abort the retransmission of window probes, the attacker will be able to waste resources in the server for as long as he maintains such connections. To avoid this attack, do implement a limit on the retransmission of window probes.

The modifications proposed at the client TCP code level will avoid the deadlock on the client side, but not on the server side.

References.

   [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
       September 1981.

   [2] Postel, J., "Echo Protocol", STD 20, RFC 862, May 1983.

   [3] Lawrence Berkeley National Laboratory, Network Research Group
       (tcpdump@ee.lbl.gov): ftp://ftp.ee.lbl.gov/tcpdump.tar.Z

   [4] Stevens, W. R., "UNIX Network Programming", Vol. 1, 2nd Ed.,
       Prentice Hall, 1998.

   [5] IEEE, "Protocol Independent Interfaces", IEEE Std 1003.1g.

Author's Address.

   Juan L. Freniche
   Engineering Division
   Construcciones Aeronauticas (CASA)
   Getafe (SPAIN)

   Phone: +34.91.624-2950
   Fax:   +34.91.624-2705
   EMail: jlfreniche@acm.org

Appendix 1: Listing of tcpecho.c

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
   #include <signal.h>
   #include <sys/types.h>
   #include <sys/time.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <netdb.h>

   #define SA struct sockaddr

   int s;                          /* socket descriptor */

   void set_alarm (int duration)
   {
      struct itimerval inttimer;
      struct itimerval ointtimer;

      inttimer.it_interval.tv_sec  = duration;
      inttimer.it_interval.tv_usec = 0;
      inttimer.it_value.tv_sec     = duration;
      inttimer.it_value.tv_usec    = 0;
      setitimer (ITIMER_REAL, &inttimer, &ointtimer);
   }

   void close_comms (int sig)
   {
      struct linger linger;

      linger.l_onoff  = 1;
      linger.l_linger = 0;
      setsockopt (s, SOL_SOCKET, SO_LINGER, &linger, sizeof (linger));
      close (s);
   }

   void timeout (int sig)
   {
      fprintf (stderr, "Connection timeout\n");
      close_comms (0);
      exit (1);
   }

   int process_by_tcp (char *remote_host, char *msg, int multiple,
                       int times, int alarm_time)
   {
      struct hostent *hp;
      struct servent *sp;
      struct sockaddr_in peeraddr_in;
      int nbytes, i, pending_bytes;
      char echo_msg [multiple * strlen (msg) + 2];
      char echo_constant [multiple * strlen (msg) + 2];
      char *aux;

      memset ((char *) &peeraddr_in, 0, sizeof (struct sockaddr_in));
      peeraddr_in.sin_family = AF_INET;
      hp = gethostbyname (remote_host);
      if (hp == NULL) {
         fprintf (stderr, "tcpecho: %s not found\n", remote_host);
         return -1;
      }
      peeraddr_in.sin_addr.s_addr =
         ((struct in_addr *) (hp->h_addr))->s_addr;
      sp = getservbyname ("echo", "tcp");
      if (sp == NULL) {
         fprintf (stderr, "tcpecho: echo not found in /etc/services\n");
         return -1;
      }
      peeraddr_in.sin_port = sp->s_port;
      s = socket (AF_INET, SOCK_STREAM, 0);
      if (s == -1) {
         fprintf (stderr, "tcpecho: Unable to create socket\n");
         return -1;
      }
      set_alarm (alarm_time);
      if (connect (s, (SA *) &peeraddr_in,
                   sizeof (struct sockaddr_in)) == -1) {
         set_alarm (0);
         fprintf (stderr,
                  "tcpecho: Unable to connect to remote host %s\n",
                  remote_host);
         return -1;
      }
      set_alarm (0);
      echo_constant [0] = '\0';
      aux = echo_constant;
      for (i = 1; i <= multiple; i++) {
         strcpy (aux, msg);
         aux = aux + strlen (msg);
      }
      strcat (echo_constant, "\n");
      for (i = 1; i <= times; i++) {
         strcpy (echo_msg, echo_constant); /* payload for this round */
         nbytes = strlen (echo_msg);
         set_alarm (alarm_time);
         if (send (s, echo_msg, nbytes, 0) != nbytes) {
            fprintf (stderr, "tcpecho: Unable to send all bytes\n");
            close_comms (0);
            exit (1);
         }
         pending_bytes = nbytes;
         while (pending_bytes > 0) {
            if ((nbytes = recv (s, echo_msg, pending_bytes, 0)) <= 0) {
               fprintf (stderr,
                        "tcpecho: Error reading echo from server\n");
               close_comms (0);
               exit (1);
            } else {
               pending_bytes = pending_bytes - nbytes;
               echo_msg [nbytes] = '\0';
            }
         }
      }
      set_alarm (0);
      shutdown (s, 1);
      return 0;
   }

   void print_usage ()
   {
      fprintf (stderr,
         "tcpecho: [-n times -a alarm -m multiple] remote_host string\n");
      exit (1);
   }

   int main (int argc, char *argv[])
   {
      int c;
      int times = 1;
      int alarm_time = 20;
      int status = 0;
      int multiple = 1;
      char *remote_host;

      if (argc <= 1) {
         print_usage ();
      }
      while ((c = getopt (argc, argv, "n:m:a:")) != -1) {
         switch (c) {
         case 'a':
            alarm_time = atoi (optarg);
            break;
         case 'n':
            times = atoi (optarg);
            if (times < 1) times = 1;
            break;
         case 'm':
            multiple = atoi (optarg);
            if (multiple < 1) multiple = 1;
            break;
         }
      }
      if ((argc - optind) < 2) {
         print_usage ();
      }
      remote_host = argv [optind];
      optind++;
      signal (SIGINT, close_comms);
      signal (SIGALRM, timeout);
      status = process_by_tcp (remote_host, argv [optind], multiple,
                               times, alarm_time);
      exit (status);
   }
Appendix 2: Trace of a Connection in Deadlock.

Trace of a local connection where the phenomenon occurs. To reproduce it, enable the inetd echo service, compile tcpecho.c, launch a second xterm, and execute in it:

   localhost> tcpdump -N -p -i lo0 -s 128 -S

Now, in the first xterm, execute (adjust the payload conveniently, observing the MSS on the local interface):

   localhost> tcpecho -n 1 -a 120 -m 300000 localhost A

The trace has been edited to remove some unnecessary fields and to align the remaining ones.

   tcpdump: listening on lo0
   4:26.637 1026 > echo: S 0:0(0) win 16384
   4:26.637 echo > 1026: S 0:0(0) ack 1 win 57344
   4:26.637 1026 > echo: . ack 1 win 57344
   4:26.781 1026 > echo: P 1:2049(2048) ack 1 win 57344
   4:26.782 1026 > echo: P 2049:16385(14336) ack 1 win 57344
   4:26.782 1026 > echo: P 16385:30721(14336) ack 1 win 57344
   4:26.783 1026 > echo: P 30721:45057(14336) ack 1 win 57344
   4:26.784 echo > 1026: P 1:2049(2048) ack 45057 win 20480
   4:26.784 1026 > echo: P 45057:57345(12288) ack 2049 win 55296
   4:26.784 echo > 1026: P 2049:4097(2048) ack 57345 win 8192
   4:26.785 1026 > echo: P 57345:59393(2048) ack 4097 win 53248
   4:26.785 echo > 1026: P 4097:8193(4096) ack 59393 win 6144
   4:26.785 1026 > echo: P 59393:61441(2048) ack 8193 win 49152
   4:26.786 echo > 1026: P 8193:10241(2048) ack 61441 win 4096
   4:26.787 echo > 1026: P 10241:24577(14336) ack 61441 win 20480
   4:26.787 1026 > echo: . 61441:75777(14336) ack 24577 win 32768
   4:26.788 echo > 1026: P 24577:26625(2048) ack 75777 win 14336
   4:26.788 1026 > echo: . 75777:90113(14336) ack 26625 win 30720
   4:26.788 echo > 1026: P 26625:28673(2048) ack 90113 win 0
   4:26.790 echo > 1026: P 28673:43009(14336) ack 90113 win 16384
   4:26.790 1026 > echo: . 90113:104449(14336) ack 43009 win 14336
   4:26.790 echo > 1026: P 43009:45057(2048) ack 104449 win 2048
   4:26.792 echo > 1026: . 45057:57345(12288) ack 104449 win 34816
   4:26.792 1026 > echo: . 104449:118785(14336) ack 57345 win 0
   4:26.792 1026 > echo: . 118785:133121(14336) ack 57345 win 0
   4:26.794 echo > 1026: . ack 133121 win 38912
   4:26.794 1026 > echo: . 133121:147457(14336) ack 57345 win 0
   4:26.794 1026 > echo: P 147457:161793(14336) ack 57345 win 0
   4:26.951 echo > 1026: . ack 161793 win 18432
   4:26.951 1026 > echo: . 161793:176129(14336) ack 57345 win 0
   4:27.151 echo > 1026: . ack 176129 win 4096
   4:31.451 echo > 1026: . 57345:57346(1) ack 176129 win 4096
   4:31.451 1026 > echo: . 176129:180225(4096) ack 57345 win 0
   4:31.551 echo > 1026: . ack 180225 win 0
   4:36.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:36.451 echo > 1026: . ack 180225 win 0
   4:38.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   4:38.451 1026 > echo: . ack 57345 win 0
   4:42.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:42.451 echo > 1026: . ack 180225 win 0
   4:52.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   4:52.451 1026 > echo: . ack 57345 win 0
   4:54.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   4:54.451 echo > 1026: . ack 180225 win 0
   5:18.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   5:18.451 echo > 1026: . ack 180225 win 0
   5:20.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   5:20.451 1026 > echo: . ack 57345 win 0
   6:06.451 1026 > echo: . 180225:180226(1) ack 57345 win 0
   6:06.451 echo > 1026: . ack 180225 win 0
   6:16.451 echo > 1026: . 57345:57346(1) ack 180225 win 0
   6:16.451 1026 > echo: . ack 57345 win 0
   6:26.791 1026 > echo: R 180225:180225(0) ack 57345 win 0

The connection is aborted by the alarm used for socket operations.
If such an alarm is set sufficiently high, all the retransmissions of window probes would have been seen, followed again by the reset.

Appendix 3: Trace of a Connection Solving the Deadlock.

In a client that has implemented the modification, execute:

   client> tcpecho -n 1 -a 120 -m 60000 server A

   server> tcpdump -N -p -i tun0 -s 128
   tcpdump: listening on tun0
   9:36.118 49152 > echo: S 0:0(0) win 11680
   9:36.119 echo > 49152: S 0:0(0) ack 1 win 17520
   9:36.121 49152 > echo: . ack 1 win 11680
   9:36.130 49152 > echo: . 1:1461(1460) ack 1 win 11680
   9:36.130 echo > 49152: P 1:1461(1460) ack 1461 win 17520
   9:36.131 49152 > echo: . 1461:2921(1460) ack 1 win 11680
   9:36.131 echo > 49152: P 1461:2921(1460) ack 2921 win 17520
   9:36.132 49152 > echo: . 2921:4381(1460) ack 1 win 11680
   9:36.132 echo > 49152: P 2921:4381(1460) ack 4381 win 17520
   9:36.133 49152 > echo: . 4381:5841(1460) ack 1 win 11680
   9:36.133 echo > 49152: P 4381:5841(1460) ack 5841 win 17520
   9:36.134 49152 > echo: . 5841:7301(1460) ack 1 win 11680
   9:36.134 echo > 49152: P 5841:7301(1460) ack 7301 win 17520
   9:36.135 49152 > echo: . 7301:8761(1460) ack 1 win 11680
   9:36.135 echo > 49152: P 7301:8761(1460) ack 8761 win 17520
   9:36.136 49152 > echo: . 8761:10221(1460) ack 1 win 11680
   9:36.136 echo > 49152: P 8761:10221(1460) ack 10221 win 17520
   9:36.136 49152 > echo: P 10221:11681(1460) ack 1 win 11680
   9:36.137 echo > 49152: P 10221:11681(1460) ack 11681 win 17520
   9:36.139 49152 > echo: P 11681:13141(1460) ack 1461 win 10220
   9:36.151 echo > 49152: . ack 13141 win 17520
   9:36.168 49152 > echo: P 13141:14601(1460) ack 2921 win 8760
   9:36.170 49152 > echo: P 14601:16061(1460) ack 4381 win 7300
   9:36.171 echo > 49152: . ack 16061 win 17520
   9:36.173 49152 > echo: P 16061:17521(1460) ack 5841 win 5840
   9:36.175 49152 > echo: P 17521:18981(1460) ack 7301 win 4380
   9:36.176 echo > 49152: . ack 18981 win 17520
   9:36.178 49152 > echo: P 18981:20441(1460) ack 8761 win 2920
   9:36.180 49152 > echo: P 20441:21901(1460) ack 10221 win 1460
   9:36.180 echo > 49152: . ack 21901 win 17520
   9:36.183 49152 > echo: P 21901:23361(1460) ack 11681 win 0
   9:36.185 49152 > echo: P 23361:24821(1460) ack 11681 win 0
   9:36.185 echo > 49152: . ack 24821 win 17520
   9:36.188 49152 > echo: . 24821:26281(1460) ack 11681 win 0
   9:36.188 49152 > echo: P 26281:27741(1460) ack 11681 win 0
   9:36.189 echo > 49152: . ack 27741 win 17520
   9:36.191 49152 > echo: . 27741:29201(1460) ack 11681 win 0
   9:36.191 49152 > echo: P 29201:30661(1460) ack 11681 win 0
   9:36.192 echo > 49152: . ack 30661 win 17520
   9:36.194 49152 > echo: . 30661:32121(1460) ack 11681 win 0
   9:36.194 49152 > echo: P 32121:33581(1460) ack 11681 win 0
   9:36.196 49152 > echo: . 33581:35041(1460) ack 11681 win 0
   9:36.197 49152 > echo: P 35041:36501(1460) ack 11681 win 0
   9:36.199 49152 > echo: . 36501:37961(1460) ack 11681 win 0
   9:36.200 49152 > echo: P 37961:39421(1460) ack 11681 win 0
   9:36.202 49152 > echo: . 39421:40881(1460) ack 11681 win 0
   9:36.202 49152 > echo: P 40881:42341(1460) ack 11681 win 0
   9:36.351 echo > 49152: . ack 42341 win 5840
   9:36.353 49152 > echo: . 42341:43801(1460) ack 11681 win 0
   9:36.354 49152 > echo: . 43801:45261(1460) ack 11681 win 0
   9:36.355 49152 > echo: . 45261:46721(1460) ack 11681 win 0
   9:36.355 49152 > echo: . 46721:48181(1460) ack 11681 win 0
   9:36.551 echo > 49152: . ack 48181 win 0
   9:36.554 49152 > echo: . ack 11681 win 11680
   9:36.554 echo > 49152: . 11681:13141(1460) ack 48181 win 0
   9:36.554 echo > 49152: . 13141:14601(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 14601:16061(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 16061:17521(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 17521:18981(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 18981:20441(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 20441:21901(1460) ack 48181 win 0
   9:36.555 echo > 49152: . 21901:23361(1460) ack 48181 win 0
   9:36.558 49152 > echo: . ack 14601 win 8760
   9:36.559 echo > 49152: . ack 48181 win 8192
   9:36.645 49152 > echo: . ack 17521 win 5840
   9:36.655 49152 > echo: . ack 20441 win 2920
   9:36.658 49152 > echo: . ack 23361 win 0
   9:36.659 echo > 49152: . ack 48181 win 16384
   9:36.661 49152 > echo: . 48181:49641(1460) ack 23361 win 0
   9:36.661 49152 > echo: . 49641:51101(1460) ack 23361 win 0
   9:36.662 49152 > echo: . 51101:52561(1460) ack 23361 win 0
   9:36.663 49152 > echo: . 52561:54021(1460) ack 23361 win 0
   9:36.663 49152 > echo: . 54021:55481(1460) ack 23361 win 0
   9:36.665 49152 > echo: . 55481:56941(1460) ack 23361 win 0
   9:36.666 49152 > echo: . 56941:58401(1460) ack 23361 win 0
   9:36.667 49152 > echo: P 58401:59861(1460) ack 23361 win 0
   9:36.669 49152 > echo: . ack 23361 win 11680
   9:36.669 echo > 49152: . 23361:24821(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 24821:26281(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 26281:27741(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 27741:29201(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 29201:30661(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 30661:32121(1460) ack 59861 win 4704
   9:36.669 echo > 49152: . 32121:33581(1460) ack 59861 win 4704
   9:36.670 echo > 49152: . 33581:35041(1460) ack 59861 win 4704
   9:36.682 49152 > echo: . ack 26281 win 8760
   9:36.697 49152 > echo: . ack 29201 win 5840
   9:36.700 49152 > echo: . ack 32121 win 2920
   9:36.701 echo > 49152: . ack 59861 win 12896
   9:36.704 49152 > echo: . ack 35041 win 0
   9:36.707 49152 > echo: . ack 35041 win 11680
   9:36.707 echo > 49152: . 35041:36501(1460) ack 59861 win 12896
   9:36.707 echo > 49152: . 36501:37961(1460) ack 59861 win 12896
   9:36.707 echo > 49152: . 37961:39421(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 39421:40881(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 40881:42341(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 42341:43801(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 43801:45261(1460) ack 59861 win 12896
   9:36.708 echo > 49152: . 45261:46721(1460) ack 59861 win 12896
   9:36.757 49152 > echo: . ack 37961 win 8760
   9:36.758 echo > 49152: . ack 59861 win 17520
   9:36.761 49152 > echo: . ack 40881 win 5840
   9:36.764 49152 > echo: . ack 43801 win 2920
   9:36.767 49152 > echo: . ack 46721 win 0
   9:36.788 49152 > echo: . ack 46721 win 11680
   9:36.788 echo > 49152: . 46721:48181(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 48181:49641(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 49641:51101(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 51101:52561(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 52561:54021(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 54021:55481(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 55481:56941(1460) ack 59861 win 17520
   9:36.788 echo > 49152: . 56941:58401(1460) ack 59861 win 17520
   9:36.792 49152 > echo: . ack 49641 win 8760
   9:36.795 49152 > echo: . ack 52561 win 5840
   9:36.798 49152 > echo: . ack 55481 win 2920
   9:36.801 49152 > echo: . ack 58401 win 0
   9:36.803 49152 > echo: . ack 58401 win 11680
   9:36.803 echo > 49152: P 58401:59861(1460) ack 59861 win 17520
   9:36.807 49152 > echo: P 59861:60001(140) ack 59861 win 11680
   9:36.808 echo > 49152: P 59861:59961(100) ack 60001 win 17520
   9:37.131 49152 > echo: . ack 59961 win 11680
   9:37.131 echo > 49152: P 59961:60001(40) ack 60001 win 17520
   9:37.135 49152 > echo: F 60001:60001(0) ack 60001 win 11680
   9:37.135 echo > 49152: . ack 60002 win 17520
   9:37.136 echo > 49152: F 60001:60001(0) ack 60002 win 17520
   9:37.153 49152 > echo: . ack 60002 win 11679

After the client sends a window probe, its application is awakened and reads the complete receive buffer (11680 bytes, at time 9:36.554). A window advertisement is sent to the server, inviting it to respond with more data and avoiding the deadlock.

Full Copyright Statement.

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.