                                                       M. Mellia
                                                       M. Meo
Internet Draft                                         C. Casetti
Document: draft-mellia-tsvwg-tcp-smartframing-00.txt   Politecnico di Torino
Expires: April 2002                                    November 2001

                             TCP Smart-Framing

1. Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

2. Abstract

In this document we present an enhancement to the TCP protocol, called TCP Smart Framing, or TCP-SF for short, that enables the Fast Retransmit/Recovery algorithm even when the congestion window is small, i.e., for short-lived flows, which account for most of the current Internet traffic. Without modifying the TCP congestion control based on the additive-increase/multiplicative-decrease paradigm, TCP-SF adopts a novel segmentation algorithm: while Classic TCP starts by sending IW segments, a TCP-SF source is always allowed to send an initial window of 4 smaller segments, whose aggregate payload is equal to the connection's SMSS. This key idea can be implemented on top of any TCP flavor, from Tahoe to SACK, and requires modifications to the server behavior only.

Mellia et al. Expires April 2002 [Page 1]

TCP Smart Framing October 2001

Table of Contents

1. Status of this Memo
2. Abstract
3. Terminology
4. Definitions
5. Introduction
6. TCP modifications
7. Advantages of Smart Framing
8. Disadvantage of Smart Framing
9. Simulation and experimental results
10. Security Considerations
11. Conclusions
12. References

3. Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 [1] and indicate requirement levels for protocols.

4. Definitions

This document assumes that the reader is familiar with the terms and definitions contained in [2].

5. Introduction

In this document, we propose a new approach to data segmentation in the early stages of Slow Start that addresses the nature of today's Internet traffic: short, spotty client-server interactions between a Web client and a Web server. We will refer to this variant of TCP as "TCP Smart Framing", or TCP-SF for short. Unless otherwise specified, by "Classic" TCP we refer to any TCP version currently implemented in standard TCP stacks (i.e., TCP Reno [3], TCP NewReno [4], TCP SACK [5]).

We advocate an increase in the number of segments transmitted by a TCP source, without increasing the amount of application data actually sent in the congestion window. This will be done whenever the congestion window is "small", i.e., at the beginning of each Slow Start phase, and in particular at flow startup.

The main observation is that TCP's congestion control is only marginally driven by the rate at which bytes leave the source; rather, it is driven by the rate at which segments are sent, and their respective ACKs received, at the source. TCP infers that a segment is lost whenever one of the following two events occurs: a Retransmission Time Out (RTO) expiration, or the arrival of three duplicate ACKs that triggers the Fast Retransmit (FR) algorithm. Of these two events, the RTO is the less desirable one, as the RTO period is usually much larger than the Round Trip Time (RTT) (at least 200 ms, to account for Delayed ACKs). Indeed, regardless of the actual number of bytes transmitted, a coarse RTO expiration can be prevented only if enough segments are sent in the transmission window (i.e., at least three more following the lost segment). This situation can occur only if i) the congestion window is larger than or equal to 4 SMSS (Sender Maximum Segment Size) and ii) the flow is long enough to allow the transmission of at least 4 back-to-back segments (i.e., it is not a so-called short-lived flow). Also, it should be pointed out that repeatedly forcing a short-lived connection into RTO often results in an excessive penalty for the connection itself, which would otherwise end after sending a few more segments, rather than in actual network decongestion. Since today's Internet traffic is heavily represented by short-lived connections, the need is felt to address their requirements in the design of TCP's congestion control.

TCP-SF is based on the following idea: increasing the upstream flow of ACKs by sending downstream a larger number of segments whose size is smaller than the SMSS. While the amount of data injected into the pipe is kept unchanged, a larger number of segments received at the other end triggers a larger number of ACKs in the backward channel, and thus a larger probability that the transmitter can recover losses without waiting for the RTO to expire. While Classic TCP starts by sending IW (Initial Window) segments, in our scheme a TCP-SF source is allowed to send 4 segments, whose aggregate payload is equal to IW*SMSS. Also, while Classic TCP restarts by sending LW (Loss Window) segments after an RTO, TCP-SF sends 4 segments, whose payload again equals LW*SMSS. Thus, the resulting network load is, bytewise, the same as that of a Classic TCP connection with IW = LW = 1 (except for the segmentation overhead). TCP-SF resumes the classic behavior, i.e., sends full-sized segments, when a threshold equal to 4*SMSS is reached. The Classic TCP algorithms (Slow Start, Congestion Avoidance, Fast Retransmit, Fast Recovery) are not otherwise affected.

Similar proposals address the RTO penalty. TCP Limited Transmit [6] allows the transmission of new segments upon the reception of the first or second duplicate ACK, so as to enhance the chances of triggering FR if the window is particularly small. It should be noted that, compared to our proposal, TCP Limited Transmit cannot trigger FR for flows shorter than 4*SMSS; also, at least one more RTT is required before FR can be entered. The proposal in [7] suggests an increase in the permitted upper bound on IW (but not on the LW) from 1 segment to between 2 and 4 segments, without changing the segment size; this approach can help avoid RTOs in the initial window, but could potentially increase network congestion and, acting more aggressively, could affect other TCP sources not employing this algorithm, as the authors themselves point out.

6. TCP modifications

TCP-SF aims at enhancing TCP's behavior in the operating region where RTO is the only way to recover losses (i.e., when cwnd < 4*SMSS), making FR possible, as for example at the beginning of each Slow Start phase. In this document, the source states in which the cwnd value is smaller than 4*SMSS will be referred to as the "small-window regime". We define cwnd0 as the initial congestion window size, in units of SMSS, at the beginning of a Slow Start phase, i.e., either IW or LW.

We consider two possible implementations:

* Fixed-Size (FS-) TCP-SF. The largest segment size can either be equal to SMSS (full size) or to SMSS*cwnd0/4 (reduced size); specifically, when cwnd < 4*SMSS, the sender MUST use reduced-size segments; otherwise, it MUST use full-size segments. The cwnd increase rule follows the same principles as in Classic TCP, although the different largest segment size is accounted for. Upon ACK reception during Slow Start, the cwnd increase law is either

      cwnd = cwnd + SMSS*cwnd0/4

  when the source is in the small-window regime, or

      cwnd = cwnd + SMSS

  otherwise. Upon ACK reception during Congestion Avoidance, the cwnd increase law is either

      cwnd = cwnd + (SMSS*cwnd0/4)*(SMSS*cwnd0/4)/cwnd

  when the source is in the small-window regime, or

      cwnd = cwnd + SMSS*SMSS/cwnd

  otherwise. This ensures that the cwnd increase is, bytewise, identical to the one employed by Classic TCP.

* Variable-Size (VS-) TCP-SF. At the beginning, the segment size MUST be set equal to SMSS*cwnd0/4; then, while cwnd < 4*SMSS, the segment size MUST be increased by a factor k upon each ACK reception, until the segment size is equal to or larger than SMSS. It should be noted that, while cwnd < 4*SMSS, the sender MUST NOT have more than 4 in-flight segments.

The value of k can be determined by imposing that the amount of data sent by the TCP-SF in the small window regime is equal to the one that a Classic TCP would send in the same region, with the constraint that segments cannot be larger than SMSS. After some calculations, we obtain

      k^5 = 4

Thus k ~= 1.32. This translates into the following sample cwnd growth (assuming cwnd0 = 1 and no losses):

+--------------+---------------------+----------------------------+
| segment no.  | segment size        | cwnd                       |
+--------------+---------------------+----------------------------+
| 1,2,3,4      | SMSS/4              | SMSS                       |
| 5            | SMSS/4*k            | SMSS/4*(3+k)               |
| 6            | SMSS/4*k^2          | SMSS/4*(2+k+k^2)           |
| 7            | SMSS/4*k^3          | SMSS/4*(1+k+k^2+k^3)       |
| 8            | SMSS/4*k^4          | SMSS/4*(k+k^2+k^3+k^4)     |
| 9            | SMSS/4*k^5=SMSS     | SMSS/4*(k^2+k^3+k^4+k^5)   |
| 10           | SMSS                | SMSS/4*(k^3+k^4+k^5)+SMSS  |
| 11           | SMSS                | SMSS/4*(k^4+k^5)+2*SMSS    |
| 12           | SMSS                | SMSS*4                     |
| 13           | SMSS                | SMSS*4+SMSS                |
+--------------+---------------------+----------------------------+

The first column indicates the segment number sent by the source, the second column shows the segment size in bytes, and the last column shows the value of the cwnd in bytes when the corresponding segment is being sent.

The cwnd growth rule outlined above exhibits a bytewise exponential growth such as the one that is followed in Slow Start. However, if ssthresh < 4*SMSS, this growth is supposed to become linear, according to the Congestion Avoidance algorithm. In our case, shifting from exponential to linear growth would result in an excessively complicated implementation of the growth rule. Therefore, we suggest that the Congestion Avoidance growth rule be applied only when cwnd >= 4*SMSS, i.e., when Classic TCP behavior is entered. Besides, given the small value of cwnd, the impact of the different window growth is minimal.
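
The two segmentation rules of this section can be illustrated with a short sketch. The Python below is a minimal, non-normative illustration and not part of the proposal: the function names (in_small_window, fs_cwnd_on_ack, vs_segment_sizes, vs_cwnd_when_sending) are hypothetical, cwnd0 is assumed to be expressed in units of SMSS (as in the cwnd0 = 1 example), SMSS = 1460 bytes is borrowed from the simulation setup, and segment sizes are kept as floating-point byte counts rather than rounded to whole bytes as a real stack would have to do.

```python
SMSS = 1460          # bytes; assumed here, matches the draft's simulation setup
K = 4 ** (1 / 5)     # growth factor obtained from k^5 = 4, roughly 1.32


def in_small_window(cwnd, smss=SMSS):
    """True while the sender is in the small-window regime (cwnd < 4*SMSS)."""
    return cwnd < 4 * smss


def fs_cwnd_on_ack(cwnd, slow_start, smss=SMSS, cwnd0=1):
    """FS-TCP-SF: new cwnd (bytes) after one ACK.

    The reduced segment size SMSS*cwnd0/4 is used in the small-window
    regime and the full SMSS otherwise (cwnd0 assumed in units of SMSS).
    """
    seg = smss * cwnd0 / 4 if in_small_window(cwnd, smss) else smss
    if slow_start:
        return cwnd + seg
    return cwnd + seg * seg / cwnd      # Congestion Avoidance


def vs_segment_sizes(count, smss=SMSS, cwnd0=1):
    """VS-TCP-SF: sizes (bytes) of the first `count` segments, no losses.

    The first four segments carry SMSS*cwnd0/4 bytes; each later segment
    is k times larger than the previous one, capped at SMSS.
    """
    base = smss * cwnd0 / 4
    return [min(base * K ** max(i - 4, 0), smss) for i in range(1, count + 1)]


def vs_cwnd_when_sending(n, smss=SMSS):
    """cwnd (bytes) when segment n is sent, for 4 <= n <= 12 and cwnd0 = 1.

    In the small-window regime at most 4 segments are in flight, so cwnd
    is modelled as the total size of segments n-3 .. n.
    """
    sizes = vs_segment_sizes(n, smss)
    return sum(sizes[n - 4:n])


if __name__ == "__main__":
    for n, size in enumerate(vs_segment_sizes(12), start=1):
        print(f"segment {n:2d}: {size:7.1f} bytes")
```

Under these assumptions, the sizes returned for segments 5 to 9 grow by the factor k and reach SMSS at segment 9, and vs_cwnd_when_sending(12) comes out at 4*SMSS, in line with the sample cwnd growth table.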

One advantage of using TCP-SF with fixed-size segments lies in its simplicity: only two segment sizes are possible, either SMSS or SMSS*cwnd0/4. However, the overhead introduced increases with cwnd. On the contrary, when using variable-size segments, TCP-SF keeps the overhead constant, but a more complex implementation is required to deal with variable-size segments.

Finally, it should be noted that resegmentation can occur when the 4*SMSS threshold is crossed downwards upon segment loss detection and cwnd reduction. Indeed, whenever cwnd < 4*SMSS, the sender MUST use smaller segments (according to the rules outlined above).

7. Advantages of Smart Framing

Let us point out and summarize some advantages related to the implementation of TCP-SF.

* The lengthy first-window RTO (set to 3 seconds) is no longer the only outcome if a loss occurs at the onset of a connection.

* When Delayed ACKs are employed and the congestion window is 1 segment large, the receiver does not have to wait for 200 ms before generating an ACK; several current TCP implementations start a connection with a window of 2 segments, a widely employed workaround for the Delayed ACK initial slowdown. It should be noted that, if Delayed ACKs are implemented, TCP-SF can trigger Fast Recovery as soon as the receiver disables the Delayed-ACK feature upon reception of out-of-order segments [3].

* The RTT estimate, which is updated upon the reception of every ACK and is used to set the retransmission timer, improves its accuracy early on, thanks to the increased number of returning ACKs already in the first window.

* Short-lived flows, for which the completion time is paramount, are less likely to experience a coarse RTO expiration, since the number of transmitted segments grants a bigger chance of triggering FR.

* Shorter segments can exploit pipelined transmission, completing the transfer in a shorter time because of the store-and-forward mechanism at the routers; this is especially useful on slow links.

* Not requiring any contribution from the receiver, the scheme can quite easily be deployed on a server-only basis; furthermore, it can equally benefit well-established Classic TCP flavors, such as TCP Reno, NewReno and SACK, and works coupled with ECN (Explicit Congestion Notification). The implementation of TCP-SF is extremely simple: it requires modifying the transmitter behavior only, while leaving the receiver unchanged. This modification translates into a few lines of code in the TCP stack.

* The degree of aggressiveness of TCP-SF is the same as that of other classical versions of TCP. Indeed, the evolution of cwnd as well as the amount of data submitted to the network are unchanged.

* In case of segment drops, TCP-SF is more efficient as regards bandwidth utilization, both in terms of wasted bandwidth resulting from undelivered segments, and in terms of retransmission of smaller segments.

8. Disadvantage of Smart Framing

Let us now summarize some disadvantages related to the implementation of TCP-SF.

* The main disadvantage is that TCP-SF increases the overhead by a factor equal to the segment size reduction factor; i.e., using four segments per SMSS, the TCP-SF overhead is four times larger than the Classic TCP overhead.
  In particular, when no losses occur, FS-TCP-SF will send 28 small-size segments before switching back to large-size segments. VS-TCP-SF, on the contrary, sends 12 segments. Instead, Classic TCP always sends 7 segments. When comparing Classic TCP, VS-TCP-SF and FS-TCP-SF in the small-window regime, we observe a protocol overhead of 2.6%, 4.5% and 10.6%, respectively.

* It should also be pointed out that a larger number of segments can nominally slow down router operations.

* The more complex implementation of TCP-SF can result in higher computational load at the sender side.

9. Simulation and experimental results

We have chosen to investigate the performance of TCP-SF using both simulation and actual testbed measurements. Simulation gave us full control over specific scenarios. On the other hand, the testbed implementation allowed a more realistic evaluation featuring actual traffic patterns. A larger set of simulation and experimental results, including friendliness results, can be found in [8] and [9].

We implemented both flavors of TCP-SF in the ns-2 simulator. For our testbed measurements, we implemented only the Fixed-Size version, in the Linux kernel 2.2.17.

We report results for a network scenario in which both long-lived FTP file transfers and Web-like connections share a common bottleneck link. In particular, we derived the flow length distribution from an AT&T Labs traffic estimate that can be found in [10]. In that estimate, more than 80% of the flows are shorter than 11 Kbytes. To model Web-like traffic, TCP-SACK sources (and receivers) are connected to a bottleneck link of 10 Mbit/s capacity and 50 ms delay. A Poisson process drives the setup of new connections, whose length (in bytes) is randomly set according to the AT&T traffic distribution. The SMSS is set to 1,460 bytes. Long-lived connections are accounted for using 10 greedy TCP sources. The bottleneck link is managed by a bytewise Droptail buffer, whose capacity is set to 150 Kbytes. To get rid of transient effects, the simulation time is 4,000 seconds.

The following Table reports, for different flow lengths, the average completion time in seconds (CT) and the percentage of segment losses detected through FR (%FR) for Classic TCP, VS-TCP-SF and FS-TCP-SF. The offered load was fixed and equal to 0.8. It should be noted that, for some flow lengths, FR is not observed since there are not enough segments to trigger it.

+---------+----------------+----------------+----------------+
| flow    |     Classic    |    VS-TCP-SF   |    FS-TCP-SF   |
| length  +--------+-------+--------+-------+--------+-------+
| [bytes] | CT [s] |  %FR  | CT [s] |  %FR  | CT [s] |  %FR  |
+---------+--------+-------+--------+-------+--------+-------+
|      61 |  0.25  |   0.0 |  0.22  |   0.0 |  0.24  |   0.0 |
|     239 |  0.37  |   0.0 |  0.28  |   0.0 |  0.31  |   0.0 |
|     539 |  0.46  |   0.0 |  0.34  |   0.0 |  0.43  |   0.0 |
|    1349 |  0.81  |   0.0 |  0.42  |  8.02 |  0.49  | 10.84 |
|    2739 |  1.14  |   0.0 |  0.53  | 33.52 |  0.64  | 35.57 |
|    4149 |  1.33  |   0.0 |  0.54  | 43.26 |  0.86  | 32.00 |
|    6358 |  1.71  |   0.0 |  0.77  | 52.12 |  0.93  | 30.02 |
|   10910 |  1.96  |  5.54 |  1.00  | 64.78 |  1.16  | 37.26 |
|   18978 |  2.17  | 21.91 |  1.23  | 62.98 |  1.57  | 49.01 |
|   90439 |  4.43  | 46.34 |  3.01  | 69.68 |  3.87  | 64.64 |
+---------+--------+-------+--------+-------+--------+-------+

As can be seen, both flavors of TCP-SF outperform Classic TCP in terms of completion time for all flow lengths. Specifically, different completion times are observed even when the RTO and FR percentages are the same, because each scenario exhibits a slightly different loss probability. For example, although the shortest flows do not benefit from the TCP-SF enhancement, their completion times differ because the dropping probabilities for TCP-SF were slightly smaller. In particular, the above Table underlines the benefits obtained by TCP-SF in increasing the number of FR.
The benefits are clearly visible starting from flows that have to send 1,349 bytes: using TCP-SF, they manage to trigger the FR algorithm. On the contrary, the first class of flows that can use FR with Classic TCP is the one that has to send 10,910 bytes, where fewer than 6% of dropped segments trigger FR. This is reflected by the smaller completion time that TCP-SF requires to successfully end the transfer (i.e., reduced by half).

The second set of experiments, carried out on the testbed, involved a realistic traffic pattern (i.e., Web client-server transactions) routed over a link emulator between clients and a Proxy Server. In order to generate realistic traffic, every Web browser in our Department Subnet was configured so as to use the local Proxy Server. TCP-SF was only implemented on the machine running the Proxy Server, and the link emulator was added on the return path between the Proxy and the Department Subnet. The link emulator was configured so as to enforce a specific latency and bytewise drop probability (i.e., longer packets have a higher probability of being dropped than smaller ones). The above configuration allows a substantial amount of traffic (namely, the Web objects fetched by the Proxy and returned to the requesting clients on the Department Subnet) to be sent over the link emulator using TCP-SF as transport layer.

Performance metrics were collected for different values of emulated latency and drop probability, over a period of one month in June 2001; in order to collect a meaningful set of data, each latency/drop pair was set for a whole day, and the Proxy Server had its transport layer switched between Classic TCP and TCP-SF every five minutes. Only connections between the Proxy Server and a local client were considered. Statistics were later collected for each version of TCP, and for each day.

Unlike in the simulations, we had no control over the actual client sessions: the amount of data transferred during each transaction depended on the browser used and the operating system installed on each user's machine.

In the following Table, we report, for different drop probabilities and latency values enforced by the link emulator, the estimate of the average Retransmission Timer (RT) per connection and the percentage of times a loss resulted in a Fast Recovery (%FR), for both Classic TCP and FS-TCP-SF. The Table only reports results for flows shorter than 10 Kbytes, which are the majority of the observed flows. Complete results may be found in [9].

+------+---------+----------------+----------------+
| drop | latency |   Classic TCP  |     TCP-SF     |
| rate |  [ms]   | RT [ms] | %FR  | RT [ms] | %FR  |
+------+---------+---------+------+---------+------+
| 0.01 |      20 |     435 | 36.8 |     307 | 82.5 |
|      |      50 |     667 | 35.3 |     392 | 85.0 |
|      |     100 |     824 | 48.7 |     681 | 86.5 |
| 0.05 |      20 |     482 | 18.6 |     333 | 72.2 |
|      |      50 |     669 | 36.1 |     429 | 79.7 |
|      |     100 |    1206 | 40.3 |     652 | 84.0 |
| 0.10 |      20 |     477 | 19.5 |     397 | 76.0 |
|      |      50 |     725 | 20.3 |     428 | 87.6 |
|      |     100 |    1008 | 30.5 |     661 | 87.7 |
+------+---------+---------+------+---------+------+

The results shown in the Table above point out the drastic increase in the percentage of Fast Recovery for TCP-SF when compared to Classic TCP; also, the Retransmission Timer is significantly lower for TCP-SF. As a side note, the estimation of the proper value of the Retransmission Timer also benefits from the features of TCP-SF: the larger number of segments sent, compared to Classic TCP, accounts for a larger number of samples used in the estimation, thus refining the estimate and providing a smaller, more accurate value for the timer. The combined effects of fewer RTOs and smaller values of the retransmission timer shorten the completion time.

10. Security Considerations

This document discusses a new framing algorithm that can be used to improve TCP performance in the small-window regime. This does not raise any new security issues with TCP.

11. Conclusions

This document proposes a small change to the TCP protocol that is beneficial to short-lived TCP connections by allowing the triggering of the Fast Recovery algorithm from the onset of a connection.

Authors' Addresses

Marco Mellia
Politecnico di Torino
C.so Duca degli Abruzzi 24
Torino, Italy
Phone: 39-011-5644173
Email: mellia@polito.it

Michela Meo
Politecnico di Torino
C.so Duca degli Abruzzi 24
Torino, Italy
Phone: 39-011-5644167
Email: michela@polito.it

Claudio Casetti
Politecnico di Torino
C.so Duca degli Abruzzi 24
Torino, Italy
Phone: 39-011-5644126
Email: casetti@polito.it

12. References

[1] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, BCP 14, RFC 2119, March 1997.

[2] W. Stevens, M. Allman, V. Paxson, TCP Congestion Control, RFC 2581, April 1999.

[3] W. Stevens, TCP/IP Illustrated, vol. 1, Addison Wesley, Reading, MA, USA, 1994.

[4] S. Floyd, T. Henderson, The NewReno Modification to TCP's Fast Recovery Algorithm, RFC 2582, April 1999.

[5] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective Acknowledgement Options, RFC 2018, April 1996.

[6] M. Allman, H. Balakrishnan, S. Floyd, Enhancing TCP's Loss Recovery Using Limited Transmit, RFC 3042, January 2001.

[7] M. Allman, S. Floyd, C. Partridge, Increasing TCP's Initial Window, RFC 2414, September 1998.

[8] M. Mellia, C. Casetti, M. Meo, TCP Smart Framing: Using Smart Segments to Enhance the Performance of TCP, Globecom 2001, San Antonio, Texas, November 25-29, 2001.

[9] M. Mellia, M. Meo, C. Casetti, TCP Smart Framing: a New, Exciting Addition to the TCP Universe, submitted to IEEE Infocom 2002.

[10] A. Feldmann, J. Rexford, R. Caceres, Efficient policies for carrying Web traffic over flow-switched networks, IEEE/ACM Transactions on Networking, vol. 6, no. 6, pp. 673-685, December 1998.