Internet Engineering Task Force                              Mark Allman
INTERNET DRAFT                              NASA Lewis/Sterling Software
File: draft-floyd-incr-init-win-01.txt                       Sally Floyd
                                                                     LBL
                                                         Craig Partridge
                                                        BBN Technologies    
                                                             March, 1998
                                                Expires: September, 1998
    
    
                    Increasing TCP's Initial Window
    

Status of this Memo
                                    
    This document is an Internet-Draft.  Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its areas,
    and its working groups.  Note that other groups may also distribute
    working documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as ``work in
    progress.''

    To learn the current status of any Internet-Draft, please check the
    ``1id-abstracts.txt'' listing contained in the Internet-Drafts
    Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
    munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
    ftp.isi.edu (US West Coast).

Abstract
    
    This is a note to suggest changing the permitted initial window in
    TCP from 1 segment to roughly 4K bytes.  This draft considers the
    advantages and disadvantages of such a change, outlines
    experimental results that indicate its costs and benefits, and
    points out remaining research questions.

1.  TCP Modification

    This draft suggests allowing the initial window used by a TCP
    connection to increase from 1 segment to between 2 and 4 segments.
    In most cases, this will result in an initial window of roughly 4K
    bytes (although given a large segment size, the initial window could
    be significantly larger than 4K bytes).  The proposed initial window
    size is given in (1):

                  min (4*MSS, max (2*MSS, 4380 bytes))               (1)

    
Allman                                                          [Page 1]

                                                              March 1998

    More specifically, the initial window size is based on the maximum
    segment size (MSS), as follows:

        MSS <= 1095 bytes:
            win = 4 * MSS
        1095 bytes < MSS < 2190 bytes:
            win = 4380 bytes
        MSS >= 2190 bytes:
            win = 2 * MSS
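
    As an illustrative sketch only (the function names are hypothetical
    and not part of this proposal), equation (1) and the three MSS
    ranges can be checked against each other in Python:

```python
def initial_window(mss: int) -> int:
    """Upper bound on the initial window, in bytes, per equation (1)."""
    if mss <= 0:
        raise ValueError("MSS must be a positive number of bytes")
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_by_case(mss: int) -> int:
    """The same bound, written out as the three MSS ranges."""
    if mss <= 1095:
        return 4 * mss
    if mss < 2190:
        return 4380
    return 2 * mss


if __name__ == "__main__":
    # Sample MSS values chosen for illustration only.
    for mss in (512, 536, 1095, 1460, 2190, 4312, 9000):
        win = initial_window(mss)
        assert win == initial_window_by_case(mss)
        print(f"MSS {mss:5d} bytes -> initial window {win:5d} bytes")
```

    For example, an MSS of 1460 bytes yields 4380 bytes (three
    segments), while an MSS of 512 bytes yields 2048 bytes (four
    segments).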

    This increased initial window would be optional: a TCP MAY start
    with a larger initial window, but this draft does not say that it
    SHOULD.

    This change would only apply to the initial window of the
    connection, in the first round trip time (RTT) of transmission
    following the TCP three-way handshake.  That is, the SYN/ACK in the
    three-way handshake should not increase the initial window size
    above that outlined in equation (1).  However, if the SYN or SYN/ACK
    is lost, the initial window used after a correctly transmitted SYN
    MUST be 1 segment.

    Some TCP implementations use slow start to re-start transmission
    after a long idle period.  In this case, the initial window used
    should be the same as the initial window used at the beginning of
    the transfer.  The change proposed in this document would not change
    the behavior after a retransmit timeout, when the sender would
    continue to slow start from an initial window of one segment.
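
    The cases described in this section can be summarized in a short
    Python sketch.  The event names and the function starting_window()
    are hypothetical, introduced only for illustration.  The value
    returned at connection start is the largest window permitted by
    equation (1); a TCP may always choose a smaller one.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    CONNECTION_START = 1    # handshake completed, no SYN or SYN/ACK lost
    SYN_LOST = 2            # SYN or SYN/ACK had to be retransmitted
    IDLE_RESTART = 3        # slow start after a long idle period
    RETRANSMIT_TIMEOUT = 4  # slow start after a retransmit timeout


def larger_initial_window(mss: int) -> int:
    """Equation (1): the largest permitted initial window, in bytes."""
    return min(4 * mss, max(2 * mss, 4380))


def starting_window(event: Event, mss: int,
                    window_at_start: Optional[int] = None) -> int:
    """Window, in bytes, from which slow start begins for each case."""
    if event is Event.CONNECTION_START:
        return larger_initial_window(mss)
    if event in (Event.SYN_LOST, Event.RETRANSMIT_TIMEOUT):
        # Both cases fall back to a window of one segment.
        return mss
    # Idle restart: reuse whatever window the transfer began with.
    if window_at_start is None:
        raise ValueError("idle restart needs the window used at start")
    return window_at_start
```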

2.  Advantages of Larger Initial Windows

    1.  For connections transmitting only a small amount of data, a
        larger initial window would reduce the transmission time
        (assuming moderate segment drop rates).  For many email (SMTP
        [Pos82]) and web page (HTTP [BLFN96, FJGFBL97]) transfers that
        are less than 4K bytes, the larger initial window would reduce
        the data transfer time to a single RTT.

    2.  For connections that will be able to use large congestion
        windows, this modification eliminates up to three RTTs and a
        delayed ACK timeout during the initial slow-start phase.  This
        would be of particular benefit for high-bandwidth
        large-propagation-delay TCP connections, such as those over
        satellite links.

    3.  When the initial window is 1 segment, a receiver employing
        delayed acknowledgments (ACKs) [Bra89] is forced to wait for a
        timeout before generating an ACK.  With a larger initial window,
        the receiver will be able to generate an ACK after the second
        data segment arrives.  This eliminates the wait for the timeout
        (0.1 seconds or more).
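
    The effect on short transfers can be illustrated with a deliberately
    idealized Python sketch.  It assumes no loss, no delayed ACKs, an
    unlimited receiver window, and a congestion window that doubles
    every round trip; the function name is hypothetical.  Because
    delayed ACKs are not modeled, the sketch should not be expected to
    reproduce the "up to three RTTs" figure in item 2.

```python
def round_trips_to_send(total_segments: int, initial_window: int) -> int:
    """Round trips of data transmission under idealized slow start."""
    if total_segments < 0 or initial_window < 1:
        raise ValueError("need total_segments >= 0 and initial_window >= 1")
    cwnd = initial_window          # congestion window, in segments
    remaining = total_segments
    rtts = 0
    while remaining > 0:
        remaining -= min(cwnd, remaining)  # send one window of data
        cwnd *= 2                          # assumption: doubling per RTT
        rtts += 1
    return rtts


if __name__ == "__main__":
    for segments in (4, 32):
        one = round_trips_to_send(segments, 1)
        four = round_trips_to_send(segments, 4)
        print(f"{segments} segments: {one} RTTs with IW=1, "
              f"{four} RTTs with IW=4")
```

    Under these assumptions, a four-segment transfer completes in one
    round trip with a four-segment initial window, rather than three.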

    
3.  Implementation Issues

    When larger initial windows are implemented along with Path MTU
    Discovery [MD90], only one of the segments in the initial window
    should have the "Don't Fragment" (DF) bit set.  Preliminary analysis
    indicates that setting the DF bit in the last segment in the initial
    window provides the least chance for needless retransmissions and
    large line-rate bursts of segments when the initial segment size is
    found to be too large.  In addition, if the MSS being used is found
    to be too large, the cwnd should be reduced to prevent large bursts
    of smaller segments.  Specifically, cwnd should be reduced by the
    ratio of the old segment size to the new segment size.  However,
    more attention needs to be paid to the interaction between larger
    initial windows and Path MTU Discovery.
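
    One possible reading of the cwnd reduction described above is
    sketched below in Python.  It assumes that cwnd is kept in bytes
    and that "reduced by the ratio" means dividing cwnd by
    (old segment size / new segment size), so that the number of
    segments in the window stays the same.  The function name is
    hypothetical.

```python
def rescale_cwnd(cwnd_bytes: int, old_mss: int, new_mss: int) -> int:
    """Shrink cwnd (bytes) when Path MTU Discovery lowers the MSS."""
    if not 0 < new_mss <= old_mss:
        raise ValueError("expected 0 < new_mss <= old_mss")
    # Dividing by (old_mss / new_mss) keeps the segment count constant.
    return cwnd_bytes * new_mss // old_mss


if __name__ == "__main__":
    # Illustrative values: three 1460-byte segments, MSS lowered to 536.
    print(rescale_cwnd(4380, 1460, 536))
```

    With these illustrative values the window shrinks from 4380 bytes
    to 1608 bytes, which is still three segments at the new size.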

    The larger initial window proposed in this document SHOULD NOT be
    viewed as an encouragement for web browsers to open multiple
    simultaneous TCP connections all with larger initial windows.  (Web
    browsers should not open four simultaneous TCP connections to the
    same destination in any case, because this works against TCP's
    congestion control mechanisms [FF98].)

4.  Disadvantages of Larger Initial Windows for the Individual
    Connection 

    In high-congestion environments, particularly for routers that have
    a bias against bursty traffic (as in the typical Drop Tail router
    queues), a TCP connection can sometimes be better off starting with
    an initial window of one segment.  There are scenarios where a TCP
    connection slow-starting from an initial window of one segment might
    not have segments dropped, while a TCP connection starting with an
    initial window of four segments might experience unnecessary
    retransmits due to the inability of the router to handle small
    bursts.  This could result in an unnecessary retransmit timeout.
    For a large-window connection that is able to recover without a
    retransmit timeout, this could result in an unnecessarily-early
    transition from the slow-start to the congestion-avoidance phase of
    the window increase algorithm.  These premature segment drops should
    not happen in uncongested networks, or in moderately-congested
    networks where the congested router uses active queue management
    (such as Random Early Detection [FJ93]).

    Some TCP connections will receive better performance with the larger
    initial window even if the burstiness of the initial window results
    in premature segment drops.  This will be true if (1) the TCP
    connection recovers from the segment drop without a retransmit
    timeout, and (2) the TCP connection is ultimately limited to a small
    congestion window by either network congestion or by the receiver's
    advertised window.

5.  Disadvantages of Larger Initial Windows for the Network

    We consider two separate potential dangers for the network.  The
    first danger would be a scenario where a large number of segments on
    congested links were duplicate or unnecessarily-retransmitted
    segments that had already been received at the receiver.  The second
    danger would be a scenario where a large number of segments on
    congested links were segments that would be dropped later in the
    network before reaching their final destination.

    Unnecessarily-retransmitted segments:

        As described in the previous section, the larger initial window
        could occasionally result in a segment dropped from the initial
        window, when that segment might not have been dropped if the
        sender had slow-started from an initial window of one segment.
        However, Appendix A shows that even in this case, the larger
        initial window would not result in a large number of
        unnecessarily-retransmitted segments.

    Segments dropped later in the network:

        How much would the larger initial window for TCP increase the
        number of segments on congested links that would be dropped
        before reaching their final destination?  This is a problem that
        can only occur for connections with multiple congested links,
        where some segments might use scarce bandwidth on the first
        congested link along the path, only to be dropped later along
        the path.

        First, many of the TCP connections will have only one congested
        link along the path.  Segments dropped from these connections do
        not ``waste'' scarce bandwidth, and do not contribute to
        congestion collapse.

        However, some network paths will have multiple congested links,
        and segments dropped from the initial window could use scarce
        bandwidth along the earlier congested links before being dropped
        on subsequent congested links.  To the extent that the drop rate
        is independent of the initial window used by TCP segments, the
        problem of congested links carrying segments that will be
        dropped before reaching their destination will be similar for
        TCP connections that start by sending four segments or one
        segment.

        For a network with a high segment drop rate, increasing the
        initial TCP congestion window could increase the segment drop
        rate even further.  This is in part because routers with drop
        tail queue management have difficulties with bursty traffic in
        times of congestion.  However, this should be a second order
        effect.  Given uncorrelated arrivals for TCP connections, the
        larger initial TCP congestion window should generally not
        significantly increase the segment drop rate.

6.  Network Changes
    
    There are other changes in the network that make a larger initial
    window less of a problem.  These include the increasing deployment
    of higher-speed links where 4K bytes is a rather small quantity of
    data and the deployment of queue management mechanisms such as RED
    that are more tolerant of transient traffic bursts.  The current
    dangers of congestion collapse most likely now come not from a 4K
    initial burst from TCP connections, but from the increased
    deployment of UDP connections without end-to-end congestion control.

7.  Concerns

    All the experiments (see section 8) with larger initial windows have
    tested how the larger window affects the TCP connection that uses
    the larger window.  No one has thoroughly studied the impact of the
    larger window on other TCP connections.  In particular, no one has a
    thorough set of answers about what happens when a TCP bursts a
    larger initial window into or across a path already being shared by
    a set of established TCP connections.

    Part of the reason for this omission is the assumption that the
    effect is small.  For example, in much of the Internet, bursts of 2
    and 3 segments are common, and bursts of 4 and 5 segments are not
    rare.  A delayed ACK (covering two previously unacknowledged
    segments) received during congestion avoidance causes the window to
    slide and 2 segments to be sent.  The same delayed ACK received
    during slow start causes the window to slide by 2 segments and then
    be incremented by 1 segment, leading to a 3 segment burst.  Assuming
    delayed ACKs, a single dropped ACK causes the subsequent ACK to
    cover 4 previously unacknowledged segments.  During congestion
    avoidance this leads to a 4 segment burst and during slow start a 5
    segment burst is generated.
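
    The burst sizes in this paragraph follow from simple arithmetic,
    sketched here in Python.  The sketch assumes the sender always has
    data waiting and that the window is full when the ACK arrives, and
    it ignores the fractional window increase during congestion
    avoidance.  The function name is hypothetical.

```python
def burst_on_ack(segments_acked: int, slow_start: bool) -> int:
    """Segments released back-to-back by one arriving ACK."""
    if segments_acked < 1:
        raise ValueError("an ACK must cover at least one segment")
    slide = segments_acked           # window slides past the ACKed data
    growth = 1 if slow_start else 0  # slow start adds 1 segment per ACK
    return slide + growth


if __name__ == "__main__":
    for acked in (2, 4):  # normal delayed ACK, then one ACK dropped
        print(f"ACK covering {acked} segments: "
              f"{burst_on_ack(acked, False)} in congestion avoidance, "
              f"{burst_on_ack(acked, True)} in slow start")
```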

    However, there are some common scenarios where a larger initial
    window might have an effect.  One example is low speed tail circuits
    with routers with small buffers.  For instance, imagine a dialup
    link connecting routers, each of which has a handful of buffers.
    Further imagine the link is already being shared by a few TCP
    connections.  Then a new connection launches a large initial window,
    causing losses.  How long will it be before the connections resume
    sharing the link fairly?  Are there any signs of a capture effect,
    in which the new TCP gets a large fraction of the bandwidth?  (A
    capture effect could ensure that, say, an SMTP server got more
    bandwidth than a long-running FTP transfer.)

    Another scenario of concern is heavily loaded links.  For instance,
    a couple of years ago, one of the trans-Atlantic links was so
    heavily loaded that the correct congestion window size for a
    connection was about one segment.  In this environment, new
    connections using larger initial windows would be starting with
    windows that were four times too big.  What would the effects be?
    Do connections thrash?

    
8.  Experimental Results    
    
8.1 Studies of TCP Connections using Larger Initial Windows

    A number of studies have been done using larger initial windows.
    The first study considers the effects on the global Internet, as
    well as on slow dialup modem links [All97a].  These test results
    show that for 16 KB transfers to 100 Internet hosts, 4 segment
    initial windows resulted in an increase in the drop rate of 0.04
    segments/transfer.  While the drop rate increased slightly, the
    transfer time was reduced by roughly 25% for transfers using a 4
    segment (512 byte MSS) initial window when compared to an initial
    window of 1 segment.  Tests over a 28.8 Kbps dialup channel showed no
    increase in the drop rate and a transfer time decrease of roughly
    10% over standard TCP when using a 4 segment initial window.

    In another study, larger initial windows have been shown to improve
    performance over satellite channels [All97b].  In this study, an
    initial window of 4 segments (512 byte MSS) resulted in throughput
    improvements of up to 30% (depending upon transfer size).

    Next, a study involving simulations of a large number of HTTP
    transactions over hybrid fiber coax (HFC) indicates that the use of
    larger initial windows decreases the time required to load WWW pages
    [Nic97].  [HAGT98] also shows that the use of larger initial windows
    results in a decrease in transfer time in HTTP tests over the ACTS
    satellite system.
    
    Another study investigated the effects of a larger initial window on
    a host connected by a slow modem link and a router with a 3-packet
    buffer [SP97].  This study found that in this environment, larger
    initial windows slightly improved performance.

8.2 Studies of Networks using Larger Initial Windows

    A simulation study of how the use of a larger initial window impacts
    competing network traffic is outlined in [PN98].  In this
    investigation, a number of HTTP and FTP flows were sharing a
    congested gateway (the exact number of flows was varied in this
    study).  The study showed improvement in HTTP transfer times on the
    order of 30% in many scenarios.  In addition, a larger initial
    window slightly increased the segment drop rate (only one scenario
    increased the drop rate more than 1% above the loss rate experienced
    when using an initial window of 1 segment).

    Morris [Mor97] investigated larger initial windows in a very congested
    network.  The loss rate in networks where all TCP connections use an
    initial window of 4 segments is shown to be 1-2% greater than in a
    network where all connections use an initial window of 1 segment.
    In addition, in networks where connections used an initial window of
    4 segments, roughly 5-10% more time was spent waiting for the
    retransmit timer (RTO) to expire to resend a segment than was spent
    when using an initial window of 1 segment.  The time spent waiting
    for the RTO timer to expire represents idle time when no useful work
    was being accomplished.  These results show that in a very congested
    environment, where each connection's share of the bottleneck
    bandwidth is close to 1 segment, using a larger initial window
    degrades performance.

9.  Conclusion

    This draft suggests a small change to TCP that may be beneficial to
    short-lived TCP connections and those over links with long RTTs
    (saving several RTTs during the initial slow-start phase).

10. Acknowledgments

    We would like to acknowledge Tim Shepard and the members of the
    End-to-End-Interest Mailing List for continuing discussions of these
    issues.

References

    [All97a] Mark Allman.  An Evaluation of TCP with Larger Initial
        Windows.  40th IETF Meeting -- TCP Implementations WG.
        December, 1997.  Washington, DC.

    [All97b] Mark Allman.  Improving TCP Performance Over Satellite
        Channels.  Master's thesis, Ohio University, June 1997.

    [BLFN96] Tim Berners-Lee, R. Fielding, and H. Nielsen.  Hypertext
        Transfer Protocol -- HTTP/1.0, May 1996.  RFC 1945.

    [Bra89] Robert Braden.  Requirements for Internet Hosts --
        Communication Layers, October 1989.  RFC 1122.

    [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of
        Tahoe, Reno, and SACK TCP.  Computer Communication Review,
        26(3), July 1996.

    [FF98] Sally Floyd, Kevin Fall.  Promoting the Use of End-to-End
        Congestion Control in the Internet.  Submitted to IEEE
        Transactions on Networking.

    [FJGFBL97] R. Fielding, Jeffrey C. Mogul, Jim Gettys, H. Frystyk,
        and Tim Berners-Lee.  Hypertext Transfer Protocol -- HTTP/1.1,
        January 1997.  RFC 2068.

    [FJ93] Floyd, S., and Jacobson, V., Random Early Detection gateways
        for Congestion Avoidance. IEEE/ACM Transactions on Networking,
        V.1 N.4, August 1993, p. 397-413.
    
    [Flo94] Floyd, S., TCP and Explicit Congestion Notification.
        Computer Communication Review, 24(5):10-23, October 1994.

    [Flo96] Floyd, S., Issues of TCP with SACK. Technical report, January
        1996.  Available from http://www-nrg.ee.lbl.gov/floyd/.


    [HAGT98] Hans Kruse, Mark Allman, Jim Griner, Diepchi Tran.  HTTP
        Page Transfer Rates Over Geo-Stationary Satellite Links.  March
        1998.  Proceedings of the Sixth International Conference on
        Telecommunication Systems.  To Appear.

    [MD90] Jeffrey C. Mogul and Steve Deering.  Path MTU Discovery,
        November 1990.  RFC 1191.

    [MMFR96] Matt Mathis, Jamshid Mahdavi, Sally Floyd and Allyn
        Romanow.  TCP Selective Acknowledgment Options, October 1996.
        RFC 2018.

    [Mor97] Robert Morris.  Private communication.

    [Nic97] Kathleen Nichols.  Improving Network Simulation with
        Feedback.  Com21, Inc. Technical Report.  Available from
        http://www.com21.com/pages/papers/068.pdf.

    [PN98] Poduri, K., and Nichols, K., Simulation Studies of Increased
        Initial TCP Window Size, February 1998.  Internet-Draft
        draft-ietf-tcpimpl-poduri-00.txt (work in progress).

    [Pos82] Jon Postel.  Simple Mail Transfer Protocol, August 1982.
        RFC 821.

    [RF97] Ramakrishnan, K.K., and Floyd, S., A Proposal to Add Explicit
        Congestion Notification (ECN) to IPv6 and to TCP. Internet-Draft
        draft-kksjf-ecn-00.txt (work in progress). November 1997.

    [SP97] Tim Shepard and Craig Partridge.  When TCP Starts Up With
        Four Packets Into Only Three Buffers, July 1997.  Internet-Draft
        draft-shepard-TCP-4-packets-3-buff-00.txt (work in progress).

Appendix A

    In the current environment (without Explicit Congestion Notification
    [Flo94] [RF97]), all TCPs use segment drops as indications from the
    network about the limits of available bandwidth.  The change to a
    larger initial window should not result in a large number of
    unnecessarily-retransmitted segments.

    If a segment is dropped from the initial window, there are three
    different ways for TCP to recover: (1) Slow-starting from a window
    of one segment, as is done after a retransmit timeout, or after Fast
    Retransmit in Tahoe TCP; (2) Fast Recovery without selective
    acknowledgments (SACK), as is done after three duplicate ACKs in
    Reno TCP; and (3) Fast Recovery with SACK, for TCP where both the
    sender and the receiver support the SACK option [MMFR96].  In all
    three cases, if a single segment is dropped from the initial window,
    there are no unnecessarily-retransmitted segments.  Note that for a
    TCP sending four 512-byte segments in the initial window, a single
    segment drop will not require a retransmit timeout, but can be
    recovered from using the Fast Retransmit algorithm.  In addition, a
    single segment dropped from an initial window of three segments may
    be repaired using the Fast Retransmit algorithm, depending on which
    segment is dropped and whether or not delayed ACKs are used.  For
    example, dropping the first segment of a three-segment initial
    window will always require waiting for a timeout.  However, dropping
    the third segment will always allow recovery via the Fast Retransmit
    algorithm.
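
    The single-drop cases above can be reproduced with a simplified
    Python model.  It is a sketch under strong assumptions, not a
    description of any particular TCP: the receiver ACKs every segment
    (no delayed ACKs), the sender is in slow start with more data ready
    to send, each ACK of new data releases two new segments, nothing
    else is lost, and the receiver's window is not a limit.  The
    function names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # three duplicate ACKs trigger Fast Retransmit


def duplicate_acks(initial_window: int, dropped: int) -> int:
    """Duplicate ACKs seen after one drop from the initial window.

    `dropped` is the 1-based position of the lost segment.
    """
    if not 1 <= dropped <= initial_window:
        raise ValueError("dropped must be within the initial window")
    # Segments sent after the lost one arrive out of order: 1 dup each.
    later_segments = initial_window - dropped
    # Each segment before the lost one is ACKed normally, and each such
    # ACK releases two new segments, which also arrive out of order.
    new_segments = 2 * (dropped - 1)
    return later_segments + new_segments


def recovers_by_fast_retransmit(initial_window: int, dropped: int) -> bool:
    """True if enough duplicate ACKs arrive to avoid a timeout."""
    return duplicate_acks(initial_window, dropped) >= DUPACK_THRESHOLD


if __name__ == "__main__":
    for iw in (3, 4):
        for pos in range(1, iw + 1):
            how = ("fast retransmit"
                   if recovers_by_fast_retransmit(iw, pos) else "timeout")
            print(f"IW={iw}, segment {pos} dropped: {how}")
```

    Under these assumptions, any single drop from a four-segment
    window yields at least three duplicate ACKs, while dropping the
    first segment of a three-segment window yields only two.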

    We now consider the case when multiple segments are dropped from the
    initial window.  Using the first recovery method, slow-starting from
    a window of one segment, the number of unnecessarily-retransmitted
    segments is limited [FF96].  In the second case of Fast Recovery
    without SACK, multiple segment drops from a window of data generally
    result in a retransmit timeout.  Again, the number of
    unnecessarily-retransmitted segments is small.  In the third case,
    Fast Recovery with SACK, there can be unnecessarily-retransmitted
    segments only if a precise pattern of ACK segments is also lost
    [Flo96], or if segments are seriously reordered in the network.  In
    any case, the number of unnecessarily-retransmitted segments due to
    a larger initial window should be small.

Authors' Addresses

    Mark Allman 
    NASA Lewis Research Center/Sterling Software
    21000 Brookpark Road
    MS 54-2
    Cleveland, OH 44135
    mallman@lerc.nasa.gov
    http://gigahertz.lerc.nasa.gov/~mallman/

    Sally Floyd
    Lawrence Berkeley National Laboratory
    One Cyclotron Road
    Berkeley, CA 94720
    floyd@ee.lbl.gov

    Craig Partridge
    BBN Technologies
    10 Moulton Street
    Cambridge, MA 02138
    craig@bbn.com

