INTERNET DRAFT                                         Yogesh Prem Swami
File: draft-swami-tcp-lmdr-00.txt                               Khiem Le
Expires: September 2003                            Nokia Research Center
                                                                  Dallas
                                                              March 2003


           Lightweight Mobility Detection and Response (LMDR)
                           Algorithm for TCP


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

     TCP congestion control is based on the assumption that the
     end-to-end path of a TCP connection does not change--or at best
     changes infrequently--once the connection is established. However,
     when a user moves from one subnet to another, this assumption does
     not hold. In these cases, relying on the rate of arrival of ACKs as
     the only criterion for congestion control can lead to congestion
     collapse if a receiver (or a group of receivers) keeps sending ACKs
     in a regular fashion even after a subnet change. Worse, a TCP
     sender may be entirely unaware of its peer's mobility and so cannot
     take any remedial action. In this document we describe a
     network-layer-independent mechanism by which a TCP host can
     propagate its subnet change information to its peer, so that the
     sender can appropriately reduce its data rate.


Expires: September 2003                                         [Page 1]

draft-swami-tcp-lmdr-00.txt                                   March 2003


1. Introduction

     TCP congestion control [RFC2581] is based on the assumption that
     the end-to-end path of a TCP connection does not change--or at best
     changes infrequently--once the connection is established. Based on
     this assumption, TCP increases its data rate (probes the network)
     whenever it receives positive feedback. Without the assumption of a
     "constant path" for each packet, there would be no reason to
     increase the data rate based on ACKs received for previous data.

     When a TCP sender or receiver changes its point of attachment to
     the Internet (henceforth referred to as "changes subnets"), the
     entire end-to-end path between the sender and receiver can change.
     In these cases, the rate at which ACKs are received reflects only
     the state of the old path, not the new one. Therefore, relying on
     the rate of arrival of ACKs as the only criterion for congestion
     control can lead to congestion collapse. To summarize, if a TCP
     sender continues to maintain its congestion state after a subnet
     change, either

     a) the sender will add to severe congestion and cause
        numerous packet losses on the new path, or

     b) it will spend a lot of time trying to reach a reasonable
        throughput on the new path. This will happen if the sender was
        doing congestion avoidance on the old path and the BDP on the
        new path is much higher than on the old path (such scenarios
        occur when users move from a cellular network to a wireless LAN
        network, for example).

     In either case, using the same congestion state on the two paths
     will almost always result in a loss of overall throughput.
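
     As a rough illustration of case (b), the sketch below counts the
     round trips needed to grow a window from one size to another in
     congestion avoidance versus slow start. It assumes idealized
     growth (one segment per RTT in congestion avoidance, doubling per
     RTT in slow start, no losses or delayed ACKs). The window sizes and
     function names are illustrative and are not taken from this
     document.

```python
def rtts_congestion_avoidance(start_segments: int, target_segments: int) -> int:
    """Round trips needed when the window grows by one segment per RTT."""
    return max(0, target_segments - start_segments)


def rtts_slow_start(start_segments: int, target_segments: int) -> int:
    """Round trips needed when the window doubles every RTT.

    start_segments must be at least 1.
    """
    rtts = 0
    window = start_segments
    while window < target_segments:
        window *= 2
        rtts += 1
    return rtts


# Hypothetical numbers: 8 segments on the old path, 512 needed on the new one.
old_window, new_path_window = 8, 512
print(rtts_congestion_avoidance(old_window, new_path_window))
print(rtts_slow_start(old_window, new_path_window))
```

     With these assumed numbers, linear growth needs 504 round trips
     while doubling needs 6, which is why carrying congestion-avoidance
     state over to a much faster path is costly.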

     In [SL02], we used spurious timeouts as an implicit indication of
     subnet change. Although subnet change is indeed one of the largest
     sources of spurious timeouts, spurious timeouts alone are not a
     foolproof method of detecting it. In many cases [K03], depending
     upon the network architecture, a subnet change may not trigger a
     spurious timeout at all. In cases where it does, the sender should
     use [SL02] in conjunction with the mechanism described here. Note
     that [K03] cannot eliminate the possibility of spurious timeouts
     due to subnet change in all cases.

     We describe a network-layer-independent mechanism by which a TCP
     host can propagate its subnet change information to its peer. We
     assume that a mobile host always knows about its own subnet change
     (for example, by looking at its neighbor cache, destination cache,
     default router, or a combination of these [RFC2461]), but
     currently it may not be able to convey this to its peer.

     Please note that some network layer mobility management techniques,
     such as Mobile-IPv6 [JPA03] with route optimization, may be used to
     derive the peer's mobility information indirectly (for example, a
     TCP host can look into its binding cache). These schemes do not
     work, however, with other mobility management techniques such as
     Mobile-IPv6 with reverse tunneling or Mobile-IPv4 [RFC3344], or
     with other types of networks such as traditional cellular networks.
     Once a TCP sender has mobility information about itself or its
     peer, it can use the congestion response described in Section-5 to
     adjust its data rate.

     The rest of this document is organized as follows: Section-2
     defines the terminology used in this document. Section-3 describes
     the issue of congestion in more detail. Section-4 gives the details
     of the subnet change detection algorithm, and Section-5 contains
     the associated congestion response algorithm. Section-6 and
     Section-7 describe certain corner cases and security
     considerations, respectively.

2. Terminology

     The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT,"
     "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and
     "silently ignore" in this document are to be interpreted as
     described in [RFC2119].

     Mobile Node (MN):
          A host (not a router) capable of changing its point of
          attachment to the Internet without breaking transport layer
          connectivity. Hosts that change their point of attachment to
          the Internet but use DHCP or another mechanism to get a new IP
          address are not considered mobile.

     Old Subnet:
          MN's point of attachment (subnet prefix) to the Internet prior
          to movement.

     New Subnet:
          MN's point of attachment after movement.

     Stale ACK:
          An ACK in flight that was generated in the Old Subnet.

     INIT_WINDOW:
          The initial congestion window size at the start of a
          connection, as described in [RFC3390].

3. Congestion Issues with Subnet Change

     For concreteness, the description below assumes network mobility
     based on Mobile IP, but the same concepts are readily applicable to
     other types of networks.

     To illustrate the problem, consider Figure-1. At time T, the MN
     is reachable on Subnet-1 through AR-1 and has the care-of address
     <Subnet-1, MN>. While MN is "attached" to AR-1, packet exchange
     between TCP-Sender and <Subnet-1, MN> takes place using PATH-1.
     Let's assume that after some period of time, at T+1, MN moves
     (hands over) to Subnet-2 and is reachable through AR-2 with the
     care-of address <Subnet-2, MN>. While MN is attached to AR-2, all
     packets exchanged between TCP-Sender and <Subnet-2, MN> traverse
     the Internet Cloud-2 (which may or may not overlap with Cloud-1)
     and use PATH-2.


                          <---------PATH-1---------->

                            /---------\   +---------+
                            |         |   |         | Subnet-1
                        +---+ Cloud-1 +---+  AR-1   +-->>>>>MN
                        |   |         |   |         |  (Time=T)
       +------------+   |   \----++---/   +---------+
       |            |   |        ||            |
       | TCP Sender +---+        ^V PATH-3    ^V^ PATH-4
       |            |   |        ||            |
       +------------+   |   /----++---\   +----+----+
                        |   |         |   |         | Subnet-2
                        +---+ Cloud-2 +---+  AR-2   +-->>>>>MN
                            |         |   |         |  (Time=T+1)
                            \---------/   +---------+

                           <--------PATH-2----------->


     During the transient period when MN moves from Subnet-1 to
     Subnet-2, AR-1 may (or may not) buffer and forward packets destined
     to and from <Subnet-2, MN> through PATH-3 or through PATH-4 [K03],
     or a combination of PATH-2 and PATH-4.

     We make the distinction between PATH-3 and PATH-4 to emphasize the
     fact that PATH-4 may belong to a well provisioned network that has
     dynamic equilibrium for mobile users. Such networks are designed to
     accommodate very bursty traffic. PATH-3, on the other hand, may
     consist of arbitrary routers without proper provisioning.

     Let's assume that a TCP connection was progressing between MN and
     TCP Sender when the user moves from Subnet-1 to Subnet-2. We now
     analyze the problem of congestion on different paths shown above.

3.1 Congestion On PATH-1

     Congestion on PATH-1 is governed by basic slow-start and congestion
     avoidance mechanisms [RFC2581]. As long as MN is on Subnet-1,
     standard congestion control is sufficient. But once it moves from
     Subnet-1 to Subnet-2, two different events can take place:

     1. All packets destined to Subnet-1 are dropped by AR-1.
        In this case, after MN moves to Subnet-2, the TCP sender will
        time out. After the timeout, the TCP sender will start with a
        congestion window of one segment, which will hopefully traverse
        the new path PATH-3. In this case there is no need for extra
        congestion control.

        The disadvantages of dropping all packets destined to Subnet-1,
        however, are:

        a) The sender will wait for one complete RTO before it can
           start loss recovery.

        b) If MN moves faster than one subnet per RTO on average,
           the TCP receiver will take a very long time to recover such
           packets (theoretically, it will never be able to recover, but
           in practice this is not true due to the randomness of
           motion).

        c) The sender will reduce its SS_THRESH to half the number of
           packets in flight. Since there is no correlation between BDP
           and packet loss on PATH-1, the throughput of the connection
           will suffer if the SS_THRESH on the new path is set to a very
           small value (for example, if the sender moves to the new path
           right after the connection setup, and the SS_THRESH gets set
           to 2*MSS).

     2. All packets (or all packets arriving at AR-1 during some
        period of time) destined to <Subnet-1, MN> are forwarded to
        <Subnet-2, MN> ([K03] describes the details of how this can be
        done). In this case, AR-1 can forward packets to <Subnet-2, MN>
        using PATH-3 or PATH-4. We consider these two paths separately.
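
     To make disadvantage (c) of the first case concrete, the sketch
     below applies the timeout rule of [RFC2581], which sets SS_THRESH
     to max(FlightSize/2, 2*SMSS) and the congestion window to one
     segment. The segment size and the two flight sizes are assumed
     values chosen only for illustration.

```python
from typing import Tuple


def after_timeout(flight_size: int, smss: int) -> Tuple[int, int]:
    """Return (ssthresh, cwnd) in bytes after a retransmission timeout."""
    ssthresh = max(flight_size // 2, 2 * smss)
    cwnd = smss  # restart from a window of one full-sized segment
    return ssthresh, cwnd


smss = 1460  # assumed sender maximum segment size, in bytes

# MN moved right after connection setup: only 3 segments were in flight.
print(after_timeout(3 * smss, smss))

# MN moved during a long transfer: 64 segments were in flight.
print(after_timeout(64 * smss, smss))
```

     In the first call SS_THRESH ends up at the 2*SMSS floor, so the
     sender leaves slow start almost immediately on the new path,
     whatever that path's capacity is.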


3.2 Congestion On PATH-3

     If AR-1 starts forwarding packets to AR-2 using PATH-3, PATH-3 will
     experience a sudden burst of data. In addition, if multiple MNs
     move between AR-2 and AR-1, PATH-3 could get severely congested.
     But while sending packets on PATH-3 is bad for other connections,
     dropping them is bad for the connection that changed subnets
     (Section-3.1).

3.3 Congestion On PATH-4

     In many cases, it's reasonable to assume that wireless service
     providers will have a well provisioned network that can accommodate
     highly bursty traffic. Such networks may have a dynamic equilibrium
     where the average transit traffic from AR-1 to AR-2 is the same as
     the transit traffic from AR-2 to AR-1. Such well provisioned paths
     are, however, not possible Internet-wide, since different mobile
     users will typically be connected to different TCP hosts.

3.4 Congestion On PATH-2

     Since the MN is able to receive packets even after moving away from
     AR-1, it will continue to generate ACKs in an orderly fashion.
     These ACKs will traverse PATH-3 or PATH-4 and finally reach the
     TCP sender. But the segments sent by the TCP sender in response to
     these ACKs will travel on PATH-2 (assuming the TCP sender has
     received the binding update to send data on the new path).
     Unfortunately, the TCP sender has no congestion information about
     PATH-2; using the old congestion window may cause network
     congestion on PATH-2. This problem becomes worse as the number of
     mobile users or the rate of subnet change in the system increases.

     To summarize, after a subnet change, if the old access router does
     not take part in tunneling packets to the new subnet, there is no
     problem of congestion, but such a scheme is inefficient
     (Section-3.1). On the other hand, if the old access router does
     take part in tunneling packets to the new subnet, the new path may
     get heavily congested.

4. Subnet Change Detection

     Quite often, a TCP sender is not aware of its peer's subnet state
     (whether it's in the old subnet or in a new subnet) even though its
     peer almost always knows about its own subnet information. This
     happens, for example, if MN uses Mobile-IPv6 with reverse tunneling
     (i.e., the home network transparently tunnels all packets to the
     receiver), Mobile-IPv4, or a cellular network for mobility
     management. It's therefore important to have a subnet change
     detection mechanism at the transport layer that can propagate this
     information between peers. This section describes such a subnet
     change detection scheme.

     Subnet change detection in itself is a two-step process. First, a
     mobile terminal needs to know it has moved from one subnet to
     another; second, it needs to propagate this information to its
     peer. Detecting when a mobile terminal has changed its subnet is a
     neighbor discovery [RFC2461] problem and is beyond the scope of
     this document. In this document we assume that TCP hosts can
     determine their own subnet information with assistance from lower
     layers.

     We now focus on how a mobile can propagate this information to its
     peer. To do so, we propose to use one bit from the "reserved bits"
     in the TCP header--call it the 'M-bit'. This bit acts as a flag
     whose value remains unchanged as long as the mobile remains
     attached to the same subnet. Once the mobile moves to a new subnet,
     it flips (binary NOT) the bit and keeps it flipped as long as it
     remains in the new subnet. The peer host compares the value of the
     M-bit with the previously received value and uses any M-bit
     transition as an indication of the peer's subnet change.
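
     As a rough sketch of how the M-bit could be read and written, the
     Python fragment below manipulates the 16-bit data
     offset/reserved/flags word at bytes 12-13 of the TCP header
     [RFC793]. This document does not say which reserved bit carries the
     M-bit, so the mask 0x0800 and the helper names are assumptions made
     for illustration only. The checksum is not recomputed.

```python
import struct

# ASSUMPTION: the draft does not fix the bit position. 0x0800 is the
# reserved bit directly after the 4-bit data offset field.
M_BIT_MASK = 0x0800
OFFSET_FLAGS_POS = 12  # byte offset of the offset/reserved/flags word


def get_m_bit(tcp_header: bytes) -> int:
    """Return the M-bit (0 or 1) of a raw TCP header."""
    (word,) = struct.unpack_from("!H", tcp_header, OFFSET_FLAGS_POS)
    return 1 if word & M_BIT_MASK else 0


def set_m_bit(tcp_header: bytes, m_bit: int) -> bytes:
    """Return a copy of the header with the M-bit set to m_bit."""
    (word,) = struct.unpack_from("!H", tcp_header, OFFSET_FLAGS_POS)
    if m_bit:
        word |= M_BIT_MASK
    else:
        word &= ~M_BIT_MASK & 0xFFFF
    out = bytearray(tcp_header)
    struct.pack_into("!H", out, OFFSET_FLAGS_POS, word)
    return bytes(out)


# A minimal 20-byte header: data offset 5, ACK flag set, all else zero.
header = bytes(12) + struct.pack("!H", 0x5010) + bytes(6)
flipped = set_m_bit(header, 1)
print(get_m_bit(header), get_m_bit(flipped))
```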

     Following are the details of subnet change detection algorithm:

     1. Each TCP implementation should keep three state
        variables--my_subnet_flag, rem_subnet_flag, and high_out_old--to
        facilitate mobility detection. In addition, a TCP host MAY also
        keep another state variable--prefix_now--to indicate the current
        subnet-prefix information. The first two flags (my_subnet_flag,
        rem_subnet_flag) hold the mobility state information about the
        local and remote TCP hosts respectively. 'high_out_old' is the
        highest sequence number of the packets in flight when a TCP host
        detects that its peer has changed subnets. This state
        information is needed for the congestion response.

     2. At connection setup, a client and server willing to use
        mobility detection should each set M=1 in the SYN packets they
        send. If either (or both) of the SYN packets has M=0, then the
        TCP sender should not use the mobility detection and response
        scheme. In these cases a Mobile Host should let the sender time
        out after a subnet change.

        Once both entities know that the sender and receiver have
        mobility detection capabilities, the TCP sender and receiver
        should initialize

                    my_subnet_flag = 1; rem_subnet_flag = 1;


     3. For each packet sent, each TCP host should determine
        if it has moved to a new subnet. If either the sender or the
        receiver determines that it has moved, it should update the
        value of my_subnet_flag as follows:

                      my_subnet_flag =  ~(my_subnet_flag)

        where '~' is the boolean operation NOT.

     4. Before sending any data or ACK packet, the TCP sender should
        set the value of M-bit in the TCP header as:

                                M=my_subnet_flag

     5. When the peer TCP receives a valid TCP packet, it should
        compare the value of 'M-bit' with the value of
        'rem_subnet_flag.' If the two values match, TCP should proceed
        as usual. If the two flags differ, then the TCP sender SHOULD
        update the variables as follows:

                  rem_subnet_flag=M-bit of the present packet.

                high_out_old = Sequence Number of the Last Byte
                          in the retransmission queue.

     The peer TCP uses 'high_out_old' so that it does not base the
     congestion control decisions on stale ACKs.

     After making these changes, the TCP host SHOULD follow the
     congestion response algorithm as described in section-5.
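
     The five steps above can be sketched as a small per-connection
     state machine. This is a minimal illustration, not a reference
     implementation: the class and function names are hypothetical, the
     flags are modeled as the integers 0 and 1, and sequence numbers are
     plain integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LmdrState:
    """Per-connection mobility detection state (step 1)."""
    enabled: bool = False
    my_subnet_flag: int = 1
    rem_subnet_flag: int = 1
    high_out_old: Optional[int] = None


def on_syn_exchange(state: LmdrState, m_in_my_syn: int, m_in_peer_syn: int) -> None:
    # Step 2: both SYNs must carry M=1, otherwise the scheme is not used.
    state.enabled = (m_in_my_syn == 1 and m_in_peer_syn == 1)
    state.my_subnet_flag = 1
    state.rem_subnet_flag = 1


def on_local_subnet_change(state: LmdrState) -> None:
    # Step 3: flip the local flag (one-bit NOT).
    state.my_subnet_flag ^= 1


def m_bit_for_outgoing(state: LmdrState) -> int:
    # Step 4: every data or ACK segment carries the current local flag.
    return state.my_subnet_flag


def on_valid_segment(state: LmdrState, m_bit: int, last_byte_in_rtx_queue: int) -> bool:
    """Step 5: return True if a peer subnet change was detected."""
    if not state.enabled or m_bit == state.rem_subnet_flag:
        return False
    state.rem_subnet_flag = m_bit
    state.high_out_old = last_byte_in_rtx_queue
    return True


# Usage: the mobile host moves once, and the sender notices on the next segment.
sender, mobile = LmdrState(), LmdrState()
on_syn_exchange(sender, 1, 1)
on_syn_exchange(mobile, 1, 1)
on_local_subnet_change(mobile)
detected = on_valid_segment(sender, m_bit_for_outgoing(mobile), 48000)
print(detected, sender.high_out_old)
```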

NOTE: In certain network architectures it's possible that a mobile
   host (and the associated link technology) has information on the
   congestion of the new path. In these cases, if the congestion on the
   new path is low, one MAY choose not to indicate the mobility
   information (i.e., not flip the 'M-bit') to the sender, since there
   is no need to reduce the data rate. However, the mobility
   information MUST be indicated if no such information is available.

     Before moving further, we would like to point out the pros and cons
     of using a bit from the reserved field rather than defining a TCP
     option. We await feedback from the working group on this issue to
     decide whether a TCP option would be more desirable.

     Advantages:

     1. Since the number of mobile terminals is expected to eventually
        exceed the number of stationary terminals, mobility deserves to
        be an integral part of the protocol and not an add-on.

     2. A subnet change option requires a capability negotiation feature
        at the start of the connection. Since room in the TCP options
        field is limited, very soon it might not be possible to carry
        all option negotiations in the TCP SYN packets.

     Disadvantages:

     1. Since the M-bit is one of the reserved bits, a firewall
        [RFC3360] may drop the SYN packet itself. Packets carrying a TCP
        option, on the other hand, have a better chance of traversing a
        firewall. We believe, however, that protocols should not be
        designed solely on the basis of current firewall designs, as
        firewalls can evolve in the future. In addition, there is no
        standard way to determine what a firewall will and will not
        drop. We therefore believe that firewall vendors should
        accommodate protocol changes rather than vice versa.

5. Congestion Response after Subnet Change

     The goal of congestion response after subnet change is to minimize
     congestion on PATH-2. In principle, congestion response for PATH-2
     raises the same congestion control issues as initiating a new
     connection--the sender should have no more than INIT_WINDOW worth
     of data outstanding on the *new path* and the SS_THRESH should be
     set to a large value. What makes the problem complex is the fact
     that, unlike new connections, connections after subnet change
     already have packets in flight. ***The congestion response after
     subnet change MUST therefore ignore the stale ACKs and base its
     congestion control decisions only on the ACKs generated in the new
     subnet.*** Unfortunately, the cumulative ACK property of TCP does
     not allow an easy way to ignore stale ACKs. In this document we
     describe the congestion response in the presence of the SACK option
     [RFC2018] only.

NOTE: We will describe the congestion response for the more general
   case, including the presence of other options, in the next update.

     With the SACK option, the congestion response waits for the
     SACK/ACK of new data sent in the new subnet before growing its
     window. Following are the details of the algorithm:

     1. Set the congestion window as

                           cwnd=cwnd+INIT_WINDOW;

     2. Send INIT_WINDOW worth of data on the new path and
        restart RTO timer as if this were a new connection [RFC2988].

     3. For each subsequent ACK received, follow
        mobile_SACK_cong_resp()

             mobile_SACK_cong_resp(tcp_packet ack_pkt){

                  IF ( ( ack_pkt contains an ACK >
                                      high_out_old) OR
                     ( ack_pkt contains a SACK > high_out_old)){

                       /* ACK or SACK for data sent on the new path */
                       cwnd=INIT_WINDOW + 2;
                       SS_THRESH =INFINITE;

                       IF ( ack_pkt contains a SACK >
                                      high_out_old){

                            Mark packets less than
                            high_out_old without a
                            SACK flag as lost;

                            Update packets in flight
                            assuming all unsacked packets
                            were lost;

                            Do loss recovery as described in
                            [BAFW02];

                       } ELSE {

                            Send new data as appropriate;

                       }

                       Follow [RFC2988] for timer calculation as if
                       this were a new connection;
                  }
                  ELSE {
                       /* Stale ACK generated in the old subnet */
                       cwnd = 0; /* Don't send any new data */

                       If ack_pkt contains a SACK block, mark the
                       packet as sacked;

                       DO NOT restart the RTO timer even for
                       pure ACKs;
                  }
             }

     Please note that the above algorithm waits for an ACK or SACK block
     that must have traversed the new path. In addition, the timer
     values are initialized as if this were a new connection. The timer
     values are not reset for stale ACKs since they don't provide any
     new congestion information (data flow rate) about the new path.
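
     A minimal Python sketch of the response described in this section
     is given below. It makes several simplifying assumptions that are
     not part of this document: cwnd is counted in segments, sequence
     numbers are plain integers without wraparound, SACK blocks are
     (left, right) pairs, and the loss marking and recovery of [BAFW02]
     and the timer handling of [RFC2988] are reduced to comments. All
     names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

INFINITE = float("inf")
SackBlock = Tuple[int, int]  # (left edge, right edge), assumed format


@dataclass
class SenderState:
    cwnd: float           # congestion window, in segments (assumed unit)
    ssthresh: float
    init_window: int
    high_out_old: int
    sacked: Set[SackBlock] = field(default_factory=set)


def on_peer_subnet_change(s: SenderState) -> None:
    # Step 1: allow INIT_WINDOW of new data on top of the stale flight.
    # Step 2 (sending that data, restarting the RTO timer) is not modeled.
    s.cwnd += s.init_window


def mobile_sack_cong_resp(s: SenderState, ack: int, sack_blocks: List[SackBlock]) -> str:
    """Step 3; returns a label naming the branch that was taken."""
    new_sack = any(right > s.high_out_old for _left, right in sack_blocks)
    s.sacked.update(sack_blocks)
    if ack > s.high_out_old or new_sack:
        # Feedback about data that was sent on the new path.
        s.cwnd = s.init_window + 2
        s.ssthresh = INFINITE
        # Timers would be computed as for a new connection here.
        return "loss-recovery" if new_sack else "send-new-data"
    # Stale ACK: send nothing new and leave the RTO timer untouched.
    s.cwnd = 0
    return "stale"


s = SenderState(cwnd=20, ssthresh=10, init_window=2, high_out_old=100000)
on_peer_subnet_change(s)
print(mobile_sack_cong_resp(s, 90000, []), s.cwnd)
print(mobile_sack_cong_resp(s, 90000, [(101000, 102000)]), s.cwnd)
```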

6. Anomalies

6.1 Race Conditions

     The congestion response algorithm described above works as long as
     the TCP sender receives the flipped M-bit before the new path is
     established. If the flipped M-bit is received much later, the TCP
     sender will already have injected some data on the new path. An
     implementation MUST take proper precautions to send the M-bit
     before the new path is established (for example, by sending the
     flipped M-bit in parallel with the binding update procedure).

6.2 Rapid Subnet Hopping

     Consider the case when a mobile node moves from subnet-1 to
     subnet-2 and then to subnet-3 in a very short period of time. The
     M-bit is flipped twice and so returns to its original value. If all
     the ACKs generated in subnet-2 are lost, the sender never sees a
     transition and will miss the subnet change indication. We believe
     that such events are rare and do not attempt to solve this problem.
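
     The effect can be seen in a few lines of Python. The sketch below
     counts the M-bit transitions a sender would observe for a given
     sequence of received M-bit values; the function name and the
     sample sequences are made up for illustration.

```python
from typing import Iterable


def count_detected_changes(received_m_bits: Iterable[int], rem_subnet_flag: int = 1) -> int:
    """Count how many subnet changes the sender infers from M-bit values."""
    changes = 0
    for m_bit in received_m_bits:
        if m_bit != rem_subnet_flag:
            rem_subnet_flag = m_bit
            changes += 1
    return changes


# subnet-1 sends M=1, subnet-2 sends M=0, subnet-3 sends M=1 again.
all_acks_arrive = [1, 1, 0, 0, 1, 1]
subnet2_acks_lost = [1, 1, 1, 1]

print(count_detected_changes(all_acks_arrive))
print(count_detected_changes(subnet2_acks_lost))
```

     When every ACK from subnet-2 is lost, the received values never
     differ from the stored flag, so no change is detected even though
     the path changed twice.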

7. Security Considerations

     Since the M-bit is valid only for an acceptable ACK [RFC793], it's
     immune to passive attacks as long as the congestion window is not
     of the order of 2^32 bytes. However, the M-bit is not safe against
     active DoS attacks (present TCP is not safe either). We will
     describe a security mechanism (a TCP option) to protect against
     active attacks if there is a requirement from the working group.


8. REFERENCES

     [RFC2581]  M. Allman, V. Paxson, W. Stevens, "TCP Congestion
                Control," RFC 2581, Apr 1999.

     [SL02]     Y. Swami, K. Le, "DCLOR: Decorrelated Loss Recovery
                using SACK option for spurious timeouts," Internet
                draft; work in progress, draft-swami-tsvwg-tcp-
                dclor-00.txt, Nov 2002.

     [K03]      R. Koodli, "Fast Handover for Mobile IPv6," Internet
                draft; work in progress, draft-ietf-mobileip-fast-
                mipv6-06.txt, Mar 2003.

     [RFC2461]  T. Narten, E. Nordmark, W. Simpson, "Neighbor Discovery
                for IP Version 6 (IPv6)," RFC 2461, Dec 1998.

     [JPA03]    D. Johnson, C. Perkins, J. Arkko, "Mobility Support in
                IPv6," Internet draft; work in progress, draft-ietf-
                mobileip-ipv6-21.txt, Feb 2003.

     [RFC3344]  C. Perkins, "IP Mobility Support for IPv4," RFC 3344,
                Aug 2002.

     [RFC3390]  M. Allman, S. Floyd, C. Partridge, "Increasing TCP's
                Initial Window," RFC 3390, Oct 2002.

     [RFC3360]  S. Floyd, "Inappropriate TCP Resets Considered Harmful,"
                RFC 3360, Aug 2002.

     [BAFW02]   E. Blanton, M. Allman, K. Fall, L. Wang, "A Conservative
                SACK-based Loss Recovery Algorithm for TCP," Internet
                draft; work in progress, draft-allman-tcp-sack-13.txt,
                Oct 2002.

     [RFC2018]  M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP
                Selective Acknowledgment Options," RFC 2018, Oct 1996.

     [RFC2988]  V. Paxson, M. Allman, "Computing TCP's Retransmission
                Timer," RFC 2988, Nov 2000.

     [RFC793]   J. Postel, "Transmission Control Protocol," RFC 793,
                Sept 1981.

9. IPR Statement

     The IETF has been notified of intellectual property rights claimed
     in regard to some or all of the specification contained in this
     document. For more information consult the on-line list of claimed
     rights at http://www.ietf.org/ipr.


Author's Address:

   Yogesh Prem Swami                   Khiem Le
   Nokia Research Center, Dallas       Nokia Research Center, Dallas
   6000 Connection Drive               6000 Connection Drive
   Irving, TX-75063, USA.              Irving, TX-75063, USA.

   E-Mail: yogesh.swami@nokia.com      E-Mail: khiem.le@nokia.com
   Ph    : +1 972 374 0669             Ph    : +1 972 894 4882

