INTERNET DRAFT                                        Yogesh Prem Swami
File: draft-swami-tcp-lmdr-00.txt                              Khiem Le
Expires: September 2003                           Nokia Research Center
                                                                 Dallas
                                                             March 2003


          Lightweight Mobility Detection and Response (LMDR)
                          Algorithm for TCP


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   TCP congestion control is based on the assumption that the
   end-to-end path of a TCP connection does not change--or at best
   changes infrequently--once the connection is established.
   However, when a user moves from one subnet to another, this
   assumption does not hold.  In these cases, relying on the rate of
   arrival of ACKs as the only criterion for congestion control can
   lead to congestion collapse if a (group of) receiver(s) can keep
   sending ACKs in a regular fashion even after a subnet change.
   Worse, a TCP sender may be entirely unaware of its peer's mobility
   and therefore unable to take any remedial action.

   In this document we describe a network layer independent mechanism
   by which a TCP host can propagate its subnet change information to
   its peer, based on which the sender can appropriately reduce its
   data rate.

Expires: September 2003                                        [Page 1]

draft-swami-tcp-lmdr-00.txt                                  March 2003

1. Introduction

   TCP congestion control [RFC2581] is based on the assumption that
   the end-to-end path of a TCP connection does not change--or at best
   changes infrequently--once the connection is established.  Based on
   this assumption, TCP increases its data rate (probes the network)
   whenever it receives positive feedback.  Without the assumption of
   a "constant path" for each packet, however, there would be no
   reason to increase the data rate based on ACKs received for
   previous data.

   When a TCP sender or receiver changes its point of attachment to
   the Internet (henceforth referred to as "changes subnets"), the
   entire end-to-end path between the sender and receiver can change.
   In these cases, the rate at which ACKs are received reflects only
   the state of the old path, not the new one.  Therefore, relying on
   the rate of arrival of ACKs as the only criterion for congestion
   control can lead to congestion collapse in these cases.

   To summarize, if a TCP sender continues to maintain its congestion
   state after a subnet change, either a) the sender will add to
   severe congestion and force numerous packet losses on the new path,
   or b) it will spend a lot of time trying to reach a reasonable
   throughput on the new path.  Case b) arises if the sender was doing
   congestion avoidance on the old path and the BDP on the new path is
   much higher than on the old path (such scenarios occur when users
   move from a cellular network to a wireless LAN network, for
   example).  In either case, using the same congestion state on the
   two paths will almost always result in a loss of overall
   throughput.

   In [SL02], we used spurious timeouts as an implicit indication of
   subnet change.  Although subnet change is indeed one of the largest
   sources of spurious timeouts, spurious timeouts alone are not a
   foolproof method to detect subnet change.
   In many cases [K03], depending upon the network architecture, it's
   possible that a subnet change does not trigger a spurious timeout
   at all.  (However, in cases where it does, the sender should use
   [SL02] in conjunction with the mechanism described here.  Note that
   [K03] cannot eliminate the possibility of spurious timeouts due to
   subnet change in all cases.)

   We describe a network layer independent mechanism by which a TCP
   host can propagate its subnet change information to its peer.  We
   assume that a mobile host always knows about its own subnet change
   (for example, by looking at its neighbor cache, destination cache,
   default router, or a combination of these [RFC2461]), but
   currently, it may not be able to inform its peer of the change.
   Please note that some network layer mobility management techniques
   such as Mobile-IPv6 [JPA03] with route optimization may be used to
   indirectly derive the peer's mobility information (for example, a
   TCP host can look into its binding cache to derive its peer's
   mobility information), but these schemes do not work with other
   mobility management techniques such as Mobile-IPv6 with reverse
   tunneling, Mobile-IPv4 [RFC3344], or other types of networks such
   as traditional cellular networks.

   Once a TCP sender has mobility information about itself or its
   peer, it can use the congestion response described in section-5 to
   adjust its data rate.

   The rest of this document is organized as follows: Section-2
   defines the terminology used in this document.  Section-3 describes
   the issue of congestion in more detail.  Section-4 has the details
   of the subnet change detection algorithm, and Section-5 contains
   the associated congestion response algorithm.  Section-6 and
   Section-7 describe certain corner cases and security
   considerations, respectively.

2. Terminology

   The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT,"
   "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and
   "silently ignore" in this document are to be interpreted as
   described in [RFC2119].

   Mobile Node (MN):
      A host (not a router) capable of changing its point of
      attachment to the Internet without breaking transport layer
      connectivity.  Hosts that change their point of attachment to
      the Internet but use DHCP or another mechanism to get a new IP
      address are not considered mobile.

   Old Subnet:
      MN's point of attachment (subnet prefix) to the Internet prior
      to movement.

   New Subnet:
      MN's point of attachment after movement.

   Stale ACK:
      An ACK in flight that was generated in the Old Subnet.

   INIT_WINDOW:
      The initial congestion window size at the start of a connection
      as described in [RFC3390].

3. Congestion Issues with Subnet Change

   For concreteness, the description below assumes network mobility
   based on Mobile IP, but the same concepts are readily applicable to
   other types of networks.

   To illustrate the problem, consider Figure-1.  At time T, the MN is
   reachable on Subnet-1 through AR-1 and has a care-of address on
   that subnet.  While MN is "attached" to AR-1, packet exchange
   between TCP-Sender and the MN takes place using PATH-1.

   Let's assume that after some period of time, at T+1, MN moves
   (hands over) to Subnet-2 and is reachable through AR-2 with a new
   care-of address.  While MN is attached to AR-2, all packets
   exchanged between TCP-Sender and the MN traverse the Internet
   Cloud-2 (which may or may not overlap with Cloud-1) and use PATH-2.

                       <---------PATH-1---------->
                        /---------\   +---------+
                        |         |   |         | Subnet-1
                    +---+ Cloud-1 +---+  AR-1   +-->>>>>MN
                    |   |         |   |         | (Time=T)
   +------------+   |   \----++---/   +---------+
   |            |   |        ||            |
   | TCP Sender +---+        ^V PATH-3    ^V^ PATH-4
   |            |   |        ||            |
   +------------+   |   /----++---\   +----+----+
                    |   |         |   |         | Subnet-2
                    +---+ Cloud-2 +---+  AR-2   +-->>>>>MN
                        |         |   |         | (Time=T+1)
                        \---------/   +---------+
                       <--------PATH-2----------->

                                Figure-1

   During the transient period when MN moves from Subnet-1 to
   Subnet-2, AR-1 may (or may not) buffer and forward packets destined
   to and from the MN through PATH-3 or through PATH-4 [K03], or a
   combination of PATH-2 and PATH-4.

   We make the distinction between PATH-3 and PATH-4 to emphasize the
   fact that PATH-4 may belong to a well provisioned network that has
   dynamic equilibrium for mobile users.  Such networks are designed
   to accommodate very bursty traffic.  PATH-3, on the other hand, may
   consist of arbitrary routers without proper provisioning.

   Let's assume that a TCP connection was progressing between MN and
   TCP Sender when the user moves from Subnet-1 to Subnet-2.  We now
   analyze the problem of congestion on the different paths shown
   above.

3.1 Congestion On PATH-1

   Congestion on PATH-1 is governed by the basic slow-start and
   congestion avoidance mechanisms [RFC2581].  As long as MN is on
   Subnet-1, standard congestion control is sufficient.  But once it
   moves from Subnet-1 to Subnet-2, two different events can take
   place:

   1. All packets destined to Subnet-1 are dropped by AR-1.  In this
      case, after MN moves to Subnet-2, the TCP sender will time out.
      After the timeout, the TCP sender will start with a congestion
      window of one, which will hopefully traverse the new path
      PATH-3.  In this case there is no need for extra congestion
      control.
      The disadvantages of dropping all packets destined to Subnet-1,
      however, are:

      a) The sender will wait for one complete RTO before it can
         start loss recovery.

      b) If MN moves faster than one subnet per RTO on average, the
         TCP receiver will take a very long time to recover such
         packets (theoretically, it will never be able to recover,
         but in practice this is not true due to the randomness of
         motion).

      c) The sender will reduce its SS_THRESH to half the packets in
         flight.  Since there is no correlation between BDP and
         packet loss on PATH-1, the throughput of the connection will
         suffer if the SS_THRESH on the new path is set to a very
         small value (for example, if the sender moves to the new
         path right after the connection setup, and the SS_THRESH
         gets set to 2*MSS).

   2. All packets (or all packets arriving at AR-1 during some period
      of time) destined to the MN's old care-of address are forwarded
      to its new care-of address ([K03] describes the details of how
      this can be done).  In this case, AR-1 can forward packets to
      the new care-of address using PATH-3 or PATH-4.  We consider
      these two paths separately.

3.2 Congestion On PATH-3

   If AR-1 starts forwarding packets to AR-2 using PATH-3, PATH-3 will
   experience a sudden burst of data.  In addition, if multiple MNs
   move between AR-2 and AR-1, PATH-3 could get severely congested.
   But if sending packets on PATH-3 is bad for other connections,
   dropping them is bad for the connection that changed subnets
   (section-3.1).

3.3 Congestion On PATH-4

   In many cases, it's reasonable to assume that wireless service
   providers will have a well provisioned network that can accommodate
   highly bursty traffic.  Such networks may have a dynamic
   equilibrium where the average transit traffic from AR-1 to AR-2 is
   the same as the transit traffic from AR-2 to AR-1.  Such well
   provisioned paths are, however, not possible Internet-wide, since
   different mobile users will typically be connected to different TCP
   hosts.

3.4 Congestion On PATH-2

   Since the MN is able to receive packets even after moving away from
   AR-1, it will continue to generate ACKs in an orderly fashion.
   These ACKs will traverse PATH-3 or PATH-4 and finally reach the TCP
   sender.  But the segments sent by the TCP sender in response to
   these ACKs will travel on PATH-2 (assuming the TCP sender has
   received the binding update to send data on the new path).
   Unfortunately, the TCP sender has no congestion information about
   PATH-2; using the old congestion window may cause network
   congestion on PATH-2.  This problem becomes worse as the number of
   mobile users or the rate of subnet change increases in the system.

   To summarize, after a subnet change, if the old access router does
   not take part in tunneling packets to the new subnet, there is no
   problem of congestion, but such a scheme is inefficient
   (section-3.1).  On the other hand, if the old access router does
   take part in tunneling packets to the new subnet, the new path may
   get heavily congested.

4. Subnet Change Detection

   Quite often, a TCP sender is not aware of its peer's subnet state
   (whether it's in the old subnet or in a new subnet) even though its
   peer almost always knows its own subnet information.  This happens,
   for example, if MN uses Mobile-IPv6 with reverse routing (i.e., the
   home network transparently tunnels all packets to the receiver), or
   Mobile-IPv4, or a cellular network for mobility management.  It's
   therefore important to have a subnet change detection mechanism at
   the transport layer that can propagate this information between
   peers.  This section describes such a subnet change detection
   scheme.

   Subnet change detection is itself a two step process.  First, a
   mobile terminal needs to know it has moved from one subnet to
   another; second, it needs to propagate this information to its
   peer.

   Detecting when a mobile terminal has changed its subnet is a
   neighbor discovery [RFC2461] problem and is beyond the scope of
   this document.  In this document we assume that TCP hosts can
   determine their own subnet information with assistance from lower
   layers.

   We now focus on how a mobile can propagate this information to its
   peer.  To do so, we propose to use one bit--call it 'M-bit'--from
   the "reserved bits" in the TCP header.  This bit acts as a flag
   whose value remains unchanged as long as the mobile remains
   attached to the same subnet.  Once the mobile moves to a new
   subnet, it flips (binary NOT) the bit and keeps the bit flipped as
   long as it remains in the new subnet.  The peer host compares the
   value of 'M-bit' with the previously received value and uses any
   M-bit transition as an indication of the peer's subnet change.

   Following are the details of the subnet change detection algorithm:

   1. Each TCP implementation should keep three state
      variables--my_subnet_flag, rem_subnet_flag, and
      high_out_old--to facilitate mobility detection.  In addition, a
      TCP host MAY also keep another state variable--prefix_now--to
      indicate the current subnet-prefix information.

      The first two flags (my_subnet_flag, rem_subnet_flag) hold the
      mobility state information about the local and remote TCP
      hosts, respectively.

      'high_out_old' is the highest sequence number of the packets in
      flight at the moment a TCP host detects that its peer has
      changed subnet.  This state information is needed for
      congestion response.

   2. At connection setup, a client and server that are willing to
      perform mobility detection should each set M=1 in the SYN
      packets they send.  If either (or both) of the SYN packets has
      M=0, then the TCP sender should stop processing the mobility
      detection and response scheme.  In these cases a Mobile Host
      should let the sender time out after a subnet change.

      Once both entities know that the sender and receiver have
      mobility detection capabilities, the TCP sender and receiver
      should initialize

         my_subnet_flag = 1;
         rem_subnet_flag = 1;

   3. For each packet sent, each TCP host should determine if it has
      moved to a new subnet.  If either the sender or the receiver
      determines that it has moved, it should update the value of
      my_subnet_flag as follows:

         my_subnet_flag = ~(my_subnet_flag)

      where '~' is the boolean operation NOT.

   4. Before sending any data or ACK packet, the TCP sender should
      set the value of the M-bit in the TCP header as:

         M = my_subnet_flag

   5. When the peer TCP receives a valid TCP packet, it should
      compare the value of 'M-bit' with the value of
      'rem_subnet_flag.'  If the two values match, TCP should proceed
      as usual.  If the two values differ, then the TCP host SHOULD
      update the variables as follows:

         rem_subnet_flag = M-bit of the present packet.

         high_out_old = Sequence Number of the Last Byte in the
                        retransmission queue.

      The peer TCP uses 'high_out_old' so that it does not base its
      congestion control decisions on stale ACKs.

      After making these changes, the TCP host SHOULD follow the
      congestion response algorithm as described in section-5.

   NOTE: In certain network architectures it's possible that a mobile
   host (and the associated link technology) has information on the
   congestion of the new path.  In these cases, if the congestion on
   the new path is low, one MAY choose not to indicate the mobility
   information (i.e., not to flip the 'M-bit') to the sender since
   there is no need to reduce the data rate.  However, the mobility
   information MUST be indicated if no such information is available.

   Before moving further, we would like to point out the pros and cons
   of using a bit from the reserved field rather than defining a TCP
   option.  We await feedback from the working group on this issue to
   decide whether a TCP option will be more desirable.

   Advantages:

   1. Since the number of mobile terminals is expected to eventually
      exceed the number of stationary terminals, mobility deserves to
      be an integral part of the protocol and not an add-on.

   2. A subnet change option requires a capability negotiation
      feature at the start of the connection.  Since there isn't
      enough room in the TCP options field, very soon it might not be
      possible to carry all option negotiations in the TCP SYN
      packets.

   Disadvantages:

   1. Since the M-bit is part of the reserved bits, a firewall
      [RFC3360] may drop the SYN packet itself.  Packets with a TCP
      option, on the other hand, have a better chance of traversing a
      firewall.

      We however believe that protocols should not be designed solely
      on the basis of current firewall designs, as firewalls can
      evolve in future.  In addition, there is no standard way to
      determine what a firewall will and will not drop.  We therefore
      believe that firewall vendors should accommodate protocol
      changes rather than vice-versa.

5. Congestion Response after Subnet Change

   The goal of congestion response after subnet change is to minimize
   congestion on PATH-2.  In principle, congestion response for PATH-2
   has the same congestion control issues as initiating a new
   connection--the sender should have no more than INIT_WINDOW worth
   of data outstanding on the *new path* and the SS_THRESH should be
   set to a large value.

   What makes the problem complex is the fact that, unlike new
   connections, connections after a subnet change have a non-zero
   number of packets in flight.  ***The congestion response after
   subnet change MUST therefore ignore the stale-ACKs and only use the
   ACKs generated in the new subnet to base its congestion control
   decisions.***

   Unfortunately, the cumulative ACK property of TCP does not allow an
   easy way to ignore stale-ACKs.  In this document we describe the
   congestion response in the presence of the SACK option [RFC2018]
   only.
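
   As an illustration of the stale-ACK rule, the following minimal
   Python sketch (not part of the proposed mechanism) shows how a
   sender might record high_out_old on an M-bit transition, as in
   step 5 of section-4, and then decide whether a later ACK is stale.
   The names PeerMobilityState, on_segment, ack_is_stale, and seq_gt
   are hypothetical, and the modulo-2^32 comparison of sequence
   numbers is an assumption of the sketch; this document only
   specifies a comparison against high_out_old.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_gt(a: int, b: int) -> bool:
    """Return True if sequence number a is later than b, modulo 2**32.

    Assumption of this sketch: the two numbers are less than 2**31 apart.
    """
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


class PeerMobilityState:
    """Hypothetical holder for rem_subnet_flag and high_out_old."""

    def __init__(self) -> None:
        self.rem_subnet_flag = 1   # initialized to 1 after the SYN exchange
        self.high_out_old = None   # unset until the peer changes subnet

    def on_segment(self, m_bit: int, highest_seq_sent: int) -> bool:
        """Compare the received M-bit with rem_subnet_flag.

        Returns True when the M-bit has flipped, i.e. the peer has
        signalled a subnet change; high_out_old is recorded at that moment.
        """
        if m_bit == self.rem_subnet_flag:
            return False           # no transition: proceed as usual
        self.rem_subnet_flag = m_bit
        self.high_out_old = highest_seq_sent
        return True

    def ack_is_stale(self, ack: int) -> bool:
        """True if the ACK does not exceed high_out_old."""
        if self.high_out_old is None:
            return False           # no subnet change has been seen yet
        return not seq_gt(ack, self.high_out_old)
```

   In such a sketch, an ACK for which ack_is_stale() returns True
   would not be used to grow the congestion window; only an ACK (or
   SACK block) beyond high_out_old is taken as evidence about the new
   path.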

   NOTE: We will describe the congestion response for the more general
   case, and in the presence of other options, in the next update.

   With the SACK option, the congestion response waits for the
   SACK/ACK of new data sent in the new subnet before growing its
   window.  Following are the details of the algorithm:

   1. Set the congestion window as cwnd=cwnd+INIT_WINDOW;

   2. Send INIT_WINDOW worth of data on the new path and restart the
      RTO timer as if this were a new connection [RFC2988].

   3. For each subsequent ACK received, follow
      mobile_SACK_cong_resp():

      mobile_SACK_cong_resp(tcp_packet ack_pkt){

          IF ( (ack_pkt contains an ACK > high_out_old) OR
               (ack_pkt contains a SACK > high_out_old) ){

              cwnd = INIT_WINDOW + 2;
              SS_THRESH = INFINITE;

              IF ( ack_pkt contains a SACK > high_out_old ){
                  Mark packets less than high_out_old without a
                      SACK flag as lost;
                  Update packets in flight assuming all unsacked
                      packets were lost;
                  Do loss recovery as described in [BAFW02];
              } ELSE {
                  Send new data as appropriate;
              }

              Follow [RFC2988] for timer calculation as if this
                  were a new connection;

          } ELSE {

              cwnd = 0;   /* Don't send any new data */
              If ack_pkt contains a SACK block, mark the packet
                  as sacked;
              DO NOT restart the RTO timer even for pure ACKs;
          }
      }

   Please note that the above algorithm waits for an ACK or SACK block
   that must have traversed the new path.  In addition, the timer
   values are initialized as if this were a new connection.  The timer
   values are not reset for stale ACKs since they don't provide any
   new congestion information (data flow rate) about the new path.

6. Anomalies

6.1 Race Conditions

   The congestion response algorithm described above works fine as
   long as the TCP sender receives the flipped M-bit before the new
   path is established.  But if the flipped M-bit is received much
   later, the TCP sender would have already injected some data on the
   new path.
   An implementation MUST take proper precaution to send the M-bit
   before the new path is established (for example, by sending the
   flipped M-bit in parallel with the binding update procedure).

6.2 Rapid Subnet Hopping

   Consider the case when a mobile node moves from subnet-1 to
   subnet-2 to subnet-3 in a very short period of time.  If all the
   ACKs generated in subnet-2 are lost, the sender sees the same M-bit
   value from subnet-3 as it last saw from subnet-1, so it's possible
   that the sender will miss the subnet change indication.  We believe
   that such events are rare and we do not attempt to handle them.

7. Security Considerations

   Since the M-bit is valid only for an acceptable ACK [RFC793], it's
   immune to passive attacks as long as the congestion window is not
   of the order of 2^32 bytes.  However, the M-bit is not safe against
   active DoS attacks (present TCP is not safe either).  We will
   describe a security mechanism (a TCP option) to protect against
   active attacks if there is a requirement from the working group.

8. REFERENCES

   [RFC2581]   M. Allman, V. Paxson, W. Stevens, "TCP Congestion
               Control," RFC 2581, Apr 1999.

   [SL02]      Y. Swami, K. Le, "DCLOR: Decorrelated Loss Recovery
               using SACK option for spurious timeouts," Internet
               draft; work in progress,
               draft-swami-tsvwg-tcp-dclor-00.txt, Nov 2002.

   [K03]       R. Koodli, "Fast Handover for Mobile IPv6," Internet
               draft; work in progress,
               draft-ietf-mobileip-fast-mipv6-06.txt, Mar 2003.

   [RFC2461]   T. Narten, E. Nordmark, W. Simpson, "Neighbor
               Discovery for IP Version 6 (IPv6)," RFC 2461, Dec
               1998.

   [JPA03]     D. Johnson, C. Perkins, J. Arkko, "Mobility Support in
               IPv6," Internet draft; work in progress,
               draft-ietf-mobileip-ipv6-21.txt, Feb 2003.

   [RFC3344]   C. Perkins, "IP Mobility Support for IPv4," RFC 3344,
               Aug 2002.

   [RFC3390]   M. Allman, S. Floyd, C. Partridge, "Increasing TCP's
               Initial Window," RFC 3390, Oct 2002.

   [RFC3360]   S. Floyd, "Inappropriate TCP Resets Considered
               Harmful," RFC 3360, Aug 2002.

   [BAFW02]    E. Blanton, M. Allman, K. Fall, L. Wang, "A
               Conservative SACK-based Loss Recovery Algorithm for
               TCP," Internet draft; work in progress,
               draft-allman-tcp-sack-13.txt, Oct 2002.

   [RFC2018]   M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP
               Selective Acknowledgment Options," RFC 2018, Oct 1996.

   [RFC2988]   V. Paxson, M. Allman, "Computing TCP's Retransmission
               Timer," RFC 2988, Nov 2000.

   [RFC793]    "Transmission Control Protocol," RFC 793, Sept 1981.

9. IPR Statement

   The IETF has been notified of intellectual property rights claimed
   in regard to some or all of the specification contained in this
   document.  For more information consult the on-line list of claimed
   rights at http://www.ietf.org/ipr.

Author's Address:

   Yogesh Prem Swami                  Khiem Le
   Nokia Research Center, Dallas      Nokia Research Center, Dallas
   6000 Connection Drive              6000 Connection Drive
   Irving, TX-75063, USA.             Irving, TX-75063, USA.

   E-Mail: yogesh.swami@nokia.com     E-Mail: khiem.le@nokia.com
   Ph    : +1 972 374 0669            Ph    : +1 972 894 4882