Network Working Group                                         F. Templin
Internet-Draft                                                     Nokia
Expires: May 10, 2004                                  November 10, 2003


           Dynamic MTU Determination for IPv6-in-IPv4 Tunnels
                     draft-templin-tunnelmtu-02.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 10, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   This document specifies a means for IPv6-in-IPv4 tunnel endpoints to
   dynamically determine maximum transmission unit (MTU) values using
   IPv6 flow state, IPv6 neighbor discovery messages and a new IPv6
   hop-by-hop option for MTU discovery. The mechanism provides a means
   for: 1) the decapsulator to inform the encapsulator that the scheme
   is implemented, 2) the decapsulator to inform the encapsulator of
   the probed and maximum MTU sizes for the path, and 3) the
   encapsulator to recognize when probe packets have successfully
   traversed the tunnel.

Templin                   Expires May 10, 2004                  [Page 1]

Internet-Draft                 Tunnel MTU                  November 2003

Table of Contents

   1. Introduction
   2. Terminology
   3. Static MTU Determination
   4. Dynamic MTU Determination
   5. IANA Considerations
   6. Security considerations
   7. Acknowledgements
      Normative References
      Informative References
      Author's Address
      Intellectual Property and Copyright Statements

1. Introduction

   IPv6-in-IPv4 tunnel interfaces (referred to hereafter as "tunnel
   interfaces" or simply "tunnels") use an IPv4 [RFC0791] internetwork
   as the link layer for IPv6 [RFC2461]. The tunnel endpoints (i.e.,
   the encapsulator and decapsulator) are IPv6 neighbors, but packets
   inside the tunnel may traverse multiple IPv4 forwarding hops.
   Packets that are too large to traverse the tunnel are discarded with
   an ICMPv6 "packet too big" message returned to the source, as for
   any IPv6 interface [RFC1981].

   IPv4 path MTU discovery [RFC1191] is normally used to determine the
   MTU of an IPv4 path. However, IPv4 path MTU discovery uses ICMPv4
   "fragmentation needed" messages which in some cases do not provide
   enough header bytes from the original IPv6 packet for stateless
   translation to ICMPv6 "packet too big" messages ([RFC1812], section
   4.3.2.3). Additionally, ICMPv4 messages can be spoofed, filtered, or
   not sent at all by some forwarding nodes resulting in black holes
   that are difficult to diagnose [RFC2923].

   When designing an alternate path MTU discovery scheme, it is
   important to note that for IPv6 the representation of a path
   consists of the 3-tuple of the Flow Label and the Source and
   Destination address fields in the IPv6 header [FLOWSPEC].
   Packet classifiers use the flow label for path selection, thus any
   probing scheme to determine the MTU of a path must be performed on a
   per-flow basis, i.e., even if there are multiple flows for the same
   destination.

   Finally, it is observed that the problem of black hole detection
   when an IPv4 path changes to include a restricting link is difficult
   if not impossible to solve as a network layer problem. Past attempts
   to do so have included:

   o  the current IPv4 Path MTU discovery scheme. (But, in addition to
      the operational issues described above, IPv4 path MTU discovery
      is inefficient and slow to converge.)

   o  probing the path with large data messages. (But, probing requires
      maintaining a queue of packets for an indeterminate time in the
      decapsulator while waiting for the probe to complete.)

   o  sensing fragmentation at the decapsulator. (But, this requires
      special instrumentation in the decapsulator - also, forwarding
      nodes that do not support IPv4 fragmentation may occur along some
      paths and middleboxes that drop IPv4 fragments are seen in
      operational deployments.)

   The dominating issue with any approach implemented solely at the
   network layer is that it is not possible for the network layer to
   anticipate the packet transmission strategy of the application. Any
   attempt by the network layer to divine such information without the
   explicit guidance of the application amounts to nothing more than an
   educated guess that is likely to be wrong - a result predicted by
   the End-to-End principle.

   Thus, alternate mechanisms to support dynamic MTU determination for
   IPv6-in-IPv4 tunnel interfaces are required and specified in this
   and other documents.

2. Terminology

   The terminology of [RFC2460], [RFC2461], [FLOWSPEC], [PLPMTUD], and
   [PMTUOPT] applies to this document.
   The following terms used in those documents are also defined here
   for clarity:

   MTU:
      same definition as in ([RFC1122], section 1.3.3): "the maximum
      transmission unit, i.e., the size of the largest packet that can
      be transmitted".

   LinkMTU:
      same definition as "link MTU" in [RFC2460][RFC2461], which is
      also the same definition as "LinkMTU" in ([RFC2461], section
      6.3.2).

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].

3. Static MTU Determination

   Tunnel interfaces that do not participate in the dynamic MTU
   determination scheme MUST set LinkMTU to at least 1280 bytes, i.e.,
   the minimum IPv6 MTU ([RFC2460], section 5). If a larger MTU is
   used, the value chosen SHOULD be:

   1.  no larger than the smallest known IPv4 Effective MTU to Receive
       (EMTU_R) ([RFC1122], section 3.3.2) for all potential
       decapsulators

   2.  no larger than the smallest known IPv4 link MTU in the network

   3.  small enough to allow room for nested link-layer encapsulations
       (e.g., VPN) that may be inserted by middleboxes.

   Tunnel interfaces that use only static MTU determination send IPv6
   packets that are no larger than LinkMTU with the DF bit NOT set in
   the encapsulating IPv4 header. The IPv6 layer discards packets
   larger than LinkMTU and sends an ICMPv6 "packet too big" message to
   the sender, as for any IPv6 interface. Even so, black holes may
   still occur along some paths since IPv4 fragmentation is not
   universally supported by all forwarding nodes, and the minimum
   supported packet size for IPv4 is only 576 bytes.

4. Dynamic MTU Determination

   Tunnel interfaces that use this scheme implement IPv6 Neighbor
   Discovery [RFC2461] except as otherwise noted in this document or in
   documents describing specific tunneling mechanisms, e.g., [RFC3056],
   [MECH], [ISATAP], etc.
   They additionally implement the minimal network-layer mechanisms
   needed to support Packetization Layer Path MTU Discovery [PLPMTUD]
   and IPv6 hop-by-hop options for MTU discovery [PMTUOPT] as specified
   in the following sections:

4.1 Tunnel Interface MTU

   Tunnel interfaces that implement the dynamic scheme set LinkMTU to
   at least 1280 and no more than 65515 bytes (i.e., the maximum IPv4
   packet size minus 20 bytes for IPv4 header encapsulation). Robust
   implementations will normally use the larger value.

4.2 Host Variables

   Tunnel interfaces use the Host Variables specified in ([RFC2461],
   section 6.3.2) and add the additional per-interface configuration
   variable:

   FlowKeepAliveTime
      The time in seconds to wait during which no packets with the
      (flow_label, src, dst)-tuple are sent on the tunnel interface
      before garbage-collecting a flow state entry. Default: 120
      seconds ([FLOWSPEC], section 3).

4.3 Flow State

   The tunnel interface manages flow state entries that correspond to
   the (flow_label, src, dst)-tuple taken from an IPv6 packet when the
   flow label encodes a value other than 0. Unless one already exists,
   a new flow entry is created when an IPv6 packet is sent on the
   tunnel interface with a Request MAP hop-by-hop option having the
   highest-order two bits of the option type set to '01' ([PMTUOPT],
   section 3.1). The expectation is that this is a probe packet sent by
   an application that implements the Packetization Layer Path MTU
   Discovery algorithm ([PLPMTUD]) and that a Probe Reply indication
   MAY be forthcoming.

   The following variables are maintained in each flow state entry:

   LastPacketSentTime
      A timestamp indicating when the last packet was sent on this
      flow. Initial value: Current Time.

   ProbeMTU
      The MTU value being probed by the current flow packet. Initial
      value: the length in bytes of the IPv6 packet that created the
      flow state entry.
   MaxMTU
      The Maximum MTU value returned in a Router Advertisement message
      acknowledging a probe. Initial value: 1024 bytes.

   AckMTU
      The MTU value returned in a Router Advertisement message
      acknowledging a probe. Initial value: 1024 bytes.

   RouterID
      The global IPv6 address of the router that responds to MTU
      probes. Initial value: 0::/128.

   Existing flow state entries are garbage-collected after a
   FlowKeepAliveTime window transpires in which no packets for the flow
   are sent.

4.4 The Tunnel Interface is a Router

   It is observed that the tunnel interface processes packets and
   forwards them between the IPv6 network and the IPv6-in-IPv4 network.
   It meets the description of a router in the ordinary sense, in that
   it routes packets from the overlaid IPv6 interface to a selected
   physical interface that will carry the IPv6-in-IPv4 packet. Tunnel
   interfaces should thus always set the IsRouter bit when appropriate
   in IPv6 Neighbor Discovery exchanges.

4.5 Sending Packets

   Packets are sent on the tunnel interface exactly as for IPv6-in-IPv4
   encapsulation as specified by the various tunneling mechanisms.

   When a packet is sent that contains a Request MAP hop-by-hop option
   ([PMTUOPT], section 3), the tunnel interface is not required to
   write a new value in the PMTU field since the bits that carry that
   field are not covered by authentication. (Since this value could
   thus be altered by malicious middleboxes on the IPv4 path, it is not
   used to convey MTU values over IPv6-in-IPv4 tunneled portions of the
   entire path.)

   If the Request MAP option has the high order two bits of the option
   type set to '01', a flow state entry is created (if needed) as
   described in Section 4.3, and the IPv6 length of the packet is
   written into ProbeMTU. Next, if the packet's length is no more than
   MaxMTU bytes, the packet is sent to the next hop encapsulated in an
   IPv4 header with the DF bit set.
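
   The flow state entry of Section 4.3 and the probe handling just
   described can be illustrated with a minimal sketch. It is not part
   of the specification: the names FlowState, FlowTable, record_probe,
   mark_sent and collect_garbage are hypothetical, addresses are held
   as plain strings, and the choice to refresh LastPacketSentTime only
   when a packet is actually sent is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FLOW_KEEPALIVE_TIME = 120.0  # FlowKeepAliveTime default in seconds (Section 4.2)
INITIAL_MTU = 1024           # initial MaxMTU and AckMTU (Section 4.3)
UNSPECIFIED = "::"           # textual form of 0::/128, the initial RouterID

FlowKey = Tuple[int, str, str]  # (flow_label, src, dst)


@dataclass
class FlowState:
    """One flow state entry; field names mirror the variables of Section 4.3."""
    last_packet_sent_time: float
    probe_mtu: int
    max_mtu: int = INITIAL_MTU
    ack_mtu: int = INITIAL_MTU
    router_id: str = UNSPECIFIED


class FlowTable:
    """Hypothetical per-interface table of flow state entries."""

    def __init__(self, keepalive: float = FLOW_KEEPALIVE_TIME) -> None:
        self.keepalive = keepalive
        self.entries: Dict[FlowKey, FlowState] = {}

    def record_probe(self, key: FlowKey, length: int,
                     now: float) -> Optional[FlowState]:
        """Handle an outgoing Request MAP probe (option type bits '01')."""
        if key[0] == 0:
            return None  # flow state is kept only for non-zero flow labels
        entry = self.entries.get(key)
        if entry is None:
            # LastPacketSentTime starts at the current time.
            entry = FlowState(last_packet_sent_time=now, probe_mtu=length)
            self.entries[key] = entry
        entry.probe_mtu = length  # IPv6 length of the probe goes into ProbeMTU
        return entry

    def mark_sent(self, key: FlowKey, now: float) -> None:
        """Refresh LastPacketSentTime (assumed: only on actual transmission)."""
        entry = self.entries.get(key)
        if entry is not None:
            entry.last_packet_sent_time = now

    def collect_garbage(self, now: float) -> int:
        """Drop entries idle for a full FlowKeepAliveTime window."""
        stale = [k for k, e in self.entries.items()
                 if now - e.last_packet_sent_time >= self.keepalive]
        for k in stale:
            del self.entries[k]
        return len(stale)
```

   Under these assumptions, a probe would be encapsulated with the DF
   bit set only when its length is no more than the max_mtu of the
   entry returned by record_probe.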
When no flow state exists for a paricular packet, the MTU to use is taken from a static assignment as for Section 3 and the decision to send/discard the packet is as for that specification. When flow state exists, ordinary IPv6 packets (and, packets with a Request MAP option with the high order bits in the option type set to '00'0 are sent to the next hop if their length is no more than AckMTU bytes in length. Otherwise, if their length is greater than AckMTU but no more than MaxMTU, and the packet contains a fragmentation header, the packet undergoes IPv6 fragmentation with a maximum fragment size of AckMTU bytes. Next, each fragment is encapsulated in an IPv4 header with the DF bit set and sent to the next hop. Packets that are too large to be sent over the tunnel by the above specifications are discarded, and an ICMPv6 "packet too big" message sent back to the source with the value in AckMTU written into the MTU field. 4.6 Processing Received Packets with Request MAP Options When the tunnel interface receives an IPv6 packet (i.e., after decapsulation according to the rules of the specific tunneling mechanism) and the packet includes a Request MAP hop-by-hop option, the decapsulator first verifies that it has a security association with the encapsulator that sent the packet (otherwise the packet is dropped.) Next, the decapsulator writes a new value into the option field as follows: If the packet is destined to the local host, the decapsulator writes the MTU value of the physical interface the packet was received on, minus 20 bytes for IPv4 encapsulation. Otherwise, the decapsulator determines the next-hop interface for forwarding the IPv6 packet and writes the minimum MTU of the interface that received the packet (minus 20 bytes)and the interface that will be used to forward the packet. The IPv6 packet is then handed to upper layers for processing. 
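
   The sending rules of Section 4.5 and the value the decapsulator
   writes into the Request MAP option can be sketched together as
   follows. This is an illustration under stated assumptions rather
   than a definitive implementation: the names Action, Flow,
   send_decision and request_map_value are hypothetical, a probe
   larger than MaxMTU is assumed to be treated as too large, and the
   static case is assumed to report LinkMTU in the "packet too big"
   message.

```python
from enum import Enum
from typing import NamedTuple, Optional, Tuple

IPV4_HEADER_LEN = 20  # bytes of IPv4 encapsulation overhead


class Action(Enum):
    SEND = "send"
    FRAGMENT = "fragment"
    DROP_PTB = "drop and send packet too big"


class Flow(NamedTuple):
    """The two flow state variables that the sending rules consult."""
    max_mtu: int
    ack_mtu: int


def send_decision(length: int, flow: Optional[Flow], static_link_mtu: int,
                  is_probe: bool = False,
                  has_fragment_header: bool = False) -> Tuple[Action, int]:
    """Return the action and the MTU value that governs it.

    The MTU is the limit applied (SEND), the maximum fragment size
    (FRAGMENT), or the value reported to the source (DROP_PTB).
    """
    if flow is None:
        # No flow state: fall back to the static LinkMTU of Section 3.
        if length <= static_link_mtu:
            return Action.SEND, static_link_mtu
        return Action.DROP_PTB, static_link_mtu  # assumption: report LinkMTU
    if is_probe:
        # Request MAP with option type bits '01': bounded by MaxMTU.
        if length <= flow.max_mtu:
            return Action.SEND, flow.max_mtu
        return Action.DROP_PTB, flow.ack_mtu  # assumption: treated as too large
    if length <= flow.ack_mtu:
        return Action.SEND, flow.ack_mtu
    if length <= flow.max_mtu and has_fragment_header:
        # IPv6 fragmentation with fragments of at most AckMTU bytes.
        return Action.FRAGMENT, flow.ack_mtu
    return Action.DROP_PTB, flow.ack_mtu


def request_map_value(recv_if_mtu: int,
                      forward_if_mtu: Optional[int] = None) -> int:
    """Value the decapsulator writes into a received Request MAP option.

    forward_if_mtu is None when the packet is destined to the local host.
    """
    tunnel_mtu = recv_if_mtu - IPV4_HEADER_LEN
    if forward_if_mtu is None:
        return tunnel_mtu
    return min(tunnel_mtu, forward_if_mtu)
```

   For example, with MaxMTU 1500 and AckMTU 1400, a 1450-byte ordinary
   packet is fragmented only if it carries a fragmentation header;
   without one it is discarded and the source is told to use 1400.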

   If the high-order two bits of the option were '01', the decapsulator
   additionally sends a Router Advertisement message back to the
   encapsulator formatted as follows:

   o  the IPv6 source address is set to the link-local address of the
      tunnel interface.

   o  the IPv6 destination address is set to the link-local source
      address of the encapsulator that sent the original packet.

   o  an IP Authentication Header is included that corresponds to the
      security association between the encapsulator and decapsulator.

   o  a prefix information option is included with the Router Address
      (R) bit set and a global unicast address of the node in the
      Prefix field [MIPV6].

   o  a new option called the "MTU Probe Reply" option is included, and
      formatted as specified in Section 4.8.

   o  no other options should be included, since this Router
      Advertisement will NOT be processed by upper layers.

   o  a Response MAP hop-by-hop option is included. No value is encoded
      in the option, since the value could be modified by malicious
      middleboxes on the IPv4 path back to the decapsulator.

4.7 Processing Received Packets with Response MAP Options

   When the tunnel interface receives an IPv6 packet that includes a
   Response MAP hop-by-hop option, it checks to see if the packet
   contains a Router Advertisement message. If the packet does not
   contain a Router Advertisement, it is handed to upper layers as for
   normal packets. Otherwise, the decapsulator validates the message as
   specified in ([RFC2461], section 6.1.2). If the message is invalid,
   or if the message does not include an authentication header, the
   message is discarded.

   If the message is valid, the decapsulator uses the values in the MTU
   Probe Reply option (see: Section 4.8) to locate a flow state entry,
   i.e., using the (flow_label, src, dst)-tuple encoded in the option.
   If no entry is located, or if an entry is located and the AckMTU
   value in the option is not the same as the ProbeMTU value in the
   flow state entry, the message is discarded. Otherwise, the
   decapsulator updates the MaxMTU and AckMTU values in the flow state
   entry using the corresponding values from the MTU Probe Reply
   option.

   If the message was not discarded during the above processing, the
   decapsulator next examines the global unicast address of the node in
   the prefix field. If an address other than 0::/128 was previously
   stored in the flow state entry, and the two addresses do not match,
   the new address is cached and movement detection is triggered, e.g.,
   as for mobility detection, fast handovers, etc. (If the previous
   value was 0::/128, the new value is cached and no further actions
   are taken.)

   Next, the Router Advertisement message is discarded, i.e., it is NOT
   handed up to higher layers for further processing.

4.8 MTU Probe Reply Option

   The following new IPv6 Neighbor Discovery message option is
   specified to support the MTU probing specification in this document:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MaxMTU                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            AckMTU                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Reserved        |              Flow Label               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                      IPv6 Source Address                      |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                   IPv6 Destination Address                    |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fields:

      Type           TBD

      Length         4

      Reserved       Reserved fields are unused.
                     They MUST be initialized to zero by the sender and
                     MUST be ignored by the receiver.

      MaxMTU         32-bit unsigned integer. The absolute maximum
                     packet size that the sender can accept on this
                     link.

      AckMTU         The size of the probe packet that the Reply
                     message acknowledges.

      Flow Label     The 20-bit Flow Label field from the IPv6 header
                     of the probe packet that the Reply message
                     acknowledges.

      IPv6 Source Address
                     The 128-bit IPv6 source address from the IPv6
                     header of the probe packet that the Reply message
                     acknowledges.

      IPv6 Destination Address
                     The 128-bit IPv6 destination address from the IPv6
                     header of the probe packet that the Reply message
                     acknowledges.

      Description    The MTU Probe Reply option is used in Router
                     Advertisement messages to inform tunnel
                     encapsulators that send path MTU probes of the
                     absolute maximum MTU that the sender can accept on
                     the link and the acknowledged MTU size of a
                     received probe packet. This option MUST be
                     silently ignored for other Neighbor Discovery
                     messages.

                     The value encoded in the MaxMTU field MAY be
                     larger than the values encoded in MTU options
                     ([RFC2461], section 4.6.4) and MAY even be larger
                     than the MTU of the actual link. The MaxMTU value
                     provides an indication of the sender's available
                     buffer space for receiving packets. By including
                     this option, the sender certifies that it has
                     sufficient buffer space to receive packets of at
                     least MaxMTU bytes under periods of normal
                     congestion.

4.9 Processing IPv4 Errors

   ICMPv4 "fragmentation needed" messages MAY be translated into ICMPv6
   "packet too big" messages and sent to the source of the original
   packet if the ICMPv4 message contains enough header bytes from the
   original IPv6 packet. This MAY help the packetization layer reach
   convergence more quickly.

5. IANA Considerations

   The IANA is instructed to assign a new IPv6 Neighbor Discovery
   option type for the MTU Probe Reply option specified in Section 4.8.

6. Security considerations

   Security issues are the same as for IPv6 neighbor discovery;
   works-in-progress from the IETF SEcuring Neighbor Discovery (SEND)
   working group may provide solutions.

   The MTU probing process is protected by a chain-of-trust. The
   packetization layer in the node that produces the probe packets must
   have privileged root access and all IPv6 routers on the path to the
   destination must use SEND for securing Neighbor Discovery.

7. Acknowledgements

   Most of the ideas expressed in this document are not new and borrow
   to a certain extent from the TCP-IP and Path MTU Discovery mailing
   list discussions beginning as early as 1987. Other ideas in the
   draft may have borrowed to some extent from discussions on the IETF
   MTU Discovery WG mailing list from November 1989 - February 1995, on
   the IETF NGTRANS WG mailing list in August 2002 and on the IETF IPv6
   WG mailing list in October 2003.

   The author would like to acknowledge certain individuals for helpful
   discussion on this subject, including Jari Arkko, Iljitsch van
   Beijnum, Jim Bound, Ralph Droms, Alain Durand, Tim Gleeson,
   Jun-ichiro itojun Hagino, Brian Haberman, Bob Hinden, Christian
   Huitema, Kevin Lahey, Hakgoo Lee, Matt Mathis, Jeff Mogul, Erik
   Nordmark, Soohong Daniel Park, Chirayu Patel, Michael Richardson,
   Pekka Savola, Hesham Soliman, Mark Smith, Dave Thaler, Michael
   Welzl, Lixia Zhang and the members of the Nokia NRC/COM Mountain
   View team.

   "...and I'm one step ahead of the shoe shine,
   Two steps away from the county line,
   Just trying to keep my customers satisfied,
   Satisfi-i-ied!" - Simon and Garfunkel

Normative References

   [FLOWSPEC]  Rajahalme, J., Conta, A., Carpenter, B. and S. Deering,
               "IPv6 Flow Label Specification",
               draft-ietf-ipv6-flow-label (work in progress), October
               2003.

   [MECH]      Gilligan, R. and E. Nordmark, "Basic Transition
               Mechanisms for IPv6 Hosts and Routers",
               draft-ietf-v6ops-mech-v2 (work in progress), October
               2003.

   [PLPMTUD]   Mathis, M., Heffner, J. and K. Lahey, "Path MTU
               Discovery", draft-ietf-pmtud-method (work in progress),
               October 2003.

   [PMTUOPT]   Park, S. and H. Lee, "The PMTU Discovery for IPv6 Using
               Hop-by-Hop Option Header",
               draft-park-pmtu-ipv6-option-header (work in progress),
               March 2003.

   [RFC0791]   Postel, J., "Internet Protocol", STD 5, RFC 791,
               September 1981.

   [RFC1122]   Braden, R., "Requirements for Internet Hosts -
               Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1191]   Mogul, J. and S. Deering, "Path MTU discovery", RFC
               1191, November 1990.

   [RFC1812]   Baker, F., "Requirements for IP Version 4 Routers", RFC
               1812, June 1995.

   [RFC1981]   McCann, J., Deering, S. and J. Mogul, "Path MTU
               Discovery for IP version 6", RFC 1981, August 1996.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]   Deering, S. and R. Hinden, "Internet Protocol, Version 6
               (IPv6) Specification", RFC 2460, December 1998.

   [RFC2461]   Narten, T., Nordmark, E. and W. Simpson, "Neighbor
               Discovery for IP Version 6 (IPv6)", RFC 2461, December
               1998.

Informative References

   [ISATAP]    Templin, F., Gleeson, T., Talwar, M. and D. Thaler,
               "Intra-Site Automatic Tunnel Addressing Protocol",
               draft-ietf-ngtrans-isatap (work in progress), October
               2003.

   [RFC2923]   Lahey, K., "TCP Problems with Path MTU Discovery", RFC
               2923, September 2000.

   [RFC3056]   Carpenter, B. and K. Moore, "Connection of IPv6 Domains
               via IPv4 Clouds", RFC 3056, February 2001.

Author's Address

   Fred L. Templin
   Nokia
   313 Fairchild Drive
   Mountain View, CA 94110
   US

   Phone: +1 650 625 2331
   EMail: ftemplin@iprg.nokia.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.