Network Working Group                                         F. Templin
Internet-Draft                                                     Nokia
Expires: April 14, 2004                                 October 15, 2003


               Path MTU Support for IPv6-in-IPv4 Tunnels
                     draft-templin-tunnelmtu-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 14, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This document specifies a means for IPv6-in-IPv4 tunnels to
   participate in IPv6 path MTU discovery.  Also specified is a means
   for the tunnel decapsulator to inform the encapsulator of
   appropriate per-neighbor MTU values.

Templin                  Expires April 14, 2004                 [Page 1]

Internet-Draft                 Tunnel MTU                   October 2003

Table of Contents

   1. Introduction
   2. Static MTU Determination
   3. Dynamic MTU Determination
   4. Additional Notes
   5. IANA Considerations
   6. Security considerations
   7. Acknowledgements
      Normative References
      Informative References
      Author's Address
      Intellectual Property and Copyright Statements

1. Introduction

   IPv6-in-IPv4 tunnels use IPv4 as a link layer for IPv6.  The tunnel
   encapsulator and decapsulator are IPv6 neighbors, but packets inside
   the tunnel may traverse multiple IPv4 forwarding hops.  Thus, the
   MTU of the IPv4 path within the tunnel determines the IPv6 MTU for
   the tunnel.  Packets that are too large to traverse the tunnel are
   discarded with an ICMPv6 "packet too big" message returned to the
   source, as for any IPv6 interface [RFC1981].

   However, using only IPv4 path MTU discovery [RFC1191] to determine
   the IPv6 MTU of the tunnel can lead to black holes that are
   difficult to diagnose [RFC2923].  Thus, mechanisms to be used in
   place of (or in addition to) IPv4 path MTU discovery are required
   and are specified in this document.

2. Static MTU Determination

   IPv6-in-IPv4 tunnels that do not implement a dynamic MTU
   determination mechanism should use an interface LinkMTU ([RFC2461],
   section 6.3.2) that is no larger than the smallest known Effective
   MTU to Receive (EMTU_R) ([RFC1122], section 3.3.2) for all potential
   decapsulators, while also allowing room for sub link-layer
   encapsulations (e.g., VPN) that may occur along the path.  The
   tunnel encapsulator sends all IPv6 packets that are no larger than
   LinkMTU with the DF bit not set in the encapsulating IPv4 header.
   Other packets are discarded, with an ICMPv6 "packet too big" message
   returned to the sender.  LinkMTU MUST be at least 1280 bytes (the
   minimum IPv6 MTU [RFC2460]) and should otherwise be chosen to
   minimize IPv4 fragmentation within the tunnel.
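   The static rule above can be sketched in Python as follows.  This
   is a minimal illustration under stated assumptions, not a
   definitive implementation: the function and constant names are
   hypothetical, and raising an error when the computed value falls
   below 1280 bytes is an assumed policy, since the text only requires
   that LinkMTU be at least 1280 bytes.

```python
IPV4_ENCAPS_OVERHEAD = 20  # bytes added by IPv6-in-IPv4 encapsulation
IPV6_MIN_MTU = 1280        # minimum IPv6 MTU [RFC2460]


def static_link_mtu(emtu_r_values, sub_link_overhead):
    """Return the largest LinkMTU allowed by the static rule.

    emtu_r_values: known EMTU_R (bytes) of every potential decapsulator.
    sub_link_overhead: bytes reserved for sub link-layer encapsulation.
    (Both parameter names are hypothetical.)
    """
    if not emtu_r_values:
        raise ValueError("at least one decapsulator EMTU_R is required")
    candidate = (min(emtu_r_values) - sub_link_overhead
                 - IPV4_ENCAPS_OVERHEAD)
    if candidate < IPV6_MIN_MTU:
        # Assumed policy: refuse rather than silently violate the
        # 1280-byte minimum.
        raise ValueError("cannot satisfy the minimum IPv6 MTU of 1280")
    return candidate


def handle_static(ipv6_len, link_mtu):
    """Decide what the encapsulator does with one IPv6 packet."""
    if ipv6_len <= link_mtu:
        return "send_df_clear"   # encapsulate with the DF bit not set
    return "packet_too_big"      # discard, return ICMPv6 "packet too big"
```

   With EMTU_R values of 1500 and 9000 bytes and 100 bytes of sub
   link-layer encapsulation, static_link_mtu([1500, 9000], 100)
   yields 1380.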
   For example, when all potential decapsulators are known to have an
   EMTU_R of 10KB and at most 100 bytes of sub link-layer encapsulation
   is expected on the path to any potential decapsulator, the
   encapsulator may use LinkMTU values as large as (10KB - 100 - 20),
   where 20 bytes is the overhead for IPv6-in-IPv4 encapsulation.
   (When excessive IPv4 fragmentation is anticipated or measured along
   some paths, LinkMTU should be set to an even smaller value.)

   All IPv6 packets with multicast/anycast destination addresses use
   the static MTU determination specified above.

3. Dynamic MTU Determination

   IPv6-in-IPv4 tunnels appear as a single hop at the IPv6 level with
   (possibly) multiple IPv4 hops occurring inside the tunnel.  IPv4
   path MTU discovery [RFC1191] uses ICMPv4 Destination Unreachable
   messages with code 4 (fragmentation needed and DF set) ([RFC0792],
   p. 4), which may not provide enough information for stateless
   translation to ICMPv6 "packet too big" ([RFC1812], section
   4.3.2.3).  Additionally, ICMPv4 "fragmentation needed" messages can
   be spoofed, filtered, or not sent at all by some forwarding nodes.
   Thus, use of IPv4 path MTU discovery alone may result in black holes
   that are difficult to diagnose [RFC2923].

   When IPv4 path MTU discovery used alone is deemed inadequate,
   dynamic tunnel MTU determination may be implemented as follows:

3.1 Interface Initialization

   The encapsulator sets LinkMTU for the tunnel interface to the MTU of
   the underlying IPv4 link, minus overhead for IPv6-in-IPv4
   encapsulation.  If the tunnel is configured over multiple underlying
   IPv4 links, the largest of the underlying link MTUs is used, and
   locally-generated ICMPv6 "packet too big" messages may result from
   within the tunnel interface.

   The encapsulator additionally keeps a link layer cache of
   per-neighbor MTU values, e.g., as ancillary data in the IPv6
   neighbor cache, in the IPv4 path MTU discovery cache, etc.
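   The interface initialization and per-neighbor cache described in
   Section 3.1 might look like the following Python sketch.  The class
   and method names are hypothetical, keying the cache by the
   decapsulator's IPv4 address is an assumption, and capping recorded
   values at LinkMTU is an assumed policy that the text does not
   state.

```python
IPV4_ENCAPS_OVERHEAD = 20  # bytes added by IPv6-in-IPv4 encapsulation


class TunnelInterface:
    """Hypothetical tunnel interface state for dynamic MTU handling."""

    def __init__(self, underlying_link_mtus):
        if not underlying_link_mtus:
            raise ValueError("at least one underlying IPv4 link required")
        # Largest underlying link MTU, minus encapsulation overhead.
        self.link_mtu = max(underlying_link_mtus) - IPV4_ENCAPS_OVERHEAD
        # Per-neighbor MTU cache: decapsulator address -> MTU in bytes.
        self.neighbor_mtu = {}

    def record_neighbor_mtu(self, decap_addr, mtu):
        # Assumed policy: never record a value above the interface
        # LinkMTU.
        self.neighbor_mtu[decap_addr] = min(mtu, self.link_mtu)

    def effective_mtu(self, decap_addr):
        # Fall back to LinkMTU when no per-neighbor state exists.
        return self.neighbor_mtu.get(decap_addr, self.link_mtu)
```

   For a tunnel configured over links with MTUs of 1500 and 4352
   bytes, LinkMTU is 4332, and a neighbor with no cache entry is
   treated as having that MTU until a smaller value is recorded.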
3.2 Tunnel Endpoint Negotiation

   When the encapsulator has a large IPv6 packet to send to a
   decapsulator for which there is no per-neighbor cache state, it may
   perform a Router Solicitation (RS)/Router Advertisement (RA)
   exchange as in ([RFC2461], sections 6.3.7 and 6.3.4).  If the
   decapsulator returns an RA message containing an MTU option, the MTU
   value is used as the initial MTU estimate and the encapsulator
   records this value in the per-neighbor MTU cache described above.

3.3 Encapsulator/Decapsulator Actions

3.3.1 Method 1 - Fragmentation Sensing

   In this method, the encapsulator and decapsulator both participate
   in the protocol.  The initial RS/RA exchange serves as a "contract"
   by which the decapsulator certifies that the protocol is supported.
   If the decapsulator does not return an RA message, or if it returns
   an RA message that does not contain an MTU option, the encapsulator
   must assume that the protocol is not supported and use either the
   static MTU determination specified in Section 2 or the MTU probing
   method specified in Section 3.3.2 for this decapsulator.  Otherwise,
   the fragmentation sensing method MAY be used as specified below:

3.3.1.1 Encapsulator Actions

   The encapsulator sends packets with the DF bit NOT set in the
   encapsulating IPv4 header with the expectation that the decapsulator
   will send an unsolicited RA message with an MTU option if IPv4
   fragmentation occurs inside the tunnel.  The encapsulator can probe
   the tunnel path MTU by null-padding ordinary IPv6 data packets,
   i.e., by setting the length field in the encapsulating IPv4 header
   ([MECH], section 3.5) to a value larger than the IPv6 packet length
   plus 20 bytes.

   When the encapsulator receives an unsolicited RA message with an MTU
   option, it records the value in the per-neighbor MTU cache described
   in Section 3.1.
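   The null-padding technique of Section 3.3.1.1 can be illustrated
   with the Python sketch below, which builds an encapsulating IPv4
   header whose total length exceeds the IPv6 packet length plus 20
   bytes.  This is only a sketch under stated assumptions: the
   function names, the probe_size parameter, the TTL of 64 and the
   default Identification value are all hypothetical choices, and a
   real encapsulator sending with the DF bit not set would need a
   proper Identification value for each packet.

```python
import socket
import struct

IPV4_HEADER_LEN = 20
IPPROTO_IPV6 = 41  # IPv4 protocol number for encapsulated IPv6


def ipv4_checksum(header):
    """One's-complement checksum over the 16-bit words of the header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate_padded(ipv6_packet, src, dst, probe_size,
                       df=False, ident=0):
    """Encapsulate ipv6_packet, null-padded up to probe_size bytes.

    probe_size (hypothetical name) is the tunnel MTU being tested; the
    IPv4 total length becomes probe_size + 20.
    """
    if probe_size < len(ipv6_packet):
        raise ValueError("probe_size is smaller than the IPv6 packet")
    total_len = IPV4_HEADER_LEN + probe_size
    if total_len > 0xFFFF:
        raise ValueError("IPv4 total length exceeds 65535")
    flags_frag = 0x4000 if df else 0  # 0x4000 is the DF bit
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, total_len, ident, flags_frag,
                         64, IPPROTO_IPV6, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = struct.pack("!H", ipv4_checksum(header))
    header = header[:10] + checksum + header[12:]
    padding = bytes(probe_size - len(ipv6_packet))  # null padding
    return header + ipv6_packet + padding
```

   The decapsulator can tell padding from payload because the IPv6
   header carries its own payload length; bytes beyond that length are
   the padding added here.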
3.3.1.2 Decapsulator Actions

   The decapsulator monitors fragmented IPv4 packets arriving from the
   tunnel.  If a fragmented packet arrives, the decapsulator sends an
   unsolicited RA message to the encapsulator with an MTU option to
   inform the encapsulator of a new MTU value.  The new MTU value is
   chosen such that subsequent packets from the encapsulator will not
   incur IPv4 fragmentation.

   If the IPv6 packet is larger than the largest IPv4 fragment (i.e.,
   if fragmentation was NOT due to padding bytes), the decapsulator
   also discards the packet and sends an ICMPv6 "packet too big"
   message to the source.  This action is taken even if the
   decapsulator could correctly reassemble the packet, since IPv6
   packets are not permitted to be fragmented by the network.

3.3.2 Method 2 - MTU Probing

   When the decapsulator does not implement the fragmentation sensing
   scheme specified in Section 3.3.1, the encapsulator can use the MTU
   probing method specified below:

   In this method, the encapsulator sets the DF bit in the IPv4 header
   of probe packets.  Probe packets may be sent either when the
   encapsulator forwards a large data packet to the decapsulator (i.e.,
   on-demand) or when the path MTU for the decapsulator has not been
   verified for some time (i.e., periodic).  IPv6 Neighbor Solicitation
   (NS) or ICMPv6 ECHO_REQUEST packets with padding bytes added (i.e.,
   by artificially setting the IPv4 length field to a larger value than
   the IPv6 length plus 20 bytes) can be used for this purpose, since
   successful delivery results in a positive acknowledgement that the
   probe succeeded, in the form of a response from the decapsulator.

   While probing, the encapsulator MAY maintain a queue of packets that
   use the decapsulator as the IPv6 next-hop.  If the probe succeeds,
   packets in the queue that are no larger than the probe size are sent
   to the decapsulator.
   If the probe fails, packets that are larger than the last known
   successful probe are dropped and an ICMPv6 "packet too big" message
   is returned to the sender [RFC1981].

   If used, the queue should be large enough to buffer the
   (delay*bandwidth) product for the round-trip time to the
   decapsulator.  When a smaller queue (or no queue at all) is used,
   loss of packets that are too big for the yet-to-be-determined path
   MTU may occur with no ICMPv6 "packet too big" message returned.
   Such loss may occur only in rare instances, but may result in
   unpredictable behavior in senders that base their adaptation solely
   on ICMPv6 "packet too big" messages.

3.4 Packet Handling

   As in the static case, IPv6 packets that are larger than the tunnel
   interface LinkMTU are discarded with an ICMPv6 "packet too big"
   message returned to the sender.  In the dynamic case, the
   encapsulator also performs these actions for IPv6 packets that are
   larger than the MTU value for the decapsulator in the neighbor
   cache.

4. Additional Notes

   1.  A "MinMTU" value must be supported by all nodes for
       multicast/anycast using the static MTU determination scheme.
       (The dynamic MTU determination scheme applies only for unicast.)
       MinMTU is configured such that: 1280 <= MinMTU <= LinkMTU.

   2.  The specifications above are easily extended to include
       differentiated service (DS) information to provide accurate MTU
       estimates when multipath routing is used.

   3.  To avoid superfluous probing based on counting down/up by small
       increments, plateau tables (e.g., [RFC1191], section 7) should
       be used for selecting the probe packet size for the MTU probing
       method specified in Section 3.3.2, or for reporting an MTU value
       in "packet too big" messages when the actual MTU value is
       indeterminate.

   4.  Some forwarding nodes may have broken/non-existent IPv4
       fragmentation implementations.  Thus, the fragmentation-sensing
       method described above may be susceptible to black holes along
       some forwarding paths.
       The encapsulator can avoid black holes by periodically sending
       locally-fragmented probe packets to elicit responses from
       decapsulators.

   5.  When the fragmentation sensing scheme is used, some
       implementations may wish to send only probe packets with the DF
       bit not set and all other packets with the DF bit *set* in order
       to generate IPv4 "fragmentation needed" messages from the
       network.  This method has the advantage that packets that are
       too big are discarded before reaching the decapsulator, but is
       susceptible to black holes as for IPv4 path MTU discovery.

   6.  RFC 1191 path MTU discovery and the dynamic MTU determination
       mechanisms specified in Section 3 MAY be used alone, or in
       combination with one another.  The method for combining the
       mechanisms is up to the implementation.

5. IANA Considerations

   N/A

6. Security considerations

   Security issues are the same as for IPv6 neighbor discovery;
   works-in-progress from the IETF SEcuring NEighbor Discovery (SEND)
   working group may provide solutions.  Neighbor discovery messages
   used to convey MTU values SHOULD include an IP authentication
   header.

7. Acknowledgements

   Most of the ideas expressed in this document are not new and borrow
   from earlier mailing list discussions.  The fragmentation-sensing
   method (Section 3.3.1) was inspired by a scheme proposed by Charles
   Lynn on the TCP-IP discussion list in November 1987.  The MTU
   probing method (Section 3.3.2) bears close relation to a scheme
   proposed by Dave Borman on the IPng mailing list in August 1999.
   Other ideas in the draft may have borrowed to some extent from
   discussions on the IETF MTU Discovery WG mailing list from November
   1989 - February 1995 and discussions on the IETF NGTRANS WG mailing
   list in August 2002.
   The author would like to acknowledge certain individuals for helpful
   discussion on this subject, including Ralph Droms, Tim Gleeson,
   Jun-ichiro itojun Hagino, Bob Hinden, Christian Huitema, Kevin
   Lahey, Matt Mathis, Jeff Mogul, Erik Nordmark, Dave Thaler, Lixia
   Zhang and the members of the Nokia NRC/COM Mountain View team.

Normative References

   [MECH]     Gilligan, R. and E. Nordmark, "Basic Transition
              Mechanisms for IPv6 Hosts and Routers",
              draft-ietf-v6ops-mech-v2-00 (work in progress), February
              2003.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers", RFC
              1812, June 1995.

   [RFC1981]  McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2461]  Narten, T., Nordmark, E. and W. Simpson, "Neighbor
              Discovery for IP Version 6 (IPv6)", RFC 2461, December
              1998.

Informative References

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC
              2923, September 2000.

Author's Address

   Fred L. Templin
   Nokia
   313 Fairchild Drive
   Mountain View, CA  94110
   US

   Phone: +1 650 625 2331
   EMail: ftemplin@iprg.nokia.com

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.