Network Working Group F. Templin, Ed. Internet-Draft Boeing Research & Technology Intended status: Informational May 22, 2012 Expires: November 23, 2012 Generic Tunnel MTU Determination draft-generic-v6ops-tunmtu-00.txt Abstract The IPv6-over-IPv4 tunnel MTU as defined by "Basic Transition Mechanisms for IPv6" is currently recommended to be set to 1480 or less when static MTU determination is used. This is to avoid IPv4 fragmentation within the tunnel, but requires the tunnel ingress to drop any IPv6 packet larger than 1480 bytes and return an ICMPv6 Packet Too Big (PTB) message. Concerns for operational issues with both IPv4 and IPv6 Path MTU Discovery point to the possibility of MTU-related black holes when a packet is dropped due to an MTU restriction. Fortunately, the "Internet cell size" is 1500 bytes, i.e., the minimum MTU configured by the vast majority of links in the Internet. This document therefore presents a method to boost the tunnel MTU to 1500 bytes. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on November 23, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents Templin Expires November 23, 2012 [Page 1] Internet-Draft Generic Tunnel MTU May 2012 (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . 3 3. Tunnel MTU and Fragmentation . . . . . . . . . . . . . . . . . 4 4. Setting the Identification Field . . . . . . . . . . . . . . . 5 5. Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 6 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 6 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 6 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 9.1. Normative References . . . . . . . . . . . . . . . . . . . 7 9.2. Informative References . . . . . . . . . . . . . . . . . . 7 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 8 Templin Expires November 23, 2012 [Page 2] Internet-Draft Generic Tunnel MTU May 2012 1. Introduction The IPv6-over-IPv4 tunnel MTU as defined by "Basic Transition Mechanisms for IPv6" is currently recommended to be set to 1480 or less when static MTU determination is used [RFC4213]. This is to avoid IPv4 fragmentation within the tunnel [RFC0791], but requires the tunnel ingress to drop any IPv6 packet larger than 1480 bytes and return an ICMPv6 Packet Too Big (PTB) message [RFC2460]. Concerns for operational issues with both IPv4 and IPv6 Path MTU Discovery (PMTUD) [RFC1191][RFC1981] point to the possibility of MTU-related black holes when a packet is dropped due to an MTU restriction. Fortunately, the "Internet cell size" is 1500 bytes, i.e., the minimum MTU configured by the vast majority of links in the Internet. This document therefore presents a method to boost the tunnel MTU to 1500 bytes. Pushing the tunnel MTU to 1500 bytes is met with the challenge that the addition of the IPv4 encapsulation header would cause a 1500 byte IPv6 packet to appear as a 1520 byte IPv4 packet on the wire. This can result in the packet being either fragmented or dropped by an IPv4 router that configures a 1500 byte link, depending on the setting of the "Don't Fragment" (DF) bit in the IPv4 header. Using the approach outlined in this document, the tunnel ingress avoids this issue by performing IPv6 fragmentation on the inner IPv6 packet before outer IPv4 encapsulation. The approach is outlined in the following sections. 2. Problem Statement When an IPv6 tunnel configures a smaller MTU than 1500 bytes, packets that are small enough to traverse earlier links in the path toward the final destination can suddenly be dropped at the tunnel ingress with an ICMPv6 PTB message returned to the original source. However, operational experience has shown that the PTB messages can be lost in the network due to filtering in which case the source does not receive notification of the loss. It is therefore highly desirable that the tunnel configure an MTU of at least 1500 bytes. One possibility is to use IP fragmentation of the outer IP layer protocol so that inner packets up to 1500 bytes are delivered even if the tunnel encapsulation causes the outer packet to be larger than 1500 bytes. However, IPv4 fragmentation has been shown to be dangerous at high data rates due to the 16-bit Identification field wrapping while reassemblies are still active. Also, if outer IPv4 fragmentation were used the tunnel egress would need to do the reassembly which can be an onerous burden when the egress is located on a router. Templin Expires November 23, 2012 [Page 3] Internet-Draft Generic Tunnel MTU May 2012 A second possibility is to enable IP PMTUD on the outer packet. However, the PTB messages that may result could either be lost on the return path to the tunnel ingress or may not contain enough information for translation into an inner packet PTB for delivery to the original source. Still another possibility is for the tunnel ingress to maintain state about MTU sizes for various tunnel egresses, but this becomes unwieldy when the number of egresses is large. In short, PMTUD for both IPv4 and IPv6 is a mess and new approaches are needed. Preferably, PMTUD can be avoided through operational arrangements, as described in the following sections. 3. Tunnel MTU and Fragmentation Section 3.2 of [RFC4213] presents both static and dynamic MTU determination algorithms. These algorithms have been shown to be problematic in many instances, as discussed in Section 2. This document therefore proposes a new tunnel MTU and fragmentation method via the following algorithm: 1. set the tunnel ingress MTU to 1500 2. for IPv6 packets to be admitted into the tunnel, do the following: a) if the packet is 1501 or more, drop it and send an ICMPv6 PTB with MTU size 1500 b) if the packet is between 1281 - 1500: - admit the packet into the tunnel, subject to rate limiting (see Section 3). - break the packet into 2 pieces, where the first piece is a random length between 512-1024 bytes - insert a fragment header on both pieces and set the Identification field to a 32-bit value as specified in Section 4. (If the original packet already included a fragment header, however the original fragment header is used.) - encapsulate both pieces in an IPv4 header and send them to the tunnel far end. c) if the packet is 1280 or less: - admit the packet into the tunnel - encapsulate the packet in an IPv4 header and send it to the tunnel far end 3. the IPv6 destination host gets to reassemble if necessary In the above algorithm, the tunnel MTU can be set to 1500 bytes Templin Expires November 23, 2012 [Page 4] Internet-Draft Generic Tunnel MTU May 2012 independently of other nodes, i.e., the approach is incrementally deployable. 4. Setting the Identification Field The algorithm in Section 3 ignores the IPv6 requirement that routers in the network must not fragment IPv6 packets. However, we observe that the 32-bit Identification field provides sufficient protection against accidental reassembly of fragments from different IPv6 packets if operational considerations are observed. Specifically, the tunnel ingress must ensure that there will be no IPv6 fragments alive in the system with duplicate Identification values. Since [RFC2460] specifies that the maximum time a node may retain an incomplete fragmented packet is 60 seconds, this means that the tunnel ingress must not allow the Identification values to be repeated within this timeframe. The tunnel ingress can therefore calculate a maximum data rate for admission of fragmented packets into the tunnel. To avoid Identification value duplication, the tunnel ingress must admit no more than (2^32 / 60) = 71582788 IPv6 packets requiring fragmentation into the tunnel per second. In the worst case, consider that each packet is 1281 bytes (i.e., 10248 bits) in length. The tunnel ingress can then calculate the maximum data rate as (71582788 * 10248) = 733580411424 bits/sec, or approximately 733 Gbps. It is therefore essential that the tunnel ingress set a rate limit to no more than 733 Gbps for those packets that will require fragmentation. This restriction can be relaxed if the tunnel ingress maintains a per-egress Identification value instead of a single Identification value for all tunnel egresses. The algorithm in Section 3 requires that the tunnel ingress perform IPv6 fragmentation (when necessary), but the tunnel egress does not perform reassembly; hence, the tunnel endpoints remain stateless. Instead, the final destination IPv6 host may be obliged to reassemble IPv6 packets up to 1500 bytes in length, but hosts are already required to reassemble at least that much by the IPv6 standard [RFC2460]. Note that a possible conflict exists when IPv6 fragmentation has already been performed by a source host before the fragments arrive at the tunnel ingress. In that case, there is a small possibility that the Identification values used by the source host will temporarily be in close correlation with those used by the tunnel ingress, where a "collision" may occur in which the fragments produced by the source host have the same Identification value as the Templin Expires November 23, 2012 [Page 5] Internet-Draft Generic Tunnel MTU May 2012 fragments produced by the tunnel ingress. Two factors that mitigate such conflicts are the random length of the first fragment used by the tunnel ingress (i.e., to cause a length mismatch for colliding reassemblies) and, in even rarer instance, the use of the TCP/UDP checksum. For that reason, the Identification values chosen for the fragmented tunneled packets are *not* assigned sequentially to avoid corrupting a block of reassemblies at a single destination host. Instead, the Identification values must be spaced so that the value begins counting at a random variable N (between 0 - (2^32 - 1)), then (N + C), then (N + 2C), then (N + 3C) etc. modulo 2^32 until the count returns again to N. The sequence then begins at (N+1), then ((N+1) + C), then ((N+1) + 2C), etc. until all integer values within 2^32 are covered. Using this method, large numbers of outstanding reassemblies can be accommodated with only very rare instances of packet loss. 5. Applicability This approach applies to all tunneling methods that use the basic transition mechanisms, including configured tunnels [RFC4213], 6to4 [RFC3056], ISATAP [RFC5214], DSMIP [RFC5555], 6rd [RFC5969], etc. Note that this same approach can also be applied to tunneling methods that use other than the basic transition mechanisms. These can include Teredo [RFC4380], LISP [I-D.ietf-lisp], SEAL [I-D.templin-intarea-seal], and even IPv6-over-IPv6 encapsulation [RFC2473] 6. IANA Considerations There are no IANA considerations for this document. 7. Security Considerations The security considerations for basic transition mechanisms apply also to this document. 8. Acknowledgments This method was inspired through discussion on the IETF v6ops list in the May 2012 timeframe. Templin Expires November 23, 2012 [Page 6] Internet-Draft Generic Tunnel MTU May 2012 9. References 9.1. Normative References [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998. [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, October 2005. 9.2. Informative References [I-D.ietf-lisp] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-23 (work in progress), May 2012. [I-D.templin-intarea-seal] Templin, F., "The Subnetwork Encapsulation and Adaptation Layer (SEAL)", draft-templin-intarea-seal-42 (work in progress), December 2011. [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990. [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996. [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 Specification", RFC 2473, December 1998. [RFC3056] Carpenter, B. and K. Moore, "Connection of IPv6 Domains via IPv4 Clouds", RFC 3056, February 2001. [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs)", RFC 4380, February 2006. [RFC5214] Templin, F., Gleeson, T., and D. Thaler, "Intra-Site Automatic Tunnel Addressing Protocol (ISATAP)", RFC 5214, March 2008. [RFC5555] Soliman, H., "Mobile IPv6 Support for Dual Stack Hosts and Routers", RFC 5555, June 2009. Templin Expires November 23, 2012 [Page 7] Internet-Draft Generic Tunnel MTU May 2012 [RFC5969] Townsley, W. and O. Troan, "IPv6 Rapid Deployment on IPv4 Infrastructures (6rd) -- Protocol Specification", RFC 5969, August 2010. Author's Address Fred L. Templin (editor) Boeing Research & Technology P.O. Box 3707 Seattle, WA 98124 USA Email: fltemplin@acm.org Templin Expires November 23, 2012 [Page 8]