Network Working Group                                    F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Intended status: Informational                              May 24, 2012
Expires: November 25, 2012


                    Generic Tunnel MTU Determination
                   draft-generic-v6ops-tunmtu-01.txt

Abstract

   The tunnel MTU for popular IP-in-IP tunneling mechanisms is currently
   recommended to be set to 1500 (or less) minus the length of the
   encapsulation headers when static MTU determination is used.  This is
   to avoid IP fragmentation within the tunnel, but requires the tunnel
   ingress to either fragment any IP packet larger than the MTU or drop
   the packet and return an ICMP Packet Too Big (PTB) message.  Concerns
   for operational issues with both IPv4 and IPv6 Path MTU Discovery
   point to the possibility of MTU-related black holes when a packet is
   dropped due to an MTU restriction.  Fortunately, the "Internet cell
   size" is 1500 bytes, i.e., the minimum MTU configured by the vast
   majority of links in the Internet.  We also note that these same
   considerations apply to the encapsulation of any combination of IP-
   within-IP protocol versions.  This document therefore presents a
   method to boost the tunnel MTU to larger values.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 25, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Templin                 Expires November 25, 2012               [Page 1]

Internet-Draft             Generic Tunnel MTU                   May 2012


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Problem Statement . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  Tunnel MTU and Fragmentation  . . . . . . . . . . . . . . . . . 4
   4.  Setting the Identification Field  . . . . . . . . . . . . . . . 5
   5.  Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 6
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
   7.  Security Considerations . . . . . . . . . . . . . . . . . . . . 6
   8.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 7
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 7
     9.1.  Normative References  . . . . . . . . . . . . . . . . . . . 7
     9.2.  Informative References  . . . . . . . . . . . . . . . . . . 7
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 8


Templin                 Expires November 25, 2012               [Page 2]

Internet-Draft             Generic Tunnel MTU                   May 2012


1.  Introduction

   The tunnel MTU for popular IP-in-IP tunneling mechanisms is currently
   recommended to be set to 1500 (or less) minus the length of the
   encapsulation headers when static MTU determination is used.  This is
   to avoid IP fragmentation within the tunnel [RFC0791][RFC2460], but
   requires the tunnel ingress to either fragment any IP packet larger
   than the MTU or drop the packet and return an ICMP Packet Too Big
   (PTB) message.  Concerns for operational issues with both IPv4 and
   IPv6 Path MTU Discovery [RFC1191][RFC1981] point to the possibility
   of MTU-related black holes when a packet is dropped due to an MTU
   restriction.  Fortunately, the "Internet cell size" is 1500 bytes,
   i.e., the minimum MTU configured by the vast majority of links in the
   Internet.  We also note that these same considerations apply to the
   encapsulation of any combination of IP-within-IP protocol versions.
   This document therefore presents a method to boost the tunnel MTU to
   larger values.

   Pushing the tunnel MTU to 1500 bytes or beyond is met with the
   challenge that the addition of the IP encapsulation header would
   cause an inner IP packet that is slightly less than 1500 bytes to
   appear as a 1501 byte or larger outer IP packet on the wire.  This
   can result in the packet being either fragmented or dropped by a
   router that configures a 1500 byte link.  Using the approach outlined
   in this document, the tunnel ingress avoids this issue by performing
   IP fragmentation on the inner packet before outer IP encapsulation.
   The approach is outlined in the following sections.


2.  Problem Statement

   When an IP tunnel configures a smaller MTU than 1500 bytes, packets
   that are small enough to traverse earlier links in the path toward
   the final destination can suddenly be dropped at the tunnel ingress
   with an ICMP PTB message returned to the original source.  However,
   operational experience has shown that the PTB messages can be lost in
   the network due to filtering in which case the source does not
   receive notification of the loss.  It is therefore highly desirable
   that the tunnel configure an MTU of at least 1500 bytes.

   One possibility is to use IP fragmentation of the outer IP layer
   protocol so that inner packets up to 1500 bytes are delivered even if
   the tunnel encapsulation causes the outer packet to be larger than
   1500 bytes.  However, IPv4 fragmentation has been shown to be
   dangerous at high data rates due to the 16-bit Identification field
   wrapping while reassemblies are still active.  Also, if outer IP
   fragmentation were used the tunnel egress would need to do the
   reassembly which can be an onerous burden when the egress is located


Templin                 Expires November 25, 2012               [Page 3]

Internet-Draft             Generic Tunnel MTU                   May 2012


   on a router.

   A second possibility is to enable IP PMTUD on the outer packet.
   However, the PTB messages that may result could either be lost on the
   return path to the tunnel ingress or may not contain enough
   information for translation into an inner packet PTB for delivery to
   the original source.  Still another possibility is for the tunnel
   ingress to maintain state about MTU sizes for various tunnel
   egresses, but this becomes unwieldy when the number of egresses is
   large.

   In short, PMTUD for both IPv4 and IPv6 is a mess and new approaches
   are needed.  Preferably, PMTUD can be avoided through operational
   arrangements, as described in the following sections.


3.  Tunnel MTU and Fragmentation

   Section 3.2 of [RFC4213] presents both static and dynamic MTU
   determination algorithms.  These algorithms have been shown to be
   problematic in many instances, as discussed in Section 2.  This
   document therefore proposes a new and generic tunnel MTU and
   fragmentation method via the following algorithm:


      1. set the tunnel ingress MTU to "infinity"
      2. for IP packets to be admitted into the tunnel:
         a) if the packet is 1501 or more, admit it into the
            tunnel if it is an IPv6 packet or an IPv4 packet
            with DF=1. Otherwise (i.e., for an IPv4 packet
            with DF=0) break it into N pieces, where each
            piece is a random length between 512-1024 bytes.
         b) if the packet is between 1281 - 1500:
            - admit the packet into the tunnel, subject to
              rate limiting (see Section 3).
            - break the packet into 2 pieces, where the first
              piece is a random length between 512-1024 bytes
            - insert a fragment header on both pieces and set
              the Identification as specified in Section 4.
            - encapsulate both pieces in an IP header and
              send them to the tunnel far end.
         c) if the packet is 1280 or less:
            - admit the packet into the tunnel
     3. the IP destination gets to reassemble if necessary

   In the above algorithm, the tunnel MTU can be set to infinity
   independently of other nodes, i.e., the approach is incrementally
   deployable.  Note also that in clauses 2 b) and 2 c), the tunnel sets


Templin                 Expires November 25, 2012               [Page 4]

Internet-Draft             Generic Tunnel MTU                   May 2012


   DF=0 for IPv4 packets.


4.  Setting the Identification Field

   The algorithm in Section 3 ignores the requirement that routers in
   the network must not fragment inner IPv6 packets or inner IPv4
   packets with DF=1.  In the case of IPv6, the use of fragmentation
   requires that the tunnel ingress insert an IPv6 fragment header on
   each fragment.  In the case of IPv4 packets with DF=1, the use of
   fragmentation requires that the tunnel ingress rewrite the value in
   the Identification field.  In both cases, we observe that the
   Identification field provides sufficient protection against
   accidental reassembly of fragments from different IP packets given
   careful operational considerations.

   Specifically, the tunnel ingress must ensure that there will be no IP
   fragments alive in the system with duplicate Identification values.
   Since [RFC2460] specifies that the maximum time a node may retain an
   incomplete fragmented packet is 60 seconds, this means that the
   tunnel ingress must not allow the Identification values to be
   repeated within this timeframe.  The tunnel ingress can therefore
   calculate a maximum data rate for admission of fragmented packets
   into the tunnel.

   For IPv4, to avoid Identification value duplication the tunnel
   ingress must admit no more than (2^16 / 60) = 1092 IPv4 packets
   requiring fragmentation into the tunnel per second.  In the worst
   case, consider that each packet is 1281 bytes (i.e., 10248 bits) in
   length.  The tunnel ingress can then calculate the maximum data rate
   as (1092 * 10248) = 11190816 bits/sec, or approximately 11 Mbps.  It
   is therefore essential that the tunnel ingress set a rate limit to no
   more than 11 Mbps for those IPv4 packets that will require
   fragmentation.  This restriction can be relaxed if the tunnel ingress
   maintains a per-destination Identification value instead of a single
   Identification value for all destinations.

   For IPv6, to avoid Identification value duplication the tunnel
   ingress must admit no more than (2^32 / 60) = 71582788 IPv6 packets
   requiring fragmentation into the tunnel per second.  In the worst
   case, consider that each packet is 1281 bytes (i.e., 10248 bits) in
   length.  The tunnel ingress can then calculate the maximum data rate
   as (71582788 * 10248) = 733580411424 bits/sec, or approximately 733
   Gbps.  It is therefore essential that the tunnel ingress set a rate
   limit to no more than 733 Gbps for those IPv6 packets that will
   require fragmentation.  This restriction can be relaxed if the tunnel
   ingress maintains a per-destination Identification value instead of a
   single Identification value for all destinations.


Templin                 Expires November 25, 2012               [Page 5]

Internet-Draft             Generic Tunnel MTU                   May 2012


   Note that a possible conflict exists when IP fragmentation has
   already been performed by a source host before the fragments arrive
   at the tunnel ingress.  In that case, there is a small possibility
   that the Identification values used by the source host will
   temporarily be in close correlation with those used by the tunnel
   ingress, where a "collision" may occur in which the fragments
   produced by the source host have the same Identification value as the
   fragments produced by the tunnel ingress.  Two factors that mitigate
   such conflicts are the random length of the first fragment used by
   the tunnel ingress (i.e., to cause a length mismatch for colliding
   reassemblies) and, in even rarer instance, the use of the TCP/UDP
   checksum.

   For that reason, the Identification values chosen for the fragmented
   tunneled packets are *not* assigned sequentially to avoid corrupting
   a block of reassemblies at a single destination host.  Instead, let L
   be the length (in bits) of the Identification field, let N be a
   random value between (0, (2^L -1)) and let C be a bounded constant
   value (e.g., between (512, 1024)).  Then, set each successive
   Identification value to N, then (N + C), then (N + 2C), then (N + 3C)
   etc. modulo 2^L until the count returns again to N. The sequence then
   begins at (N+1), then ((N+1) + C), then ((N+1) + 2C), etc. until all
   integer values within 2^L are covered.  Using this method, large
   numbers of outstanding reassemblies can be accommodated with only
   very rare instances of reassembly misassociation.


5.  Applicability

   This approach applies to all tunneling methods that use the basic
   transition mechanisms, including configured tunnels [RFC4213], 6to4
   [RFC3056], ISATAP [RFC5214], DSMIP [RFC5555], 6rd [RFC5969], etc.

   Note that this same approach can also be applied to tunneling methods
   that use other than the basic transition mechanisms.  These can
   include Teredo [RFC4380], LISP [I-D.ietf-lisp], SEAL
   [I-D.templin-intarea-seal], and IPv6-over-IPv6 encapsulation
   [RFC2473], and even IPv4-over-IPv6 encapsulation [RFC6333]


6.  IANA Considerations

   There are no IANA considerations for this document.


7.  Security Considerations

   The security considerations for basic transition mechanisms apply


Templin                 Expires November 25, 2012               [Page 6]

Internet-Draft             Generic Tunnel MTU                   May 2012


   also to this document.


8.  Acknowledgments

   This method was inspired through discussion on the IETF v6ops list in
   the May 2012 timeframe.


9.  References

9.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

9.2.  Informative References

   [I-D.ietf-lisp]
              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
              "Locator/ID Separation Protocol (LISP)",
              draft-ietf-lisp-23 (work in progress), May 2012.

   [I-D.templin-intarea-seal]
              Templin, F., "The Subnetwork Encapsulation and Adaptation
              Layer (SEAL)", draft-templin-intarea-seal-42 (work in
              progress), December 2011.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
              IPv6 Specification", RFC 2473, December 1998.

   [RFC3056]  Carpenter, B. and K. Moore, "Connection of IPv6 Domains
              via IPv4 Clouds", RFC 3056, February 2001.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,


Templin                 Expires November 25, 2012               [Page 7]

Internet-Draft             Generic Tunnel MTU                   May 2012


              February 2006.

   [RFC5214]  Templin, F., Gleeson, T., and D. Thaler, "Intra-Site
              Automatic Tunnel Addressing Protocol (ISATAP)", RFC 5214,
              March 2008.

   [RFC5555]  Soliman, H., "Mobile IPv6 Support for Dual Stack Hosts and
              Routers", RFC 5555, June 2009.

   [RFC5969]  Townsley, W. and O. Troan, "IPv6 Rapid Deployment on IPv4
              Infrastructures (6rd) -- Protocol Specification",
              RFC 5969, August 2010.

   [RFC6333]  Durand, A., Droms, R., Woodyatt, J., and Y. Lee, "Dual-
              Stack Lite Broadband Deployments Following IPv4
              Exhaustion", RFC 6333, August 2011.


Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fltemplin@acm.org


Templin                 Expires November 25, 2012               [Page 8]