Network Working Group                                    F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Intended status: Informational                             June 23, 2012
Expires: December 25, 2012


                    Generic Tunnel MTU Determination
                   draft-generic-v6ops-tunmtu-07.txt

Abstract

   The Maximum Transmission Unit (MTU) for popular IP-within-IP tunnels
   is currently recommended to be set to 1500 (or less) minus the length
   of the encapsulation headers when static MTU determination is used.
   This requires the tunnel ingress to either fragment any IP packet
   larger than the MTU or drop the packet and return an ICMP Packet Too
   Big (PTB) message.  Concerns for operational issues with Path MTU
   Discovery (PMTUD) point to the possibility of MTU-related black holes
   when a packet is dropped due to an MTU restriction.  The current
   "Internet cell size" is effectively 1500 bytes (i.e., the minimum MTU
   configured by the vast majority of links in the Internet) and should
   therefore also be the minimum MTU assigned to tunnels, but the
   desired end state is full accommodation of MTU diversity.  This
   document therefore presents a method to boost the tunnel MTU to
   larger values.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 25, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Templin                 Expires December 25, 2012               [Page 1]

Internet-Draft             Generic Tunnel MTU                  June 2012


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Problem Statement . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  Tunnel MTU  . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   4.  Packet Admission Algorithm  . . . . . . . . . . . . . . . . . . 5
   5.  Tunnel Fragmentation  . . . . . . . . . . . . . . . . . . . . . 6
   6.  Outer Packet Fragmentation  . . . . . . . . . . . . . . . . . . 6
   7.  En Route Fragmentation  . . . . . . . . . . . . . . . . . . . . 7
   8.  Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 7
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
   10. Security Considerations . . . . . . . . . . . . . . . . . . . . 8
   11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 8
   12. References  . . . . . . . . . . . . . . . . . . . . . . . . . . 8
     12.1.  Normative References . . . . . . . . . . . . . . . . . . . 8
     12.2.  Informative References . . . . . . . . . . . . . . . . . . 8
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . . . 9


Templin                 Expires December 25, 2012               [Page 2]

Internet-Draft             Generic Tunnel MTU                  June 2012


1.  Introduction

   The Maximum Transmission Unit (MTU) for popular IP-within-IP tunnels
   is currently recommended to be set to 1500 (or less) minus the length
   of the encapsulation headers when static MTU determination is used.
   This requires the tunnel ingress to either fragment any IP packet
   larger than the MTU or drop the packet and return an ICMP Packet Too
   Big (PTB) message [RFC0791][RFC2460].  Concerns for operational
   issues with Path MTU Discovery (PMTUD) [RFC1191][RFC1981] point to
   the possibility of MTU-related black holes when a packet is dropped
   due to an MTU restriction.  The current "Internet cell size" is
   effectively 1500 bytes (i.e., the minimum MTU configured by the vast
   majority of links in the Internet) and should therefore also be the
   minimum MTU assigned to tunnels, but the desired end state is full
   accommodation of MTU diversity.  This document therefore presents a
   method to boost the tunnel MTU to larger values.

   Pushing the tunnel MTU to 1500 bytes or beyond is met with the
   challenge that the addition of encapsulation headers would cause an
   inner IP packet that is 1500 byte (or slightly less) to appear as a
   slightly larger than 1500 byte outer IP packet on the wire, where it
   may be too large to traverse a link on the path in one piece.  One
   alternative is to perform IP fragmentation on the outer IP packet
   following encapsulation, however existing tunneling protocols do not
   require the egress to reassemble packets as large as 1500 bytes plus
   the size of the encapsulation headers.  The tunnel ingress therefore
   has no way of knowing whether the egress can reassemble larger sizes.
   In the case of fragmentable packets, the tunnel ingress can instead
   perform IP fragmentation on the inner packet before encapsulating
   each fragment in outer headers.  Considerations for both inner and
   outer fragmentation are presented in the following sections.  A third
   alternative known as tunnel fragmentation is also given.


2.  Problem Statement

   Existing tunneling protocols have by and large relied on PMTUD in
   order to provide necessary packet size feedback to the original
   source.  When an IP tunnel configures an MTU smaller than 1500 bytes,
   packets that are small enough to traverse earlier links in the path
   toward the final destination may be dropped at the tunnel ingress
   with a PTB message returned to the original source.  However,
   operational experience has shown that the PTB messages can be lost in
   the network due to filtering in which case the source does not
   receive notification of the loss.  It is therefore highly desirable
   that the tunnel configure an MTU of at least 1500 bytes, even though
   encapsulation would cause the tunneled packet to be larger than 1500
   bytes.


Templin                 Expires December 25, 2012               [Page 3]

Internet-Draft             Generic Tunnel MTU                  June 2012


   One possibility is to use IP fragmentation of the outer IP layer
   protocol so that inner packets up to 1500 bytes are delivered even if
   the tunnel encapsulation causes the outer packet to be larger than
   1500 bytes.  However, IPv4 fragmentation has been shown to be
   dangerous at high data rates due to the Identification field wrapping
   while reassemblies are still active [RFC4963].  Also, if outer IP
   fragmentation were used the tunnel ingress has no assurance that the
   egress can reassemble packets larger than 1500 bytes.

   A second possibility is to enable PMTUD on the outer packet.
   However, the PTB messages that may result could either be lost on the
   return path to the tunnel ingress or may not contain enough
   information for translation into an inner packet PTB for delivery to
   the original source.  Still another possibility is for the tunnel
   ingress to maintain state about MTU sizes for various tunnel
   egresses, but this becomes unwieldy when the number of egresses is
   large.

   In short, PMTUD for existing tunneling protocols is a mess and a new
   approach is needed.


3.  Tunnel MTU

   Section 3.2 of [RFC4213] presents both static and dynamic MTU
   determination algorithms.  Similar algorithms appear in other
   tunneling mechanisms.  These algorithms have been shown to be
   problematic in many instances, as discussed in Section 2.

   The desired end state is for tunnels to support assured delivery of
   packets that are no larger than 1500 bytes while admitting larger
   packets into the tunnel without explicit assurances of delivery.
   Hosts should therefore set a tunnel ingress MTU of at least 1500
   bytes, but should take care to not set so large an MTU that
   applications would be delayed by excessive PMTUD messages.  Routers
   should instead set a constant value "HLEN" to the length of the
   encapsulation headers, then set an indefinite tunnel MTU, with a
   maximum upper bound of ((2^32 -1 ) - HLEN) for tunnels over IPv6 and
   ((2^16 - 1) - HLEN) for tunnels over IPv4.

   This document therefore proposes a generic MTU determination method
   suitable for all tunnel types.  In particular, the tunnel ingress
   admits inner packets into the tunnel based on their size, and may
   need to use inner packet fragmentation, outer packet fragmentation
   and/or tunnel fragmentation as necessary.  The following sections
   discuss considerations for the approaches.


Templin                 Expires December 25, 2012               [Page 4]

Internet-Draft             Generic Tunnel MTU                  June 2012


4.  Packet Admission Algorithm

   The tunnel ingress uses an algorithm for admitting packets of various
   sizes into the tunnel.  In this algorithm, the ingress sets a
   constant value "HLEN" to the length of the encapsulation headers, and
   sets "MINMTU" to 576 for tunnels over IPv4 or 1280 for tunnels over
   IPv6.  The algorithm used is as follows:

     1) if the packet is larger than 1500:
       a. if the packet is an atomic packet (*) admit it
          into the tunnel if it is no larger than the MTU
          of the underlying interface to be used to carry
          the tunneled packet; otherwise, drop the packet
          and return a PTB message.
       b. if the packet is not an atomic packet, break it
          into N roughly equal-length pieces (where N is
          minimized and each piece is at most (MINMTU-HLEN)
          bytes) and admit each piece into the tunnel.
     2) if the packet is larger than (MINMTU-HLEN) but no
         larger than 1500:
       a. if the packet is an atomic packet, admit it into
          the tunnel and use either tunnel fragmentation or
          outer fragmentation to fragment it into roughly
          equal length pieces that are no larger than
          (MINMTU-HLEN). Also, for IPv6, return a PTB message
          with MTU set to (MINMTU-HLEN).
       b. if the packet is not an atomic packet, break it
          into N roughly equal length pieces (where N is
          minimized and each piece is at most (MINMTU-HLEN)
          bytes) and admit each piece into the tunnel.
     3) if the packet is (MINMTU-HLEN) or less:
       a. admit the packet into the tunnel

   (*) An "atomic packet" is an IPv6 packet that does not contain a
   fragment header, or an IPv4 packet with (DF=1 && MF=0 && Offset=0)
   [I-D.ietf-intarea-ipv4-id-update].

   In the above algorithm, clause 1a.) requires that large atomic
   packets not be subject to reassembly at the tunnel egress.  Instead,
   the tunnel ingress should process any PTB messages returned by the
   tunnel and translate them into a corresponding PTB message to return
   to the original source.  (The size 1500 is chosen with the
   expectation that hosts that send packets larger than this also used
   host-based MTU determination, e.g., per [RFC4821].)

   In clause 2a.), the ingress must use either tunnel fragmentation
   (see: Section 5) or outer fragmentation (see: Section 6) to break the
   inner packet into pieces that are no larger than (MINMTU-HLEN) before


Templin                 Expires December 25, 2012               [Page 5]

Internet-Draft             Generic Tunnel MTU                  June 2012


   encapsulating each piece and sending them to the tunnel egress.  For
   IPv6 inner packets, the ingress then also returns a PTB message with
   MTU set to (MINMTU-HLEN) as an indication to the source host that it
   must begin including a fragment header in the packets it sends (see
   Section 5 of [RFC2460]).

   Clauses 1b.) and 2b.) in the algorithm perform inner fragmentation
   using the Identification value already present in the packet header.
   The maximum fragment size of (MINMTU-HLEN) is chosen so that further
   fragmentation within the tunnel will not occur.  The tunnel ingress
   then admits each fragment into the tunnel unconditionally, since it
   is the original source (and not the tunnel ingress) that asserts the
   uniqueness of the packet's Identification value.

   Finally, clause 3a.) admits packets that are no larger than (MINMTU-
   HLEN) into the tunnel without need for tunnel or outer fragmentation.


5.  Tunnel Fragmentation

   Clause 2a.) in the algorithm of Section 4 requires the tunnel ingress
   to perform either tunnel fragmentation or outer fragmentation, with
   tunnel fragmentation preferred when it is available.  Tunnel
   fragmentation requires separate packet Identification and
   segmentation control bits in a mid-layer of encapsulation that is
   added between the inner and outer IP headers.  As for outer
   fragmentation, the tunnel egress is responsible for reassembly.

   Tunnel fragmentation can be particularly useful for tunnels over
   IPv4, since the mid-layer encapsulation can include an extended
   Identification field that avoids the identification wrapping issues
   seen for IPv4 fragmentation [RFC4963].  Furthermore, when tunnel
   fragmentation is used the tunnel ingress has assurance that the
   egress has a minimum reassembly buffer size ("MINMRU") of at least
   (1500 + HLEN) bytes since both the ingress and egress are required to
   implement the scheme.

   An example of tunnel fragmentation appears in SEAL
   [I-D.templin-intarea-seal].


6.  Outer Packet Fragmentation

   When tunnel fragmentation cannot be used, clause 2a.) in the
   algorithm of Section 4 requires the tunnel ingress to perform outer
   fragmentation.  As for tunnel fragmentation, reassembly is performed
   by the tunnel egress.


Templin                 Expires December 25, 2012               [Page 6]

Internet-Draft             Generic Tunnel MTU                  June 2012


   Using outer fragmentation, the ingress splits the inner packet into
   pieces that are no larger than (MINMTU-HLEN) bytes then adds the
   encapsulation headers.  The tunnel ingress must however take care to
   ensure that the egress has a large enough MINMRU in order to
   reassemble packets up to (1500+HLEN) bytes.  If this cannot be
   ensured, the ingress must instead drop the packet and return a PTB
   message with MTU size (MINMRU-HLEN).

   When outer fragmentation is needed, the ingress must also ensure that
   it does not admit more packets into the tunnel than would cause the
   Identification value in the outer header to wrap while other
   fragmented packets may still be awaiting reassembly.  See
   [I-D.ietf-intarea-ipv4-id-update] for considerations for this
   requirement.

   Implementations can choose to ignore the outer fragmentation rule if
   there is reasonable assurance that all links on the path between the
   ingress and egress configure a sufficiently-large MTU.  This approach
   is seen in widely-deployed tunnels over IPv4, but runs the risk of
   path MTU-related black holes and/or reassembly collisions when link
   MTUs cannot be controlled.


7.  En Route Fragmentation

   Following any inner, tunnel or outer fragmentation, the ingress must
   allow the encapsulated packets or fragments to be further fragmented
   by a router on the path that configures a link with a too-small MTU.
   These fragments would be reassembled by the tunnel egress the same as
   if the fragmentation occurred within the tunnel ingress.


8.  Applicability

   This approach applies to existing IPv6-within-IPv4 transition
   mechanisms, including configured tunnels [RFC4213], 6to4 [RFC3056],
   ISATAP [RFC5214], DSMIP [RFC5555], 6rd [RFC5969], etc.

   This same approach can further be applied to existing IP-within-IP
   tunneling mechanisms of all varieties, including GRE [RFC1701], IPv4-
   in-IPv4 [RFC2003], IPv6-in-IPv6 [RFC2473], IPv4-in-IPv6 [RFC6333],
   IPsec [RFC4301], Teredo [RFC4380], etc.


9.  IANA Considerations

   There are no IANA considerations for this document.


Templin                 Expires December 25, 2012               [Page 7]

Internet-Draft             Generic Tunnel MTU                  June 2012


10.  Security Considerations

   The security considerations for the various tunneling mechanisms
   apply also to this document.


11.  Acknowledgments

   This method was inspired through discussion on the IETF v6ops and
   NANOG mailing lists in the May/June 2012 timeframe.


12.  References

12.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

12.2.  Informative References

   [I-D.ietf-intarea-ipv4-id-update]
              Touch, J., "Updated Specification of the IPv4 ID Field",
              draft-ietf-intarea-ipv4-id-update-05 (work in progress),
              May 2012.

   [I-D.templin-intarea-seal]
              Templin, F., "The Subnetwork Encapsulation and Adaptation
              Layer (SEAL)", draft-templin-intarea-seal-42 (work in
              progress), December 2011.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1701]  Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
              Routing Encapsulation (GRE)", RFC 1701, October 1994.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2473]  Conta, A. and S. Deering, "Generic Packet Tunneling in
              IPv6 Specification", RFC 2473, December 1998.


Templin                 Expires December 25, 2012               [Page 8]

Internet-Draft             Generic Tunnel MTU                  June 2012


   [RFC3056]  Carpenter, B. and K. Moore, "Connection of IPv6 Domains
              via IPv4 Clouds", RFC 3056, February 2001.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,
              February 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963, July 2007.

   [RFC5214]  Templin, F., Gleeson, T., and D. Thaler, "Intra-Site
              Automatic Tunnel Addressing Protocol (ISATAP)", RFC 5214,
              March 2008.

   [RFC5555]  Soliman, H., "Mobile IPv6 Support for Dual Stack Hosts and
              Routers", RFC 5555, June 2009.

   [RFC5969]  Townsley, W. and O. Troan, "IPv6 Rapid Deployment on IPv4
              Infrastructures (6rd) -- Protocol Specification",
              RFC 5969, August 2010.

   [RFC6333]  Durand, A., Droms, R., Woodyatt, J., and Y. Lee, "Dual-
              Stack Lite Broadband Deployments Following IPv4
              Exhaustion", RFC 6333, August 2011.


Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fltemplin@acm.org


Templin                 Expires December 25, 2012               [Page 9]