Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Expires: December 19, 2005                                 June 17, 2005


                Link Adaptation for IPv6-in-IPv4 Tunnels
                     draft-templin-linkadapt-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 19, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   IPv6-in-IPv4 tunneling mechanisms support the minimum IPv6 MTU of
   1280 bytes via static prearrangements at the tunnel encapsulator
   and/or dynamic MTU determination based on ICMPv4 messages, but these
   methods have known operational limitations.  This document proposes a
   new MTU determination mechanism for IPv6-in-IPv4 tunnels that uses a
   link adaptation scheme with simplified IPv4 segmentation/reassembly
   and dynamic segment size probing.


Templin                 Expires December 19, 2005               [Page 1]

Internet-Draft         Link Adaptation for Tunnels             June 2005


1.  Introduction

   IPv6-in-IPv4 tunnels span multiple IPv4 network hops yet are seen by
   IPv6 as ordinary links that must support the minimum IPv6 link MTU of
   1280 bytes ([RFC2460], section 5).  Common tunneling mechanisms
   (e.g., [RFC2529][RFC3056][ISATAP][MECH][TEREDO]) meet this
   requirement through conservative static prearrangements at the
   encapsulator at the expense of sub-optimal performance over some
   paths due to excessive IPv4 network-based fragmentation and/or missed
   opportunities to discover larger MTUs.  Optional dynamic MTU
   determination methods based on ICMPv4 "fragmentation needed" messages
   are also available, but can result in MTU-related communication
   failures due to the unreliable and untrustworthy nature of ICMPv4
   messages generated by network middleboxes.

   This document proposes a link adaptation method for IPv6-in-IPv4
   tunnels that presents an assured MTU to the IPv6 layer.  It uses
   simplified segmentation/reassembly and dynamic segment size probing
   with authenticated probe feedback.  Thus, it provides greater
   robustness and efficiency than existing schemes by avoiding IPv4
   network-based fragmentation and reducing dependence on unreliable/
   untrustworthy ICMPv4 feedback from IPv4 network middleboxes.


2.  Requirements

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].


3.  Link Adaptation for IPv6-in-IPv4 Tunnels

   The following subsections specify a link adaptation scheme for IPv6-
   in-IPv4 tunnels with properties similar to those defined for AAL5
   [RFC2684] and IEEE 802.11 [WLAN].

3.1.  Layering

   IPv6-in-IPv4 tunneling mechanisms that implement the link adaptation
   specified in this document (hereafter referred to as
   "implementations") operate at a logical midpoint between the IPv6 and
   IPv4 protocol modules.  From the viewpoint of IPv6, the
   implementation appears as a network driver that delivers whole Upper
   Layer Payloads (ULPs) to an underlying transmission medium.  From the
   viewpoint of IPv4, the implementation appears as a packetization
   layer protocol (similar to TCP) that segments user data to be
   encapsulated in IPv4 packets.


3.2.  Tunnel Interface MTU

   Implementations MUST configure a minimum per-tunnel interface LinkMTU
   of 1280 bytes and SHOULD provide a configuration knob to set larger
   values.  A maximum LinkMTU of 9180 bytes (i.e., the same as defined
   in [RFC1626]) is RECOMMENDED for normal use cases, since it is large
   enough to encode 8KB network filesystem blocks and take advantage of
   Gigabit Ethernet Jumbo Frames, yet not so large as to diminish the
   effectiveness of 32-bit link layer CRCs [GIGE].  Implementations MAY
   set even larger LinkMTU values, but are advised that this may lead to
   unacceptable levels of undetected errors unless all physical segments
   in the path can provide assured error-free delivery for large
   packets.

   Since LinkMTU values larger than 1280 bytes may result in [ICMPv6]
   "packet too big" messages due to temporary segmentation restrictions
   (see: section 3.3), ULPs SHOULD employ a probing strategy that begins
   with a smaller payload size (on the order of 1KB) and probes upward
   [PMTUD].  (Note that this may not be possible for some ULPs.)
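
   As a minimal sketch of the limits above, the following Python helper
   checks a configured LinkMTU against the 1280-byte floor and flags
   values above the 9180-byte recommendation.  The function and constant
   names are hypothetical and are not defined by this document.

```python
import warnings

MIN_LINK_MTU = 1280              # floor required for every tunnel interface
RECOMMENDED_MAX_LINK_MTU = 9180  # recommended ceiling for normal use cases


def validate_link_mtu(link_mtu: int) -> int:
    """Return link_mtu if acceptable; reject values below the floor."""
    if link_mtu < MIN_LINK_MTU:
        raise ValueError(f"LinkMTU {link_mtu} is below {MIN_LINK_MTU} bytes")
    if link_mtu > RECOMMENDED_MAX_LINK_MTU:
        # Larger values are permitted (MAY), but the risk of undetected
        # errors grows, so the operator is warned rather than refused.
        warnings.warn(f"LinkMTU {link_mtu} exceeds {RECOMMENDED_MAX_LINK_MTU}")
    return link_mtu
```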

3.3.  Encapsulation/Segmentation

   Encapsulators cache per-flow segment sizes ("SEGSIZE") for the
   purpose of segmenting ULPs into chains of IPv4 datagrams.
   Conservative implementations can configure an initial SEGSIZE of 68
   bytes minus the length of the IPv4 header and any additional
   encapsulation headers, since the minimum IPv4 LinkMTU is 68 bytes
   [RFC0791].  In practice, however, most Internet links configure much
   larger IPv4 LinkMTUs [RFC3150][RFC3819] such that larger initial
   SEGSIZE values are often possible.
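
   The conservative initial SEGSIZE is simple arithmetic and can be
   sketched as follows.  The sketch assumes a 20-byte IPv4 header
   without options; the helper name and the extra_encaps_len parameter
   are hypothetical.

```python
IPV4_MIN_LINK_MTU = 68     # minimum IPv4 link MTU [RFC0791]
IPV4_BASE_HEADER_LEN = 20  # IPv4 header without options (assumption)


def initial_segsize(ipv4_header_len: int = IPV4_BASE_HEADER_LEN,
                    extra_encaps_len: int = 0) -> int:
    """Conservative initial SEGSIZE: 68 bytes minus all encapsulation."""
    segsize = IPV4_MIN_LINK_MTU - ipv4_header_len - extra_encaps_len
    if segsize <= 0:
        raise ValueError("encapsulation headers leave no room for payload")
    return segsize


print(initial_segsize())                    # 48
print(initial_segsize(extra_encaps_len=8))  # 40
```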

   The encapsulator splits each ULP into a chain of at most 32 segments
   for presentation to the IPv4 layer.  The segments MUST be contiguous
   and non-overlapping, i.e., the final byte of the (i)th segment MUST
   be the byte that immediately precedes the first byte of the (i+1)th
   segment.  Non-final segments in the chain MUST be equal in size; the
   final segment MAY be of different size.  For ULPs that span multiple
   segments, encapsulators use 2's complement Fletcher-32
   [STONE][RFC3385] to calculate a checksum across all ULP payload bytes
   and record the A and B results in a trailing 32-bit checksum.  For
   ULPs that fit within a single segment, the trailing 32-bit checksum
   is omitted.
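
   The segmentation rules above can be sketched in Python as follows.
   This is an illustration under stated assumptions, not a definitive
   encoding: it reads "2's complement Fletcher-32" as running sums A and
   B over 16-bit big-endian words modulo 2^16, pads odd-length input
   with one zero byte, writes A before B, and appends the 4-byte result
   to the ULP before splitting.  None of these details are fixed by this
   document, and the function names are hypothetical.

```python
from typing import List

MAX_SEGMENTS = 32  # a chain carries at most 32 segments


def fletcher32_twos(data: bytes) -> bytes:
    """Fletcher-32 with sums taken modulo 2**16 (assumed interpretation)."""
    if len(data) % 2:
        data += b"\x00"  # assumption: zero-pad odd-length input
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        a = (a + word) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a.to_bytes(2, "big") + b.to_bytes(2, "big")  # assumed order: A, B


def segment_ulp(ulp: bytes, segsize: int) -> List[bytes]:
    """Split a ULP into contiguous, non-overlapping segments."""
    if segsize <= 0:
        raise ValueError("SEGSIZE must be positive")
    if len(ulp) <= segsize:
        return [ulp]  # single segment: trailing checksum is omitted
    payload = ulp + fletcher32_twos(ulp)
    # Slicing at a fixed stride makes every non-final segment equal in
    # size; only the final segment may be shorter.
    segments = [payload[i:i + segsize] for i in range(0, len(payload), segsize)]
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("ULP needs more than 32 segments at this SEGSIZE")
    return segments
```

   With a SEGSIZE of 48 bytes, a 100-byte ULP plus its 4-byte checksum
   yields segments of 48, 48, and 8 bytes.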

   Segments are encapsulated in-order in consecutive IPv4 packets with
   bit 1 of the "Flags" field (i.e., "Don't Fragment - DF") set to '1'
   and an increasing Segment ID ("SEGID") value from 0 to 31 encoded in
   the five low-order bits of the "Fragment Offset" field, i.e.,
   the first packet encodes '0', the second packet encodes '1', etc.


   Each packet in the chain except the final one sets the "More
   Fragments - MF" bit, i.e., the MF bit is set as for ordinary IPv4
   fragmentation.  Each packet in the chain is delivered to the link
   layer (i.e., the IPv4 stack) in increasing SEGID order, i.e., SEGID 0
   first, followed by SEGID 1, etc., up to the final packet; the link
   layer SHOULD NOT reorder the packets or introduce artificial delays
   between packets.
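
   The header encoding described above can be illustrated with the
   16-bit IPv4 word that carries the Flags and Fragment Offset fields
   [RFC0791], where DF is 0x4000, MF is 0x2000, and the offset occupies
   the low 13 bits.  The Python sketch below is illustrative only; the
   function names are hypothetical.

```python
from typing import Tuple

DF_BIT = 0x4000      # "Don't Fragment" flag
MF_BIT = 0x2000      # "More Fragments" flag
SEGID_MASK = 0x001F  # five low-order bits of the Fragment Offset field


def encode_flags_offset(segid: int, is_final: bool) -> int:
    """Build the Flags/Fragment Offset word for one packet in a chain."""
    if not 0 <= segid <= 31:
        raise ValueError("SEGID must be in the range 0..31")
    field = DF_BIT | segid  # DF is always set
    if not is_final:
        field |= MF_BIT     # every packet except the final one sets MF
    return field


def decode_flags_offset(field: int) -> Tuple[int, bool]:
    """Return (SEGID, more_fragments) from the Flags/Fragment Offset word."""
    return field & SEGID_MASK, bool(field & MF_BIT)


fields = [encode_flags_offset(i, is_final=(i == 2)) for i in range(3)]
print([hex(f) for f in fields])  # ['0x6000', '0x6001', '0x4002']
```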

   Implementations MAY increase a flow's SEGSIZE to larger values
   through path probing to avoid black holes [RFC2923].  Implementations
   probe a candidate SEGSIZE value 'N' by segmenting a ULP into a chain
   of two or more packets such that the final packet encapsulates a
   segment of size N, where N is larger than the size of the segments
   encapsulated in non-final packets.  The chain SHOULD also include
   Forward Error Correction (FEC) information (format and encoding TBD)
   that covers the probe segment in case of loss.  If the encapsulator
   receives a unicast IPv6 Router Advertisement message [RFC2461] from
   the decapsulator at the far end of the tunnel (see: section 3.4) with
   an MTU option that encodes the value N within a maximum probe delay
   ("MaxProbeDelay") timeout period, it deems the probe successful.
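
   One way to lay out a probe chain is sketched below in Python.  Given
   the total number of bytes to send (assumed here to include any
   trailing checksum), the current SEGSIZE, and the candidate size N,
   the hypothetical helper searches for equal-size non-final segments
   no larger than SEGSIZE followed by a final segment of exactly N
   bytes.  FEC information is left out because its format is TBD.

```python
from typing import List, Optional

MAX_SEGMENTS = 32  # a chain carries at most 32 segments


def plan_probe_chain(total_len: int, segsize: int,
                     probe_size: int) -> Optional[List[int]]:
    """Return segment sizes for a probe chain, or None if none fits."""
    if probe_size <= segsize:
        raise ValueError("probe size N must exceed the current SEGSIZE")
    head = total_len - probe_size  # bytes carried by non-final segments
    if head <= 0:
        return None  # a probe chain needs at least two packets
    # Try the fewest non-final segments first (i.e., the largest ones).
    for count in range(1, MAX_SEGMENTS):
        if head % count == 0 and head // count <= segsize:
            return [head // count] * count + [probe_size]
    return None  # this payload cannot carry a probe of size N


print(plan_probe_chain(1396, 48, 1300))  # [48, 48, 1300]
```

   When no layout exists for a given payload, an implementation could
   simply wait for a later ULP whose length does fit.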

   Following a successful probe, but before advancing SEGSIZE to N,
   implementations SHOULD enter a brief verification phase during which
   additional probes are sent to detect asymmetric multipath MTU
   restrictions.  Thereafter, implementations SHOULD re-probe
   periodically to confirm that packets with up to SEGSIZE byte segments
   are still reaching the decapsulator at the far end of the tunnel.
   Additional strategies for SEGSIZE management and black hole detection
   are found in [PMTUD].

3.4.  Decapsulation/Reassembly

   The Length, SEGID, MF and flow identification information in the
   encapsulation headers of packets in a chain provide sufficient
   information for the tunnel decapsulator to reassemble the original
   ULP with protection for packet reordering in the IPv4 network.
   Decapsulators MUST configure per-flow reassembly buffers of at least
   1280 bytes and SHOULD configure larger per-flow reassembly buffers up
   to 9180 bytes or larger (see: section 3.2).

   Decapsulators use per-flow reassembly buffers to concatenate the ULP
   segments received in packet chains in increasing SEGID order (i.e.,
   SEGID 0, followed by SEGID 1, etc.) even if the packets were re-
   ordered by the network.  When all ULP segments have been concatenated
   into the reassembly buffer, the decapsulator uses 2's complement
   Fletcher-32 to detect errors if a trailing checksum was included
   (see: section 3.3).
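
   A reassembly routine along these lines might look like the Python
   sketch below.  It makes several assumptions: each received packet is
   modeled as a (SEGID, MF, payload) tuple for a single flow, the
   checksum uses running sums over 16-bit big-endian words modulo 2^16
   with A written before B, the 4-byte trailer follows the last ULP
   byte, and the buffer limit applies to the ULP without its trailer.
   The function names are hypothetical.

```python
from typing import Iterable, Tuple


def fletcher32_twos(data: bytes) -> bytes:
    """Fletcher-32 with sums taken modulo 2**16 (assumed interpretation)."""
    if len(data) % 2:
        data += b"\x00"  # assumption: zero-pad odd-length input
    a = b = 0
    for i in range(0, len(data), 2):
        a = (a + ((data[i] << 8) | data[i + 1])) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a.to_bytes(2, "big") + b.to_bytes(2, "big")


def reassemble(packets: Iterable[Tuple[int, bool, bytes]],
               buffer_size: int = 1280) -> bytes:
    """Concatenate segments in SEGID order and verify the trailer."""
    chain = list(packets)
    segments = {segid: payload for segid, _more, payload in chain}
    finals = [segid for segid, more, _payload in chain if not more]
    if len(finals) != 1:
        raise ValueError("chain must contain exactly one final segment")
    last = finals[0]
    missing = [i for i in range(last + 1) if i not in segments]
    if missing:
        raise ValueError(f"missing segment {missing[0]}")
    # Arrival order is irrelevant; SEGID alone determines placement.
    data = b"".join(segments[i] for i in range(last + 1))
    if last == 0:
        ulp = data  # single segment: no trailing checksum
    else:
        ulp, trailer = data[:-4], data[-4:]
        if fletcher32_twos(ulp) != trailer:
            raise ValueError("checksum mismatch")
    if len(ulp) > buffer_size:
        raise OverflowError("ULP exceeds the reassembly buffer")
    return ulp
```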


   If the decapsulator receives a packet chain that would overflow the
   reassembly buffer, it discards the chain and sends an [ICMPv6]
   "packet too big" message back to the source.  The message body
   includes upper layer packet headers (IPv6 and above) and contents of
   the reassembly buffer up to a total of 1280 bytes, while the MTU
   value encodes the reassembly buffer size.

   If at least one segment was received, but one or more segments were
   lost and/or checksum verification failed, the decapsulator SHOULD
   send an [ICMPv6] "parameter problem" message with code "reassembly/
   checksum error" back to the encapsulator at the originating end of
   the tunnel.  The message body includes upper layer packet headers
   (IPv6 and above) and contents of the reassembly buffer up to a total
   of 1280 bytes, and the pointer identifies either the beginning of the
   first missing segment or the beginning of the 4 byte checksum field
   (if no segments were missing).  Upon receipt of such [ICMPv6] errors,
   the encapsulator SHOULD take appropriate corrective actions such as
   reducing the tunnel's current SEGSIZE, imposing an artificial
   inter-ULP queuing delay for the tunnel, or relaying the [ICMPv6]
   messages back to the original source as a congestion indication.

   When a decapsulator receives a packet chain used for probing (see:
   section 3.3), it reassembles the ULP as above and sends a unicast
   IPv6 Router Advertisement message back to the encapsulator at the
   originating end of the tunnel with an MTU option that encodes the
   size of the segment encapsulated in the final packet in the chain.
   The encapsulator will receive the Router Advertisement and deem the
   probe successful.

   Following successful reassembly, the trailing checksum is discarded
   (if present) and the ULP payload is delivered to upper layers.

3.5.  ICMPv4 Error Handling

   Encapsulators may receive ICMPv4 "fragmentation needed" error
   messages from inside a tunnel due to probe failures and/or route
   changes across previously-probed paths.  These messages may come from
   either legitimate IPv4 network middleboxes or adversarial/
   mis-configured middleboxes that return wrong information.
   Implementers are advised to consult [PMTUD] for operational
   recommendations on processing ICMPv4 "fragmentation needed" messages.


4.  IANA Considerations

   The IANA is instructed to assign a Code value for "reassembly/
   checksum error" under the [ICMPv6] Parameter Problem message type in
   the "ICMPv6 Type Numbers" registry.


5.  Security Considerations

   The securing mechanisms for IPv6 neighbor discovery [RFC3971] and
   Cryptographically-Generated Addresses [RFC3972] are used to
   authenticate Router Advertisement probe responses.


6.  Acknowledgments

   This document represents the mindshare of many contributors.


7.  Appendix A: Additional Considerations

   Encapsulators can segment chains of two or more packets in which the
   final packet is longer than the non-final packets as a general-
   purpose mechanism for eliciting acknowledgements from the reassembler
   if improved reliability at the expense of additional overhead is
   desired.  The equal size restriction for non-final segments and non-
   overlapping restriction for all segments in packet chains provides a
   significant simplification for reassembly algorithms [RFC0815].

   Use of the link adaptation scheme described in this document may lead
   to an overall increase in short chains of small packets in the
   Internet.  Network administrators are advised to follow the
   recommendations in [RFC3150] to minimize packet loss and packet
   reordering.

   Network middleboxes that do not honor the IPv4 DF bit will cause
   irreparable damage to the information encoded in the IPv4 headers of
   encapsulated packets if fragmentation is incurred.

   Network conditions such as load balancing, multi-path routing,
   spanning tree reconfigurations, etc. can cause a certain degree of
   reordering of the packets in a flow.  For instance, Segment 5 of a
   segmented PDU could arrive before Segment 1.  The 5-bit segment ID in
   each packet provides protection for reordering among the packets of
   the same PDU, but provides no protection for reordering of packets
   belonging to *different* PDUs.  A small ID field is therefore needed
   in each packet to differentiate the packets of PDUs A and B. The
   question arises as to whether a very small (2-4 bit) ID field is
   enough to eliminate potential ambiguity due to packet reordering in
   the network.  Several works conducted by CAIDA (www.caida.org) may
   provide insights.

   Since link-layer CRC-32 checks normally occur on each segment in the
   path, most errors detected during PDU reassembly will be due to
   packet splices and/or errors in the data path between the NIC
   hardware and the reassembly buffer.  The Fletcher-32 checksum
   algorithm has been shown to provide an effective edge-to-edge error
   detection capability for such errors [STONE].  The Fletcher-32
   checksum is also dissimilar from both CRC-32 and the Internet
   checksum used by many upper layer protocols, thereby decreasing the
   likelihood of undetected errors.

   Prior to any path MTU probing for a flow, link adaptation should
   begin with a conservative initial SEGSIZE to yield an IPv4 packet
   size of 68 bytes (the maximum IPv4 packet size guaranteed to fit over
   any link in the IPv4 Internet without incurring fragmentation) so
   that an un-probed ULP payload of at least 1280 bytes will be assured
   for ultra-conservative implementations.  But, [RFC3150] suggests a
   minimum MTU of 296 bytes over the slowest serial links, so a slightly
   more optimistic implementation could send ULP payloads as large as
   ((296 - encapsulation_header_length) * 32) ~= 9000 bytes (and perhaps
   a bit larger due to VJ header compression) as long as it arranges
   for the first few such payloads to generate probe responses from the
   far-end.  For those optimistic implementations, if probe responses
   consistently arrive after an initial probe and subsequent
   verification phase, the flow's SEGSIZE can be advanced to the size
   used for probing.  Otherwise, the interface can generate IPv6 "packet
   too big" messages to inform upper packetization layers that smaller
   IPv6 packets should be sent over this flow for the time being.  An
   optimistic implementation could therefore set the maximum interface
   LinkMTU of 9180 bytes and perform the optimistic initial probing
   described above.
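
   The payload sizes quoted above follow from multiplying the per-packet
   segment size by the 32-segment limit.  The Python sketch below
   reproduces the arithmetic, assuming a 20-byte IPv4 header as the only
   encapsulation and ignoring the 4-byte trailing checksum; the helper
   name is hypothetical.

```python
MAX_SEGMENTS = 32     # a chain carries at most 32 segments
IPV4_HEADER_LEN = 20  # IPv4 header without options (assumption)


def max_unprobed_ulp(ipv4_packet_size: int,
                     encaps_header_len: int = IPV4_HEADER_LEN) -> int:
    """Largest ULP that fits in one chain at the given IPv4 packet size."""
    return (ipv4_packet_size - encaps_header_len) * MAX_SEGMENTS


print(max_unprobed_ulp(68))   # 1536, which covers the 1280-byte minimum
print(max_unprobed_ulp(296))  # 8832, i.e., roughly 9000 bytes
```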

   Some upper layer packetization protocols (e.g., NFS) generate fixed
   payload sizes and rely on the network layer to deliver the payloads
   either as whole IP packets or as chains of IP fragments.  Those
   protocols should consider "packet too big" messages coming from the
   interface as an indication to retransmit, since the IP fragmentation
   layer will have been informed of the smaller MTU for the flow.
   Subsequent payloads sent over the flow will therefore undergo IP
   fragmentation and each fragment will be presented to the interface
   for transmission.  Since NFS performance (and the performance of
   other upper layer packetization protocols) is highly sensitive to
   packet handling overhead, implementations should periodically attempt
   to increase the SEGSIZE through probing even if initial probe
   attempts fail.

   Since round-trip times (RTTs) along various paths may vary from the
   sub-microsecond level up to hundreds of milliseconds or more, Forward
   Error Correction (FEC) will clearly be required in some cases (i.e.,
   instead of Automatic Repeat Request (ARQ)) even though efficiency may
   suffer [RFC3819].  Provisions for enabling adaptive and efficient FEC
   in the segmentation/reassembly procedures are for further study.


8.  References

8.1.  Normative References

   [ICMPv6]   Conta, A., Deering, S., and M. Gupta, ed., "Internet
              Control Message Protocol (ICMPv6) for the Internet
              Protocol Version 6 (IPv6) Specification",
              draft-ietf-ipngwg-icmp-v3 (work in progress),
              November 2004.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2461]  Narten, T., Nordmark, E., and W. Simpson, "Neighbor
              Discovery for IP Version 6 (IPv6)", RFC 2461,
              December 1998.

   [RFC3971]  Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
              Neighbor Discovery (SEND)", RFC 3971, March 2005.

   [RFC3972]  Aura, T., "Cryptographically Generated Addresses (CGA)",
              RFC 3972, March 2005.

8.2.  Informative References

   [FRAG]     Mogul, J. and C. Kent, "Fragmentation Considered Harmful",
              Proc. SIGCOMM '87 Workshop on Frontiers in Computer
              Communications Technology, August 1987.

   [GIGE]     Dykstra, P., "Gigabit Ethernet Jumboframes (And Why You
              Should Care)", http://sd.wareonearth.com/~phil/jumbo.html,
              December 1999.

   [ISATAP]   Templin, F., Gleeson, T., Talwar, M., and D. Thaler,
              "Intra-Site Automatic Tunnel Addressing Protocol
              (ISATAP)", draft-ietf-ngtrans-isatap (work in progress),
              January 2005.

   [MECH]     Nordmark, E. and R. Gilligan, "Transition Mechanisms for
              IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2 (work in
              progress), March 2005.


   [PMTUD]    Mathis, M., Heffner, J., and K. Lahey, "Path MTU
              Discovery", draft-ietf-pmtud-method (work in progress),
              February 2005.

   [RFC0815]  Clark, D., "IP datagram reassembly algorithms", RFC 815,
              July 1982.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1626]  Atkinson, R., "Default IP MTU for use over ATM AAL5",
              RFC 1626, May 1994.

   [RFC2529]  Carpenter, B. and C. Jung, "Transmission of IPv6 over IPv4
              Domains without Explicit Tunnels", RFC 2529, March 1999.

   [RFC2684]  Grossman, D. and J. Heinanen, "Multiprotocol Encapsulation
              over ATM Adaptation Layer 5", RFC 2684, September 1999.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC3056]  Carpenter, B. and K. Moore, "Connection of IPv6 Domains
              via IPv4 Clouds", RFC 3056, February 2001.

   [RFC3150]  Dawkins, S., Montenegro, G., Kojo, M., and V. Magret,
              "End-to-end Performance Implications of Slow Links",
              BCP 48, RFC 3150, July 2001.

   [RFC3385]  Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
              "Internet Protocol Small Computer System Interface (iSCSI)
              Cyclic Redundancy Check (CRC)/Checksum Considerations",
              RFC 3385, September 2002.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [STONE]    Stone, J., "Checksums in the Internet (Stanford Doctoral
              Dissertation)", August 2001.

   [TEREDO]   Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              NATs", draft-huitema-v6ops-teredo (work in progress),
              April 2005.

   [WLAN]     IEEE Computer Society, "Part 11: Wireless LAN Medium
              Access Control (MAC) and Physical Layer (PHY)
              Specifications", ANSI/IEEE 802.11, 1999 Edition.


Author's Address

   Fred Lambert Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fred.l.templin@boeing.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

