Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                            April 21, 2008
Expires: October 23, 2008


        The Subnetwork Encapsulation and Adaptation Layer (SEAL)
                       draft-templin-seal-10.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 23, 2008.

Abstract

   Subnetworks are connected network regions bounded by border routers
   that forward unicast and multicast packets over a virtual topology
   manifested by tunneling.  This virtual topology resembles a "virtual
   ethernet", but may span multiple IP- and/or sub-IP layer forwarding
   hops that can introduce packet duplication and/or traverse links with
   diverse Maximum Transmission Units (MTUs).  This document specifies a
   Subnetwork Encapsulation and Adaptation Layer (SEAL) that
   accommodates such virtual topologies over diverse underlying link
   technologies.


Templin                 Expires October 23, 2008                [Page 1]

Internet-Draft                    SEAL                        April 2008


Table of Contents

   1.  Introduction
   2.  Terminology and Requirements
   3.  Applicability Statement
   4.  SEAL Protocol Specification
     4.1.  Model of Operation
     4.2.  ITE Specification
       4.2.1.  Tunnel Interface MTU
       4.2.2.  SEAL Maximum Segment Size (S-MSS) Maintenance
       4.2.3.  Inner Packet Fragmentation
       4.2.4.  SEAL Segmentation and Encapsulation
       4.2.5.  Sending SEAL packets
       4.2.6.  Sending S-MSS Probes
       4.2.7.  Processing Fragmentation Reports (FRAGREPs)
       4.2.8.  Processing ICMP PTBs
     4.3.  ETE Specification
       4.3.1.  Reassembly Buffer Requirements
       4.3.2.  IPv4-Layer Reassembly
       4.3.3.  SEAL-Layer Reassembly
       4.3.4.  Generating Fragmentation Reports (FRAGREPs)
   5.  Link Requirements
   6.  End System Requirements
   7.  Router Requirements
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgments
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)
   Author's Address
   Intellectual Property and Copyright Statements


1.  Introduction

   As Internet technology and communication have grown and matured, many
   techniques have developed that use virtual topologies (frequently
   tunnels of one form or another) over an actual IP network.  Those
   virtual topologies have elements which appear as one hop in the
   virtual topology, but are actually multiple IP or sub-IP layer hops.
   These multiple hops often have quite diverse properties which are
   often not even visible to the end-points of the virtual hop.  This
   introduces many failure modes that are not dealt with well in current
   approaches.

   The use of IP encapsulation has long been considered as an
   alternative for creating such virtual topologies.  However, the
   insertion of an outer IP header reduces the effective path MTU as
   seen by the IP layer.  When IPv4 is used, this reduced MTU can be
   accommodated through the use of IPv4 fragmentation, but unmitigated
   in-the-network fragmentation has been shown to be harmful through
   operational experience and studies conducted over the course of many
   years [FRAG][FOLK][RFC4963].  Additionally, classical path MTU
   discovery [RFC1191] has known operational issues that are exacerbated
   by in-the-network tunnels [RFC2923][RFC4459].

   For the purpose of this document, subnetworks are defined as virtual
   topologies that span connected network regions bounded by border
   routers.  Examples include the global Internet interdomain routing
   core, Mobile Ad hoc Networks (MANETs) and enterprise networks.  These
   subnetworks are manifested by tunnels that may span many underlying
   networks and traditional IP subnets, e.g., in the internal
   organization of an enterprise network.  Subnetwork border routers
   support the Internet protocols [RFC0791][RFC2460] and forward unicast
   and multicast IP packets over the virtual topology across multiple
   IP- and/or sub-IP layer forwarding hops which may introduce packet
   duplication and/or traverse links with diverse Maximum Transmission
   Units (MTUs).

   This document proposes a Subnetwork Encapsulation and Adaptation
   Layer (SEAL) for the operation of IP over subnetworks that connect
   the Ingress- and Egress Tunnel Endpoints (ITEs/ETEs) of border
   routers.  SEAL accommodates links with diverse MTUs and supports
   efficient duplicate packet detection by introducing a minimal mid-
   layer encapsulation.  The SEAL encapsulation introduces an extended
   Identification field for packet identification and a mid-layer
   segmentation and reassembly capability that allows simplified cutting
   and pasting of packets without invoking in-the-network IP
   fragmentation.  The SEAL protocol is specified in the following
   sections.


2.  Terminology and Requirements

   The term "subnetwork" in this document refers to a virtual topology
   that is configured over a connected network region bounded by border
   routers and that appears as a fully-connected shared link, i.e.,
   a "Virtual Ethernet (VET)" [I-D.templin-autoconf-dhcp].

   The terms "inner" and "outer" respectively refer to the innermost IP
   {layer, protocol, header, packet, etc.} *before* any encapsulation,
   and the outermost IP {layer, protocol, header, packet, etc.} *after*
   any encapsulation.  Between these inner and outer layers, there may
   also be "mid-layer" encapsulations.

   The notation IPvX/*/IPvY refers to an inner IPvX packet encapsulated
   in any '*' mid-layer headers (including the SEAL header) followed by
   an outer IPvY header.  The notation "IP" means either IP protocol
   version (IPv4 or IPv6).

   The following abbreviations correspond to terms used within this
   document and elsewhere in common Internetworking nomenclature:

      Subnetwork - a connected network region bounded by border routers

      SEAL - Subnetwork Encapsulation and Adaptation Layer

      VET - Virtual EThernet

      MANET - Mobile Ad-hoc Network

      ITE - Ingress Tunnel Endpoint

      ETE - Egress Tunnel Endpoint

      ENCAPS - the size of the outer encapsulating SEAL/*/IPv4 headers

      MTU - Maximum Transmission Unit

      S-MSS - the per-ETE SEAL Maximum Segment Size

      PTB - an ICMPv6 "Packet Too Big" or an ICMPv4 "fragmentation
      needed" message

      DF - the IPv4 header Don't Fragment flag

      FRAGREP - a Fragmentation Report message

      SEAL-ID - a 32-bit Identification value; randomly initialized and
      monotonically incremented for each SEAL-encapsulated packet

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].


3.  Applicability Statement

   SEAL was motivated by the specific use case of subnetwork abstraction
   for MANETs; however, the domain of applicability also extends to
   subnetwork abstractions of enterprise networks, the interdomain
   routing core, etc.  The domain of application therefore also includes
   the map-and-encaps architecture proposals in the IRTF Routing
   Research Group (RRG) (see: http://www3.tools.ietf.org/group/irtf/
   trac/wiki/RoutingResearchGroup).

   SEAL introduces a minimal new mid-layer for IPvX in IPvY
   encapsulation (e.g., as IPv6/SEAL/IPv4), and appears as a subnetwork
   encapsulation as seen by the inner IP layer.  SEAL can also be used
   as a mid-layer for encapsulating inner IP packets within outer
   UDP/IPv4 headers (e.g., as IP/SEAL/UDP/IPv4), such as for the Teredo
   domain of applicability [RFC4380].  For further study, SEAL may also
   be useful for "transport-mode" applications, e.g., when the inner
   layer includes ordinary protocol data rather than an encapsulated IP
   packet.

   The current document version is specific to the use of IPv4 as the
   outer encapsulation layer; however, the same principles apply when
   IPv6 is used as the outer layer.


4.  SEAL Protocol Specification

4.1.  Model of Operation

   Ingress Tunnel Endpoints (ITEs) insert a SEAL header in the IP/*/
   IPv4-encapsulated packets they inject into a subnetwork, where the
   outermost IPv4 header contains the source and destination addresses
   of the subnetwork entry/exit points (i.e., the ITE/ETE),
   respectively.  SEAL defines a new IP protocol type and a new mid-
   layer encapsulation for both unicast and multicast inner IP packets.
   The ITE inserts a SEAL header during encapsulation as shown in
   Figure 1:


                                      +-------------------------+
                                      |                         |
                                      ~   Outer */IPv4 headers  ~
                                      |                         |
                                      +-------------------------+
                                      |       SEAL Header       |
   +-------------------------+        +-------------------------+
   ~ Any mid-layer * headers ~        ~ Any mid-layer * headers ~
   +-------------------------+        +-------------------------+
   |                         |        |                         |
   ~        Inner IP         ~  --->  ~        Inner IP         ~
   ~         Packet          ~  --->  ~         Packet          ~
   |                         |        |                         |
   +-------------------------+        +-------------------------+
   ~  Any mid-layer trailers ~        ~  Any mid-layer trailers ~
   +-------------------------+        +-------------------------+
                                      ~    Any outer trailers   ~
                                      +-------------------------+

                       Figure 1: SEAL Encapsulation

   where the SEAL header is inserted as follows:

   o  For simple IP/IPv4 encapsulations (e.g.,
      [RFC2003][RFC2004][RFC4213]), the SEAL header is inserted between
      the inner IP and outer IPv4 headers as: IP/SEAL/IPv4.

   o  For tunnel-mode IPsec encapsulations over IPv4 [RFC4301], the
      SEAL header is inserted between the {AH,ESP} header and outer IPv4
      headers as: IP/*/{AH,ESP}/SEAL/IPv4.

   o  For IP encapsulations over transports such as UDP, the SEAL header
      is inserted immediately after the outer transport layer header,
      e.g., as IP/*/SEAL/UDP/IPv4.

   SEAL-encapsulated packets include a 32-bit SEAL-ID formed from the
   concatenation of the 16-bit ID Extension field in the SEAL header as
   the most-significant bits, and with the 16-bit ID value in the outer
   IPv4 header as the least-significant bits.  Routers within the
   subnetwork use the SEAL-ID for duplicate packet detection, and ITEs/
   ETEs use the SEAL-ID for SEAL segmentation and reassembly.
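
   As an illustration only (not part of the specification), the
   composition of the SEAL-ID described above can be sketched in
   Python.  The function names seal_id_join and seal_id_split are
   hypothetical and do not appear elsewhere in this document.

```python
from typing import Tuple


def seal_id_join(id_extension: int, ipv4_id: int) -> int:
    """Form the 32-bit SEAL-ID from the two 16-bit fields."""
    if not (0 <= id_extension <= 0xFFFF and 0 <= ipv4_id <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    # ID Extension supplies the most-significant 16 bits; the ID field
    # of the outer IPv4 header supplies the least-significant 16 bits.
    return (id_extension << 16) | ipv4_id


def seal_id_split(seal_id: int) -> Tuple[int, int]:
    """Return (id_extension, ipv4_id) for a 32-bit SEAL-ID."""
    if not 0 <= seal_id <= 0xFFFFFFFF:
        raise ValueError("SEAL-ID must fit in 32 bits")
    return (seal_id >> 16) & 0xFFFF, seal_id & 0xFFFF
```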

   SEAL enables a multi-level segmentation and reassembly capability.
   First, the ITE can use IPv4 fragmentation to fragment inner IPv4
   packets with DF=0 before SEAL encapsulation to avoid lower-level
   segmentation and reassembly.  Secondly, the SEAL layer itself
   provides a simple mid-layer cutting-and-pasting of inner IP packets
   to avoid IPv4 fragmentation on the outer packet.  Finally, ordinary
   IPv4 fragmentation is permitted on the outer packet after SEAL
   encapsulation and used to detect and dampen any in-the-network
   fragmentation as quickly as possible.

   The following sections specify the SEAL-related operations of the
   ITE and ETE, respectively:

4.2.  ITE Specification

4.2.1.  Tunnel Interface MTU

   The ITE configures a tunnel virtual interface over one or more
   underlying links that connect the border router to the subnetwork.
   The tunnel interface must present a fixed MTU to the inner IP layer
   (i.e., Layer 3) as the size for admission of inner IP packets into
   the tunnel.  Since the tunnel interface provides a virtual point-to-
   multipoint abstraction between the ITE and a potentially large set of
   ETEs, however, care must be taken in setting the MTU while still
   upholding end system expectations.

   Due to the ubiquitous deployment of standard Ethernet and similar
   networking gear, the nominal Internet cell size has become 1500
   bytes; this is the de facto size that end systems have come to expect
   will be delivered by the network without loss due to an MTU
   restriction on the path, or a suitable ICMP PTB message returned.
   However, the network may not always deliver the necessary PTBs,
   leading to MTU-related black holes [RFC2923].  The ITE therefore
   requires a means for conveying 1500 byte (or smaller) packets to the
   ETE without loss due to MTU restrictions and without dependence on
   PTB messages from within the subnetwork.

   In common deployments, there may be many forwarding hops between the
   original source and the ITE.  Within those hops, there may be
   additional encapsulations (IPsec, L2TP, etc.) such that a 1500 byte
   packet sent by the original source might grow to a larger size by the
   time it reaches the ITE for encapsulation as an inner IP packet, with
   (2KB-ENCAPS) serving as the nominal worst-case upper bound.
   Similarly, additional encapsulations on the path from the ITE to the
   ETE could cause the encapsulated packet to become larger still and
   trigger in-the-network fragmentation.  In order to preserve the end
   system expectation of delivery for 1500 byte and smaller original
   packets, the ITE therefore requires a means for conveying them to the
   ETE even though there may be links within the subnetwork that
   configure a smaller MTU.

   The ITE upholds the 1500-byte-and-smaller packet delivery expectation
   by setting a tunnel virtual interface MTU of 1500 bytes plus extra
   room to accommodate any additional encapsulations that may occur on
   the path from the original source (i.e., even if the underlying links
   do not support an MTU of this size).  The ITE can set larger MTU
   values still (e.g., up to the maximum MTU size of the underlying
   links), but should select a value that is not so large as to cause
   excessive internally-generated ICMP PTBs coming from within the
   tunnel interface (see: Section 4.2.4).

4.2.2.  SEAL Maximum Segment Size (S-MSS) Maintenance

   The ITE maintains a SEAL Maximum Segment Size (S-MSS) value for each
   ETE as soft state within the tunnel interface (e.g., in the IPv4 path
   MTU discovery cache).  The ITE initializes S-MSS to the MTU of the
   underlying link minus ENCAPS, and decreases or increases S-MSS based
   on any Fragmentation Report (FRAGREP) messages received (see: Section
   4.2.7).

4.2.3.  Inner Packet Fragmentation

   The ITE performs inner packet fragmentation *before* it admits an
   inner packet into the tunnel interface.

   For inner IPv4 packets larger than 1500 bytes and with the IPv4 Don't
   Fragment (DF) bit set to 0, the ITE uses IPv4 fragmentation to break
   the packet into 1500 byte IPv4 fragments, with the final fragment
   possibly smaller than the first fragment.  The IPv4 layer then admits
   each fragment into the tunnel as an independent inner IPv4 packet.
   These IPv4 fragments will ultimately be reassembled by the final
   destination.  (Note that inner fragmentation may not be available for
   certain ITE types, e.g., for tunnel-mode IPsec.)

   For all other inner packets, the ITE admits the packet if it is no
   larger than the tunnel interface MTU; otherwise, it drops the packet
   and sends an ICMP PTB message to the source.
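
   The admission rules above can be sketched as follows.  This is a
   minimal illustration, not a normative procedure: the function name
   admit_inner_packet is hypothetical, the inner IPv4 header is assumed
   to be 20 bytes (no options), and the string "PTB" stands in for
   "drop the packet and send an ICMP PTB to the source".

```python
from typing import List, Union

IPV4_HEADER_LEN = 20   # assumption: inner IPv4 header without options
FRAGMENT_SIZE = 1500   # fragment size named in the text above


def admit_inner_packet(length: int, is_ipv4: bool, df: int,
                       tunnel_mtu: int) -> Union[List[int], str]:
    """Return the lengths of the packets admitted into the tunnel, or
    "PTB" if the packet is dropped and an ICMP PTB is returned."""
    if is_ipv4 and df == 0 and length > FRAGMENT_SIZE:
        payload = length - IPV4_HEADER_LEN
        # 1480 bytes of payload per fragment, a multiple of 8 as IPv4
        # fragment offsets require.
        per_fragment = FRAGMENT_SIZE - IPV4_HEADER_LEN
        lengths = []
        while payload > 0:
            chunk = min(per_fragment, payload)
            lengths.append(chunk + IPV4_HEADER_LEN)
            payload -= chunk
        return lengths
    # All other inner packets: admit only if within the tunnel MTU.
    if length <= tunnel_mtu:
        return [length]
    return "PTB"
```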

4.2.4.  SEAL Segmentation and Encapsulation

   The ITE performs SEAL segmentation and encapsulation *after* it
   admits an inner packet into the tunnel interface.

   For inner IP packets larger than (2KB-ENCAPS) and also larger than
   S-MSS, the ITE drops the packet and sends an ICMP PTB message back to
   the source.  Otherwise, the ITE encapsulates the packet in any mid-
   layer '*' headers (for '*' other than the SEAL header).  Next, if the
   inner IP packet plus '*' headers is larger than S-MSS the ITE breaks
   it into N segments (N <= 16) that are no larger than S-MSS bytes
   each.  Each segment except the final one MUST be of equal length,
   while the final segment MUST be no larger than the initial segment.
   The first byte of each segment MUST begin immediately after the final
   byte of the previous segment, i.e., the segments MUST NOT overlap.
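
   One way to satisfy the segmentation rules above is to cut the packet
   into S-MSS-sized pieces, as in the following sketch.  It assumes
   that "2KB" means 2048 bytes and that the packet passed in already
   includes any mid-layer '*' headers; the function name seal_segment
   is hypothetical, and a return value of None stands in for "drop the
   packet and send an ICMP PTB".

```python
from typing import List, Optional

MAX_SEGMENTS = 16
TWO_KB = 2048  # assumption: "2KB" is taken to mean 2048 bytes


def seal_segment(packet: bytes, s_mss: int,
                 encaps: int) -> Optional[List[bytes]]:
    """Split a packet into contiguous, non-overlapping SEAL segments."""
    if s_mss <= 0:
        raise ValueError("S-MSS must be positive")
    if len(packet) > TWO_KB - encaps and len(packet) > s_mss:
        return None  # too large: drop and send an ICMP PTB
    if len(packet) <= s_mss:
        return [packet]  # single-segment packet
    # Every non-final segment is exactly s_mss bytes; the final segment
    # holds the remainder and is therefore no larger than the first.
    segments = [packet[i:i + s_mss] for i in range(0, len(packet), s_mss)]
    if len(segments) > MAX_SEGMENTS:
        # Assumption: cannot occur while S-MSS is at least 128; treated
        # here as undeliverable rather than exceeding N <= 16.
        return None
    return segments
```

   Choosing S-MSS as the common segment length is a design choice of
   this sketch; any equal length no larger than S-MSS that keeps
   N <= 16 would also meet the stated requirements.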

   Note that this SEAL segmentation and encapsulation ignores the DF bit
   in the inner IPv4 header or (in the case of IPv6) ignores the fact
   that the network is not permitted to perform IPv6 fragmentation.
   This segmentation process is a mid-layer (not an IP layer) operation
   employed by the ITE to adapt the inner IP packet to the subnetwork
   path characteristics, and the ETE will restore the inner packet to
   its original form during decapsulation.  Therefore, the fact that the
   packet may have been segmented within the subnetwork is not
   observable after decapsulation.

   The ITE encapsulates each segment in a SEAL header formatted as
   follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          ID Extension         |R|M|CTL|Segment|  Next Header  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 2: SEAL Header Format

   where the header fields are defined as follows:

   ID Extension (16)
      a 16-bit extension of the 16-bit ID field in the outer IPv4
      header; encodes the most-significant 16 bits of a 32-bit SEAL-ID
      value.

   R (1)
      Reserved.

   M (1)
      the "More Segments" bit.  Set to 1 if this SEAL-encapsulated
      packet contains a non-final segment of a multi-segment inner IP
      packet.

   CTL (2)
      a 2-bit "Control" field that identifies the type of SEAL-
      encapsulated packet as follows:

      '00' - a Fragmentation Report (FRAGREP).

      '01' - a non-probe.

      '10' - an implicit probe.

      '11' - an explicit probe.

   Segment (4)
      a 4-bit Segment number.  Encodes a segment number between 0 - 15.

   Next Header (8)
      an 8-bit field that encodes an IP protocol number the same as for
      the IPv4 protocol and IPv6 next header fields.
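
   The header layout of Figure 2 can be illustrated with a small
   encoder and decoder.  This is a sketch under two assumptions: the
   header is transmitted in network byte order, and the Reserved bit is
   sent as 0.  The names SealHeader, pack_seal_header and
   unpack_seal_header are hypothetical.

```python
import struct
from typing import NamedTuple


class SealHeader(NamedTuple):
    id_extension: int  # 16 bits
    m: int             # 1 bit, "More Segments"
    ctl: int           # 2 bits, Control
    segment: int       # 4 bits, segment number 0 - 15
    next_header: int   # 8 bits, IP protocol number


def pack_seal_header(h: SealHeader) -> bytes:
    """Encode the 4-byte SEAL header with the Reserved bit set to 0."""
    if not (0 <= h.id_extension <= 0xFFFF and h.m in (0, 1)
            and 0 <= h.ctl <= 3 and 0 <= h.segment <= 15
            and 0 <= h.next_header <= 0xFF):
        raise ValueError("SEAL header field out of range")
    # Second 16-bit word, from the most-significant bit down:
    # R (1) | M (1) | CTL (2) | Segment (4) | Next Header (8)
    low = (h.m << 14) | (h.ctl << 12) | (h.segment << 8) | h.next_header
    return struct.pack("!HH", h.id_extension, low)


def unpack_seal_header(data: bytes) -> SealHeader:
    """Decode the first 4 bytes of data; the Reserved bit is ignored."""
    id_ext, low = struct.unpack("!HH", data[:4])
    return SealHeader(id_ext, (low >> 14) & 1, (low >> 12) & 3,
                      (low >> 8) & 0xF, low & 0xFF)
```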

   For single-segment inner IP packets, the ITE encapsulates the segment
   in a SEAL header with (M=0; Segment=0).  For N-segment inner packets
   (N <= 16), the ITE encapsulates each segment in a SEAL header with
   (M=1; Segment=0) for the first segment, (M=1; Segment=1) for the
   second segment, etc., with the final segment setting (M=0;
   Segment=N-1).
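
   The (M, Segment) settings above follow a simple pattern, shown here
   as a short sketch.  The function name segment_markings is
   hypothetical.

```python
from typing import List, Tuple


def segment_markings(n: int) -> List[Tuple[int, int]]:
    """Return the (M, Segment) pair for each of n segments, in order."""
    if not 1 <= n <= 16:
        raise ValueError("a packet has between 1 and 16 segments")
    # M=1 on every non-final segment, M=0 on the final segment.
    return [(1 if i < n - 1 else 0, i) for i in range(n)]
```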

   The ITE next sets CTL in the SEAL header of each segment as specified
   in Section 4.2.6, then writes the IP protocol number corresponding to
   the inner packet in the SEAL 'Next Header' field.  Finally, the ITE
   encapsulates the segment in the requisite */IPv4 outer headers
   according to the specific encapsulation format (e.g., [RFC2003],
   [RFC4213], etc.) then sets packet identification values as described
   below.

   For the purpose of packet identification, the ITE maintains a 32-bit
   SEAL-ID value as per-ETE soft state, e.g., in the IPv4 destination
   cache.  The ITE randomly-initializes SEAL-ID when the soft state is
   created and monotonically increments it (modulo 2^32) for each
   successive SEAL-encapsulated packet it sends to the ETE.  For each
   packet, the ITE writes the least-significant 16 bits of the SEAL-ID
   value in the ID field in the outer IPv4 header, and writes the most-
   significant 16 bits in the ID Extension field in the SEAL header.

   For tunnels that may traverse an IPv4 Network Address Translator
   (NAT), the ITE instead maintains SEAL-ID as a 16-bit value that it
   randomly-initializes when the soft state is created and monotonically
   increments (modulo 2^16) for each successive SEAL-encapsulated
   packet.  For each packet, the ITE writes SEAL-ID in the ID Extension
   field of the SEAL header and writes a random 16-bit value in the ID
   field in the outer IPv4 header.  This requires that both the ITE and
   ETE participate in this alternate scheme.
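
   Both identification schemes can be sketched as a small per-ETE state
   object.  The class name SealIdState and its nat_mode flag are
   hypothetical.  The sketch also assumes that the current SEAL-ID
   value is used for the outgoing packet and then incremented; the text
   above does not say whether the increment comes before or after use.

```python
import secrets
from typing import Tuple


class SealIdState:
    """Per-ETE soft state for packet identification."""

    def __init__(self, nat_mode: bool = False) -> None:
        self.nat_mode = nat_mode
        self.bits = 16 if nat_mode else 32
        # Randomly initialized when the soft state is created.
        self.seal_id = secrets.randbits(self.bits)

    def next_fields(self) -> Tuple[int, int]:
        """Return (id_extension, ipv4_id) for the next outgoing packet."""
        current = self.seal_id
        # Monotonic increment, modulo 2^32 (or 2^16 in NAT mode).
        self.seal_id = (self.seal_id + 1) % (1 << self.bits)
        if self.nat_mode:
            # The 16-bit SEAL-ID goes in ID Extension; the outer IPv4
            # ID field carries an unrelated random value.
            return current, secrets.randbits(16)
        return (current >> 16) & 0xFFFF, current & 0xFFFF
```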

4.2.5.  Sending SEAL packets

   Following SEAL segmentation and encapsulation, the ITE sets DF=0 in
   the outer IPv4 header of every outer packet it sends.

   The ITE then sends each outer packet that encapsulates a segment of
   the same inner packet into the tunnel in canonical order, i.e.,
   Segment 0 first, then Segment 1, etc. and finally Segment N-1.

4.2.6.  Sending S-MSS Probes

   When S-MSS is larger than 128, the ITE sends each data packet as an
   implicit probe to detect any in-the-network IPv4 fragmentation.  The
   ITE sets CTL='10' in the SEAL header and DF=0 in the outer IPv4
   header of each SEAL-encapsulated packet, and will receive FRAGREP
   messages from the ETE if fragmentation occurs.  When S-MSS=128, the
   ITE instead sets CTL='01' in the SEAL header to avoid generating
   FRAGREPs for unavoidable in-the-network fragmentation.

   The ITE additionally sends explicit probes periodically to manage a
   window of SEAL-IDs of outstanding probes that allows the ITE to
   validate any FRAGREPs it receives.  The ITE sends explicit probes by
   setting CTL='11' in the SEAL header and DF=0 in the IPv4 header,
   where the probe can be either an ordinary data packet or a NULL
   packet created by setting the 'Next Header' field in the SEAL header
   to a value of "No Next Header".

   The ITE should also send explicit probes that are larger than S-MSS
   periodically to detect increases in the path MTU to the ETE; the ITE
   can send a large probe using either a NULL packet or an ordinary data
   packet that is padded at the end by setting the outer IPv4 length
   field to a larger value than the packet's true length.  When the ETE
   receives an explicit probe, it will return a FRAGREP message whether
   or not any in-the-network fragmentation occurred.
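
   The choice of CTL value described in this section can be summarized
   in a short sketch.  The constant and function names are
   hypothetical, and the behavior for S-MSS values below 128 is an
   assumption (the text only describes S-MSS larger than or equal to
   128).

```python
CTL_FRAGREP = 0b00
CTL_NON_PROBE = 0b01
CTL_IMPLICIT_PROBE = 0b10
CTL_EXPLICIT_PROBE = 0b11
S_MSS_FLOOR = 128


def select_ctl(s_mss: int, explicit_probe: bool = False) -> int:
    """Pick the CTL value for an outgoing SEAL-encapsulated packet."""
    if explicit_probe:
        return CTL_EXPLICIT_PROBE
    if s_mss > S_MSS_FLOOR:
        # Every data packet doubles as an implicit probe.
        return CTL_IMPLICIT_PROBE
    # At the floor, further fragmentation is unavoidable, so FRAGREPs
    # are suppressed.  Assumption: values below the floor are treated
    # the same way.
    return CTL_NON_PROBE
```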

4.2.7.  Processing Fragmentation Reports (FRAGREPs)

   When the ITE receives a potential FRAGREP message, it first verifies
   that the message was formatted correctly (see: Section 4.3.4) and
   that the SEAL-ID embedded in the encapsulated IPv4 first-fragment is
   within the current window of outstanding probes.  If the FRAGREP is
   valid, the ITE advances the probe window and sets a variable 'LEN' to
   the value in the first-fragment's IPv4 length field.  If (LEN-ENCAPS)
   is smaller than S-MSS and the first-fragment was also the final
   fragment, the ITE discards the FRAGREP.  Otherwise, it re-calculates
   S-MSS as follows:

           if (LEN-ENCAPS) is greater than S-MSS or LEN is at least 576
               set S-MSS to (LEN-ENCAPS)
           else
               set S-MSS to the maximum of S-MSS/2 and 128
           endif

   Finally, if the length field of the inner IP header encapsulated
   within the first-fragment contains a value larger than (2KB-ENCAPS),
   and the length field of the first-fragment header contains a still
   larger value, the ITE discards the FRAGREP.  Otherwise, it
   encapsulates the inner IP packet portion embedded within the
   first-fragment in an ICMP PTB to send back to the original source,
   with the MTU field set to the maximum of (2KB-ENCAPS) and (length of
   the first-fragment minus ENCAPS).

   (NB: The "576" in the S-MSS calculation above is the nominal minimum
   MTU for typical IPv4 links and accounts for normal-case IPv4 first
   fragments, while the "else" clause includes a "limited halving"
   factor that accounts for unusual cases in which the ETE receives a
   small IPv4 first-fragment [RFC1812].  This limited halving may
   require multiple iterations of sending probes and receiving FRAGREPs,
   but will rapidly converge to a stable value for S-MSS.)
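
   The S-MSS recalculation in this section can be expressed as a
   runnable sketch.  It covers only the S-MSS update for a FRAGREP that
   has already been validated against the probe window.  The function
   name update_s_mss is hypothetical, and integer division is assumed
   for "S-MSS/2".

```python
def update_s_mss(s_mss: int, length: int, encaps: int,
                 first_is_final: bool) -> int:
    """Return the new S-MSS after a validated FRAGREP.

    length is LEN, the IPv4 length field of the reported
    first-fragment; first_is_final is True when that first-fragment
    was also the final fragment (i.e., no fragmentation occurred).
    """
    candidate = length - encaps
    if candidate < s_mss and first_is_final:
        return s_mss  # FRAGREP is discarded; S-MSS is unchanged
    if candidate > s_mss or length >= 576:
        return candidate
    # "Limited halving" for unusually small first-fragments.
    return max(s_mss // 2, 128)
```

   For example, with S-MSS=1400 and ENCAPS=40, a FRAGREP reporting a
   300 byte first-fragment halves S-MSS to 700, while one reporting a
   1000 byte first-fragment sets S-MSS directly to 960.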

4.2.8.  Processing ICMP PTBs

   Since the ITE sends all SEAL-encapsulated packets with DF=0, it
   unconditionally ignores any ICMP PTBs pertaining to SEAL-encapsulated
   packets that it receives from within the tunnel.

4.3.  ETE Specification

4.3.1.  Reassembly Buffer Requirements

   ETEs MUST be capable of using IPv4-layer reassembly to reassemble
   SEAL-encapsulated outer packets of at least 2KB, and MUST also
   be capable of using SEAL-layer reassembly to reassemble inner IP
   packets of (2KB-ENCAPS).

4.3.2.  IPv4-Layer Reassembly

   The ETE performs IPv4 reassembly as normal, and should maintain a
   conservative high- and low-water mark for the number of outstanding
   reassemblies pending for each ITE.  When the size of the reassembly
   buffer exceeds this high-water mark, the ETE actively discards
   incomplete reassemblies (e.g., using an Active Queue Management (AQM)
   strategy) until the size falls below the low-water mark.

   After reassembly, the ETE either accepts or discards the reassembled
   packet based on the current status of the IPv4 reassembly cache
   (congested vs uncongested).  The SEAL-ID included in the IPv4 first-
   fragment provides an additional level of reassembly assurance, since
   it can record a distinct arrival timestamp useful for associating the
   first-fragment with its corresponding non-initial fragments.  The
   choice of accepting/discarding a reassembly may also depend on the
   strength of the upper-layer integrity check if known (e.g., IPsec/ESP
   provides a strong upper-layer integrity check) and/or the corruption
   tolerance of the data (e.g., multicast streaming audio/video may be
   more corruption-tolerant than file transfer, etc.).

   For SEAL-encapsulated packets that are larger than 2KB and that
   arrive as multiple IPv4 fragments, the ETE uses the IPv4 first
   fragment to generate a FRAGREP as specified in Section 4.3.4.  The
   ETE then discards all non-initial IPv4 fragments and decapsulates the
   inner packet from the first fragment only.  If the entire inner
   packet is a single-segment SEAL packet that was fully-contained
   within the IPv4 first fragment (i.e., all non-initial IPv4 fragments
   contained only padding bytes), the ETE forwards the inner packet as
   normal; otherwise it drops the packet.  This ensures that the tunnel
   is consistent in its handling of large inner packets.

4.3.3.  SEAL-Layer Reassembly

   After IPv4-layer reassembly, the ETE performs SEAL-layer reassembly
   through simple in-order concatenation of the encapsulated segments
   from N consecutive SEAL-encapsulated packets from the same inner
   packet.  These packets contain Segment numbers 0 through N-1 with
   M=0/1 in final and non-final segments, respectively, and with
   consecutive SEAL-ID values encoded in the 32-bit concatenation of the
   ID Extension field in the SEAL header and the ID field in the IPv4
   header.  That is, for an N-segment inner packet, reassembly entails
   the concatenation of the SEAL-encapsulated segments with (Segment 0,
   SEAL-ID i), followed by (Segment 1, SEAL-ID ((i + 1) mod 2^32)), etc.
   up to (Segment N-1, SEAL-ID ((i + N-1) mod 2^32)).  (For tunnels that
   may traverse an IPv4 Network Address Translator (NAT), the ETE
   instead uses only the 16-bit value in the ID Extension field in the
   SEAL header as a 16-bit SEAL-ID value, and uses mod 2^16 arithmetic
   to associate the segments of the same packet.)
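
   The reassembly check described above can be sketched as follows.
   The sketch assumes the candidate segments of one inner packet have
   already been collected from the ETE's cache; the names SealSegment
   and seal_reassemble are hypothetical, and id_bits=16 selects the NAT
   variant.

```python
from typing import List, NamedTuple, Optional


class SealSegment(NamedTuple):
    seal_id: int   # 32-bit SEAL-ID (16-bit in the NAT variant)
    m: int         # "More Segments" bit
    segment: int   # segment number 0 - 15
    data: bytes    # encapsulated segment contents


def seal_reassemble(segments: List[SealSegment],
                    id_bits: int = 32) -> Optional[bytes]:
    """Concatenate the segments of one inner packet in order, or
    return None if the set is incomplete or inconsistent."""
    if not segments:
        return None
    ordered = sorted(segments, key=lambda s: s.segment)
    base = ordered[0].seal_id
    last = len(ordered) - 1
    for i, seg in enumerate(ordered):
        # Segment numbers run 0..N-1, SEAL-IDs are consecutive modulo
        # 2^id_bits, and only the final segment carries M=0.
        expected_id = (base + i) % (1 << id_bits)
        expected_m = 0 if i == last else 1
        if (seg.segment, seg.seal_id, seg.m) != (i, expected_id,
                                                 expected_m):
            return None
    return b"".join(seg.data for seg in ordered)
```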

   SEAL-layer reassembly requires the ETE to maintain a cache of
   recently received SEAL packets for a hold time that would allow for
   reasonable inter-segment delays.  The ETE uses a SEAL maximum segment
   lifetime of 15 seconds for this purpose, i.e., the time after which
   it will discard an incomplete reassembly.  However, the ETE should
   also actively discard any pending reassemblies that clearly have no
   opportunity for completion, e.g., when a considerable number of new
   SEAL packets have been received before a packet that completes a
   pending reassembly has arrived.
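
   A minimal, non-normative sketch of such a cache follows.  The
   15-second lifetime comes from the text above; the class name, the
   keying scheme, and the MAX_NEWER_PACKETS threshold of 64 are
   hypothetical, since this document does not quantify "a considerable
   number".

```python
import time

SEAL_MAX_SEGMENT_LIFETIME = 15.0   # seconds, from the text above
MAX_NEWER_PACKETS = 64             # hypothetical; the text gives no number


class ReassemblyCache:
    """Holds segments of incomplete reassemblies under a caller-chosen key."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # key -> {"start": float, "newer": int, "segments": list}
        self._pending = {}

    def add(self, key, segment):
        """Record one received segment after discarding hopeless entries."""
        now = self._clock()
        for other, entry in self._pending.items():
            if other != key:
                entry["newer"] += 1    # a segment for another packet arrived
        self._discard(now)
        entry = self._pending.setdefault(
            key, {"start": now, "newer": 0, "segments": []})
        entry["segments"].append(segment)

    def _discard(self, now):
        """Drop entries that timed out or were overtaken by newer packets."""
        dead = [k for k, e in self._pending.items()
                if now - e["start"] >= SEAL_MAX_SEGMENT_LIFETIME
                or e["newer"] >= MAX_NEWER_PACKETS]
        for k in dead:
            del self._pending[k]

    def complete(self, key):
        """Remove and return the segments once reassembly has finished."""
        return self._pending.pop(key)["segments"]

    def pending_keys(self):
        return set(self._pending)
```

   The injectable clock is a design convenience for testing; a real
   ETE would more likely key entries on the tunnel endpoints together
   with the base SEAL-ID.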

4.3.4.  Generating Fragmentation Reports (FRAGREPs)

   When the ETE receives the IPv4 first-fragment of a SEAL packet that
   was delivered as multiple IPv4 fragments and with CTL='10' in the
   SEAL header, it sends a FRAGREP message back to the ITE.  The ETE
   also sends a FRAGREP for any SEAL packet with CTL='11', even if the
   packet was not fragmented; in that case, the ETE treats the
   unfragmented packet the same as a first-fragment.
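
   The decision described above can be sketched as follows
   (non-normative).  The sketch assumes that the two-bit CTL field is
   read as an integer (CTL='10' as 0b10) and that the IPv4 Fragment
   Offset and More Fragments (MF) values are available to the caller;
   the function name is hypothetical.

```python
def should_send_fragrep(ctl: int, fragment_offset: int,
                        more_fragments: bool) -> bool:
    """Decide whether the ETE returns a FRAGREP for a received IPv4 packet."""
    if fragment_offset != 0:
        # Non-initial fragments do not carry the SEAL header.
        return False
    if ctl == 0b11:
        # Always report, even when the packet arrived unfragmented.
        return True
    if ctl == 0b10:
        # Report only if this is the first of multiple IPv4 fragments.
        return more_fragments
    return False
```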

   The ETE prepares the FRAGREP message by encapsulating the leading 256
   bytes (or up to the end) of the first-fragment in outer SEAL/*/IPv4
   headers as shown in Figure 3:

   +-------------------------+ -
   |                         |   \
   ~   Outer */IPv4 headers  ~   |
   ~        of FRAGREP       ~    > FRAGREP headers
   |                         |   |
   +-------------------------+   |
   |  SEAL Header of FRAGREP |   /
   +-------------------------+ -
   |                         |   \
   ~    IP/*/SEAL/*/IPv4     ~   |
   ~  hdrs of first-fragment ~   |
   |                         |    > First 256 bytes (or up to
   +-------------------------+   |  the end) of first-fragment
   |                         |   |
   ~  Data of first-fragment ~   |
   |                         |   /
   +-------------------------+ -

             Figure 3: Fragmentation Report (FRAGREP) Message

   The ETE next sets CTL='00', Segment=0 and M=0 in the outer SEAL
   header, sets the SEAL-ID the same as for any SEAL packet, then sets
   the SEAL Next Header field and the fields of the outer */IPv4 headers
   according to the specific encapsulation type.  The ETE then sets the
   FRAGREP's destination address to the source address of the first-
   fragment and sets the FRAGREP's source address to the destination
   address of the first-fragment.  If the destination address in the
   first-fragment was multicast, the ETE instead sets the FRAGREP's
   source address to an address assigned to the underlying IPv4
   interface.  Finally, the ETE sends the FRAGREP to the ITE.
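
   The addressing and truncation rules above can be sketched as follows
   (non-normative).  The sketch returns the FRAGREP as a dictionary
   rather than as serialized headers, because the encapsulation-specific
   header fields are outside its scope; the function and key names are
   hypothetical.

```python
import ipaddress

FRAGREP_MAX_BODY = 256  # leading bytes of the first-fragment to echo


def build_fragrep(first_fragment: bytes, frag_src: str, frag_dst: str,
                  local_addr: str) -> dict:
    """Assemble the logical contents of a FRAGREP for one first-fragment."""
    # Echo the leading 256 bytes, or the whole fragment if it is shorter.
    body = first_fragment[:FRAGREP_MAX_BODY]

    # The report goes back to the sender of the first-fragment.
    dst = frag_src

    # Normally the source is the first-fragment's destination; a multicast
    # destination is replaced by an address of the underlying interface.
    if ipaddress.ip_address(frag_dst).is_multicast:
        src = local_addr
    else:
        src = frag_dst

    return {
        "src": src,
        "dst": dst,
        "seal": {"ctl": 0b00, "segment": 0, "m": 0},
        "body": body,
    }
```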


5.  Link Requirements

   Subnetwork designers are strongly encouraged to follow the
   recommendations in [RFC3819] when configuring link MTUs, where all
   IPv4 links SHOULD configure a minimum MTU of 576 bytes.  Links that
   cannot configure an MTU of at least 576 bytes (e.g., due to
   performance characteristics) SHOULD implement transparent link-layer
   segmentation and reassembly such that an MTU of at least 576 bytes
   can still be presented to the IP layer.


6.  End System Requirements

   SEAL provides robust mechanisms for returning ICMP PTB messages to
   the original source; however, end systems that send unfragmentable IP
   packets larger than 1500 bytes are strongly encouraged to use
   Packetization Layer Path MTU Discovery per [RFC4821].


7.  Router Requirements

   IPv4 routers within the subnetwork observe the requirements in
   [RFC1812], and are strongly encouraged to implement IPv4
   fragmentation such that the first fragment is the largest and
   approximately the size of the underlying link MTU.


8.  IANA Considerations

   SEAL will use the IANA-assigned value of 253 as an IP protocol value
   for experimentation purposes [RFC3692]; therefore, this document has
   no actions for IANA.


9.  Security Considerations

   Unlike with IPv4 fragmentation, overlapping fragment attacks are not
   possible with SEAL segmentation, because SEAL segments are required
   to be non-overlapping.

   An amplification/reflection attack is possible when an attacker sends
   IPv4 first-fragments with spoofed source addresses to an ETE,
   resulting in a stream of FRAGREP messages returned to a victim ITE.
   The encapsulated segment of the spoofed IPv4 first-fragment provides
   mitigation for the ITE to detect and discard spurious FRAGREPs.

   The SEAL header is sent in the clear (outside of any IPsec/ESP
   encapsulations), the same as for the IPv4 header.  As with IPv6
   extension headers, the SEAL header is protected only by L2 integrity
   checks and is not covered under any L3 integrity checks.


10.  Acknowledgments

   Path MTU determination through the report of fragmentation
   experienced by the final destination was first proposed by Charles
   Lynn of BBN on the TCP-IP mailing list in May 1987.  An historical
   analysis of the evolution of path MTU discovery appears in
   http://www.tools.ietf.org/html/draft-templin-v6v4-ndisc-01 and is
   reproduced in Appendix A of this document.

   The following individuals are acknowledged for helpful comments and
   suggestions: Jari Arkko, Fred Baker, Teco Boot, Iljitsch van Beijnum,
   Brian Carpenter, Steve Casner, Ian Chakeres, Remi Denis-Courmont,
   Arnaud Ebalard, Gorry Fairhurst, Joel Halpern, John Heffner, Bob
   Hinden, Christian Huitema, Joe Macker, Matt Mathis, Dan Romascanu,
   Dave Thaler, Joe Touch, Magnus Westerlund, Robin Whittle, James
   Woodyatt and members of the Boeing Phantom Works DC&NT group.


11.  References

11.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1812]  Baker, F., "Requirements for IP Version 4 Routers",
              RFC 1812, June 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

11.2.  Informative References

   [FOLK]     Shannon, C., Moore, D., and k. claffy, "Beyond Folklore:
              Observations on Fragmented Traffic", December 2002.

   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              October 1987.

   [I-D.ietf-manet-smf]
              Macker, J. and SMF Design Team, "Simplified Multicast Forwarding
              for MANET", draft-ietf-manet-smf-07 (work in progress),
              February 2008.

   [I-D.templin-autoconf-dhcp]
              Templin, F., Russert, S., and S. Yi, "The MANET Virtual
              Ethernet (VET) Abstraction",
              draft-templin-autoconf-dhcp-14 (work in progress),
              April 2008.

   [MTUDWG]   "IETF MTU Discovery Working Group mailing list",
              gatekeeper.dec.com/pub/DEC/WRL/mogul/mtudwg-log,
              November 1989 - February 1995.

   [RFC1063]  Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP
              MTU discovery options", RFC 1063, July 1988.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.

   [RFC2004]  Perkins, C., "Minimal Encapsulation within IP", RFC 2004,
              October 1996.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC3692]  Narten, T., "Assigning Experimental and Testing Numbers
              Considered Useful", BCP 82, RFC 3692, January 2004.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,
              February 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963, July 2007.

   [TCP-IP]   "TCP-IP mailing list archives",
              http://www-mice.cs.ucl.ac.uk/multimedia/mist/tcpip,
              May 1987 - May 1990.


Appendix A.  Historic Evolution of PMTUD (written 10/30/2002)

   The topic of Path MTU discovery (PMTUD) saw a flurry of discussion
   and numerous proposals in the late 1980s through early 1990.  The
   initial problem was posed by Art Berggreen on May 22, 1987 in a
   message to the TCP-IP discussion group [TCP-IP].  The discussion that
   followed provided significant reference material for [FRAG].  An IETF
   Path MTU Discovery Working Group [MTUDWG] was formed in late 1989
   with a charter to produce an RFC.  Several variations on a very few
   basic proposals were entertained, including:

   1.  Routers record the PMTU estimate in ICMP-like path probe
       messages (proposed in [FRAG] and later [RFC1063])

   2.  The destination reports any fragmentation that occurs for packets
       received with the "RF" (Report Fragmentation) bit set (Steve
       Deering's 1989 adaptation of Charles Lynn's Nov. 1987 proposal)

   3.  A hybrid combination of 1) and Charles Lynn's Nov. 1987 proposal
       (straw RFC draft by McCloghrie, Fox and Mogul on Jan 12, 1990)

   4.  Combination of the Lynn proposal with TCP (Fred Bohle, Jan 30,
       1990)

   5.  Fragmentation avoidance by setting "IP_DF" flag on all packets
       and retransmitting if ICMPv4 "fragmentation needed" messages
       occur (Geof Cooper's 1987 proposal; later adapted into [RFC1191]
       by Mogul and Deering).

   Option 1) seemed attractive to the group at the time, since it was
   believed that routers would migrate more quickly than hosts.  Option
   2) was a strong contender, but repeated attempts to secure an "RF"
   bit in the IPv4 header from the IESG failed and the proponents became
   discouraged.  Option 3) was abandoned because it was perceived as too
   complicated, and option 4) never received any apparent serious
   consideration.  Proposal 5) was a late entry into the discussion from
   Steve Deering on Feb. 24th, 1990.  The discussion group soon
   thereafter seemingly lost track of all other proposals and adopted
   5), which eventually evolved into [RFC1191] and later [RFC1981].

   In retrospect, the "RF" bit postulated in 2) is not needed if a
   "contract" is first established between the peers, as in proposal 4)
   and a message to the MTUDWG mailing list from jrd@PTT.LCS.MIT.EDU on
   Feb 19, 1990.  These proposals saw little discussion or rebuttal, and
   were dismissed based on the following assertions:

   o  routers upgrade their software faster than hosts

   o  PCs could not reassemble fragmented packets

   o  Proteon and Wellfleet routers did not reproduce the "RF" bit
      properly in fragmented packets

   o  Ethernet-FDDI bridges would need to perform fragmentation (i.e.,
      "translucent" not "transparent" bridging)

   o  the 16-bit IP_ID field could wrap around and disrupt reassembly at
      high packet arrival rates

   The first four assertions, although perhaps valid at the time, have
   been overcome by historical events, leaving only the final one to
   consider.  But [FOLK] has shown that IP_ID wraparound simply does
   not occur within several orders of magnitude of the reassembly
   timeout window on high-bandwidth networks.

   (Author's 2/11/08 note: this final point was based on a loose
   interpretation of [FOLK], and is more accurately addressed in
   [RFC4963].)
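
   As a rough, non-normative illustration of the wraparound concern,
   the following Python sketch estimates how long a sender needs to
   exhaust an ID space when it transmits back-to-back packets of a
   fixed size.  The function name and the 1 Gbps / 1500-byte figures
   are assumptions chosen for the example, not measurements.

```python
def id_wrap_seconds(link_bps: float, packet_bytes: int,
                    id_bits: int = 16) -> float:
    """Seconds until an id_bits-wide ID repeats at full line rate."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return (1 << id_bits) / packets_per_second


# Assumed example: 1 Gbps link, 1500-byte packets.
wrap_16 = id_wrap_seconds(1e9, 1500)               # 16-bit IPv4 ID
wrap_32 = id_wrap_seconds(1e9, 1500, id_bits=32)   # 32-bit SEAL-ID
```

   Under these assumptions a 16-bit ID space wraps in under one second,
   whereas a 32-bit space such as the SEAL-ID takes many hours.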


Author's Address

   Fred L. Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fltemplin@acm.org


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.