Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Informational                        September 25, 2007
Expires: March 28, 2008


      Packetization Layer Path MTU Discovery for IP/*/IPv4 Tunnels
                      draft-templin-inetmtu-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 28, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The nominal Maximum Transmission Unit (MTU) of the Internet has become 1500 bytes, but existing IP/*/IPv4 tunneling mechanisms impose an encapsulation overhead that can reduce the effective path MTU to smaller values. Additionally, existing IP/*/IPv4 tunneling mechanisms are limited in their ability to discover and utilize larger MTUs. This document specifies new mechanisms for conveying packets over IP/*/IPv4 tunnels that address these issues.


Templin                  Expires March 28, 2008                 [Page 1]

Internet-Draft             PLPMTUD for Tunnels            September 2007


Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Concept of Operation
   4.  Tunnel MTU and MRU
   5.  Tunnel Soft State
   6.  Sending Packets
     6.1.  Conceptual Sending Algorithm
     6.2.  Inner packet Fragmentation
     6.3.  Encapsulation
       6.3.1.  Footer
       6.3.2.  Trailing Data and Checksum
       6.3.3.  Data, Probe Request, and Probe Solicitation Format
       6.3.4.  Probe Reply Format
     6.4.  Outer Packet Fragmentation
     6.5.  Setting DF in the Outer Header
     6.6.  Window Management
   7.  Receiving Packets
     7.1.  Decapsulation
     7.2.  Receiving Packet Too Big (PTB) Errors
   8.  Tunnel Qualification and Soft State Management
     8.1.  Probe Requests
       8.1.1.  Sending Probe Requests
       8.1.2.  Receiving Probe Requests
     8.2.  Probe Solicitations
       8.2.1.  Sending Probe Solicitations
       8.2.2.  Receiving Probe Solicitations
     8.3.  Probe Replies
       8.3.1.  Sending Probe Replies
       8.3.2.  Receiving Probe Replies
   9.  8-bit Fletcher Checksum Calculation
   10. Updated Specifications
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgments
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Discussion
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The nominal Maximum Transmission Unit (MTU) of today's Internet has become 1500 bytes due to the preponderance of networking gear that configures an MTU of that size. However, since not all links in the Internet configure a 1500-byte MTU [RFC3819], packets can be dropped due to an MTU restriction on the path.

   Internet Protocol, Version 4 (IPv4) [RFC0791] is the predominant network layer protocol in the Internet today, and it is likely that IPv4 use will continue to grow into the future. It is therefore essential that tunnels over IPv4 (hereafter called IP/*/IPv4 tunnels) be made capable of consistent and efficient handling of packets of various sizes.

   Upper layers see IP/*/IPv4 tunnels as ordinary links, but even for packets no larger than 1500 bytes these links are susceptible to silent loss (e.g., due to path MTU restrictions, lost error messages, layered encapsulations, reassembly buffer limitations, etc.), resulting in poor performance and/or communications failures [RFC2923][RFC4459][RFC4821][RFC4963].

   This document specifies new mechanisms for IP/*/IPv4 tunnels that assure robust handling of packets of various sizes; it updates the functional specifications for Tunnel Endpoints (TEs) found in existing IP/*/IPv4 tunneling mechanisms (see: Section 10).

2.  Terminology

   The following abbreviations and terms are used in this document:

   DF - the IPv4 header "Don't Fragment" flag ([RFC0791], Section 3.1).
   EMTU_R - Effective MTU to Receive ([RFC1122], Section 3.3.2).
   ENCAPS - the size of the encapsulating */IPv4 headers plus trailers.
   IPv4 - Internet Protocol, Version 4.
   IPv6 - Internet Protocol, Version 6.
   MaxOuterPktLen - Maximum Outer Packet Length, in bytes.
   MaxInnerPktLen - Maximum Inner Packet Length, in bytes.
   ReassTime - Reassembly Timeout.
   MRU - Maximum Receive Unit. For the purpose of this document, 'MRU' has exactly the same meaning as 'EMTU_R'.
   MTU - Maximum Transmission Unit.
   PTB - Packet Too Big error.
   TE - Tunnel Endpoint.
   TFE - Tunnel Far End.
   TNE - Tunnel Near End.
   IP/*/IPv4 - an IP packet encapsulated in */IPv4 headers (e.g., for "*" = NULL, UDP, TCP, AH, ESP, etc.).
   inner packet/header/payload - an IP packet/header/payload before IP/*/IPv4 encapsulation.
   outer packet/header/payload - a */IPv4 packet/header/payload after IP/*/IPv4 encapsulation.

3.  Concept of Operation

   TEs that implement this scheme engage in a continuous handshaking process while data is flowing through the tunnel to confirm that the TFE is participating and to maintain soft state used for determining maximum packet sizes. When the flow of data through the tunnel is suspended, the handshaking process is discontinued. When one or both of the TEs do not implement the scheme, the behavior automatically reverts to that of the legacy IP/*/IPv4 tunneling mechanism.

4.  Tunnel MTU and MRU

   TEs configure an indefinite MTU on the tunnel interface, i.e., there is no logical limit on the size of inner packets that upper layers can present to the tunnel interface.

   TEs MUST configure an MRU (i.e., an EMTU_R) that is no smaller than 2048 bytes (2KB) on all IPv4 interfaces over which a tunnel interface is configured.
   Additionally, they MUST configure an MRU that is no smaller than 2KB on the tunnel interface, and SHOULD configure an MRU that is no smaller than the largest MRU of any IPv4 interface over which the tunnel is configured.

5.  Tunnel Soft State

   TEs maintain the following per-TFE conceptual variables as soft state (e.g., in a conceptual neighbor cache):

   MaxOuterPktLen
      the current maximum length outer packet/fragment that can be accommodated by the IPv4 path MTU without further fragmentation. Recommended default value: 128 bytes. Range: 68 bytes to 64KB.

   MaxInnerPktLen
      the current maximum length inner packet/fragment that the TFE can reassemble over the tunnel, i.e., the MRU. Recommended default value: the minimum MRU defined for the specific IP/*/IPv4 tunneling mechanism (e.g., 1500 bytes for [RFC4213]). Range: 576 bytes to (2^32-1) bytes.

   ReassTime
      the current timeout value that the TFE uses for reassembly of fragmented packets that traverse the tunnel. Recommended default value: 120 seconds. Range: 4 usec to 4*(2^32-1) usec (~4.77 hr).

   IPv4Id
      the current IPv4 ID value that the TE will assign in the outer IPv4 header of packets it sends into the tunnel. Initial value: randomly chosen. Range: 0 to 2^16-1.

   isQualified
      boolean indicating whether the TFE implements the scheme. Recommended default value: FALSE.

   isNAT
      boolean indicating whether there is an IPv4 Network Address Translator (NAT) on the path to the TFE. Default value: TRUE or FALSE, based on the specific IP/*/IPv4 tunneling mechanism.

   See: [RFC3819], Section 2 for subnetwork MTU recommendations that influence 'MaxOuterPktLen'. See: [RFC1122], Section 3.3.2 for EMTU_R (MRU) and reassembly timeout recommendations.

6.  Sending Packets

   TEs send packets across a tunnel to the TFE according to the following specifications:

6.1.  Conceptual Sending Algorithm

   With reference to Sections 6.2 through 6.6, TEs use the following conceptual sending algorithm:

   if inner packet is larger than 'MaxInnerPktLen' and
      inner packet is not fragmentable (see: Section 6.2)
       Send PTB appropriate to the inner protocol (e.g., an ICMPv6
         PTB [RFC1981]) with MTU = 'MaxInnerPktLen'.
       Drop packet.
   else
       if 'isNAT' and inner packet is not a probe used for
          'MaxOuterPktLen' determination
           if inner packet is larger than 2*('MaxOuterPktLen' - ENCAPS)
              and inner packet is not fragmentable (see: Section 6.2)
               Send PTB appropriate to the inner protocol with
                 MTU = 2*('MaxOuterPktLen' - ENCAPS).
               Drop packet.
           else
               Fragment inner packet into fragments no larger than
                 MIN('MaxInnerPktLen', 2*('MaxOuterPktLen' - ENCAPS))
                 (see: Section 6.2).
           endif
       else
           Fragment inner packet into fragments no larger than
             'MaxInnerPktLen' (see: Section 6.2).
       endif
       foreach inner packet/fragment
           Encapsulate as an outer IPv4 packet (see: Section 6.3).
           if outer packet is not a probe used for 'MaxOuterPktLen'
              determination
               Fragment outer packet into fragments no larger than
                 'MaxOuterPktLen' (see: Section 6.4).
           endif
           foreach outer packet/fragment
               Set DF in the outer header according to Section 6.5.
               Send fragment subject to window restrictions
                 (see: Section 6.6).
           endforeach
       endforeach
   endif

                 Figure 1: Conceptual Sending Algorithm

6.2.  Inner packet Fragmentation

   An inner packet is fragmentable IFF the TE is permitted to break it into inner fragments before encapsulation, e.g., an IPv6 packet with a fragment header, an IPv4 packet with DF=0, etc.

   TEs break fragmentable inner packets into inner fragments of no more than 'MaxInnerPktLen' bytes when 'isNAT' is FALSE, and of no more than MIN('MaxInnerPktLen', 2*('MaxOuterPktLen' - ENCAPS)) bytes when 'isNAT' is TRUE. The TE then encapsulates each inner fragment per Section 6.3.
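   The size checks of Figure 1 and the inner fragment limits above can be sketched as a small Python function. This is an illustration under stated assumptions, not part of the specification: the function name inner_size_decision, its parameters, and the returned (action, size) pair are hypothetical; 'encaps' stands for ENCAPS, and is_outer_probe marks a probe used for 'MaxOuterPktLen' determination.

```python
from typing import Literal, Tuple

# Hypothetical result labels for this sketch only.
Action = Literal["send_ptb_and_drop", "fragment"]


def inner_size_decision(
    inner_len: int,
    fragmentable: bool,
    is_outer_probe: bool,
    is_nat: bool,
    max_inner_pkt_len: int,
    max_outer_pkt_len: int,
    encaps: int,
) -> Tuple[Action, int]:
    """Return (action, size) for one inner packet, following Figure 1.

    For "send_ptb_and_drop", size is the MTU to report in the PTB.
    For "fragment", size is the largest inner fragment to produce
    (a packet that already fits is simply sent as one fragment).
    """
    # The TFE cannot reassemble inner packets larger than its MRU.
    if inner_len > max_inner_pkt_len and not fragmentable:
        return ("send_ptb_and_drop", max_inner_pkt_len)

    if is_nat and not is_outer_probe:
        # With a NAT on the path, inner fragments are additionally capped
        # at 2*('MaxOuterPktLen' - ENCAPS), per Section 6.2.
        nat_limit = 2 * (max_outer_pkt_len - encaps)
        if inner_len > nat_limit and not fragmentable:
            return ("send_ptb_and_drop", nat_limit)
        return ("fragment", min(max_inner_pkt_len, nat_limit))

    return ("fragment", max_inner_pkt_len)
```

   For example, with the Section 5 default 'MaxOuterPktLen' of 128 bytes and an assumed ENCAPS of 32 bytes, the NAT limit is 2*(128 - 32) = 192 bytes, so an unfragmentable 1280-byte IPv6 packet would trigger a PTB reporting 192 bytes.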
   These inner fragments will be reassembled by the final destination.

   When 'isNAT' is TRUE, 2*('MaxOuterPktLen' - ENCAPS) may be smaller than the minimum IPv6 MTU, in which case the TE may be required to drop an IPv6 packet of 1280 bytes or smaller and send an ICMPv6 PTB with an MTU value less than 1280 bytes. The original IPv6 source will then include a fragment header in subsequent IPv6 packets, and the TE can perform IPv6 fragmentation on these inner packets using the fragment header included by the source, according to the final paragraph of [RFC2460], Section 5.

6.3.  Encapsulation

   TEs encapsulate inner IP packets according to the specific IP/*/IPv4 document, except that the TE maintains a randomly-initialized and monotonically-increasing (modulo 64K) per-TFE 'IPv4Id' value that it encodes in the outer IPv4 headers of successive encapsulated packets. The TE also appends trailing data as specified in the following sections and increments the innermost '*' header length field by the number of trailing data bytes added (e.g., the UDP length field for IPv6/UDP/IPv4 tunnels, the IPv4 total length field for IPv6/IPv4 tunnels, etc.).

6.3.1.  Footer

   When trailing data is included (see: Section 6.3.2), the TE adds the following 4-byte footer as the final 4 bytes of the trailing data. The footer is byte-aligned only and need not be aligned on an even word/longword/etc. boundary:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Version| Type  |   Reserved    |  Fletcher A   |  Fletcher B   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 2: Footer Format

   where the fields of the footer are specified as follows:

   Version (4 bits)
      The Version field indicates the format of the trailing data. This document describes version 1.

   Type (4 bits)
      The type of encapsulated packet.
      The following types are defined:

         0 - Ordinary data packet.
         1 - Probe Request (see: Section 8.1).
         2 - Probe Solicitation (see: Section 8.2).
         3 - Probe Reply (see: Section 8.3).
         4 - 15 - Reserved for future use.

   Reserved (8 bits)
      Reserved for future use.

   Fletcher A (8 bits)
      The 8-bit Fletcher A checksum component.

   Fletcher B (8 bits)
      The 8-bit Fletcher B checksum component.

6.3.2.  Trailing Data and Checksum

   The TE MUST include trailing data with a non-zero checksum in the footer of all probe request, probe reply, and probe solicitation packets, and MUST include trailing data with a non-zero checksum in the footer of data packets when 'isNAT' is TRUE. When 'isNAT' is FALSE, the TE MAY include trailing data with either a zero or non-zero checksum in data packets, or MAY alternatively omit trailing data in those packets.

   For probe reply packets, the TE appends zero-filled padding bytes as necessary to extend the packet to a minimum of 48 bytes beyond the beginning of the inner IP header, then appends the 16-byte control block specified in Section 6.3.4, so that padding plus control block also reach at least 64 bytes. For all other packets that will include a non-zero trailing checksum, the TE appends zero-filled padding bytes as necessary to extend the packet to a minimum of 64 bytes beyond the beginning of the inner IP header. The TE then calculates the 8-bit Fletcher checksum as specified in Section 9 and encodes the results in the Fletcher A and B fields of the footer. The footer is appended as the final 4 bytes of the trailing data, as specified in the following sections.

6.3.3.  Data, Probe Request, and Probe Solicitation Format

   The TE uses the following packet format for data, probe request, and probe solicitation packets (types 0 through 2):

                                  +---------------------------------+
                                  |           Outer IPv4            |
                                  |        Header w/'IPv4Id'        |
                                  +---------------------------------+
                                  |            * Headers            |
                                  |                                 |
   +-------------+                +---------------------------------+
   |  Inner IP   |                |            Inner IP             |
   ~   packet    ~      ===>      ~             packet              ~
   |             |                |                                 |   T
   +-------------+                +---------------------------------+ -\ r
    Inner Packet                  |                                 | | a
                                  ~          Zero Padding           ~ | i
                                  |                                 | > l
                                  +---------------------------------+ | e
                                  |   Footer (see: Section 6.3.1)   | | r
                                  +---------------------------------+ -/ s
                                  |  Any */IPv4 protocol trailers ...
                                  +------------------------------
                                     Outer Packet with Trailers

        Figure 3: Data, Probe Request, and Probe Solicitation Format

6.3.4.  Probe Reply Format

   The TE uses the following encapsulation format for all probe reply packets (type 3):

                                  +--------------------------------+
                                  |           Outer IPv4           |
                                  |        Header w/'IPv4Id'       |
                                  +--------------------------------+
                                  |           * Headers            |
                                  |                                |
   +-------------+                +--------------------------------+
   |  inner IP   |                |            inner IP            |
   ~    echo     ~      ===>      ~              echo              ~
   |    reply    |                |             reply              |
   +-------------+                +--------------------------------+ -\
     Inner Reply                  |                                | |
                                  ~          Zero Padding          ~ |
                                  |                                | |
                                  +--------------------------------+ | T
                                  |        YourPort / YourId       | | r
                                  +--------------------------------+ | a
                                  |            YourAddr            | > i
                                  +--------------------------------+ | l
                                  |           ReassTime            | | e
                                  +--------------------------------+ | r
                                  |         MaxInnerPktLen         | | s
                                  +--------------------------------+ |
                                  |  Footer (see: Section 6.3.1)   | |
                                  +--------------------------------+ -/
                                  |  Any */IPv4 protocol trailers ...
                                  +------------------------------
                                     Outer Reply with Trailers

                      Figure 4: Probe Reply Format

   where the following 16-byte "control block" information is included immediately following the padding and immediately before the trailing footer:

   YourPort (16 bits) - 1's complement of the observed port number of the probe request.

   YourId (16 bits) - 1's complement of the observed ip_id number of the probe request.

   YourAddr (32 bits) - 1's complement of the observed IPv4 source address of the probe request.

   ReassTime (32 bits) - non-zero value between 1 and (2^32-1), in 4 usec increments.

   MaxInnerPktLen (32 bits) - non-zero value between 576 and (2^32-1), in 1-byte increments.

6.4.  Outer Packet Fragmentation

   For packets other than probe requests used for 'MaxOuterPktLen' determination, TEs use IPv4 fragmentation to fragment outer packets after IPv4 encapsulation into fragments no larger than 'MaxOuterPktLen' bytes. These outer fragments will be reassembled by the TFE.

6.5.  Setting DF in the Outer Header

   TEs MUST set DF=1 in the outer IPv4 header of probe requests to be used for 'MaxOuterPktLen' determination. TEs MAY set DF=0 in the outer header of other probe requests and SHOULD set DF=0 in the outer header of probe replies.

   TEs MUST set DF=1 in the outer header of ordinary data packets/fragments when 'isNAT' is TRUE. TEs MAY set DF=0 in the outer header of ordinary data packets/fragments when 'isNAT' is FALSE.

6.6.  Window Management

   TEs send packets into a tunnel according to a window based on the TFE's advertised 'ReassTime'. In particular, the TE must not admit more than 2^16 packets into the tunnel within the 'ReassTime' window.
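   One way to enforce this window is a sliding count of recently admitted packets, sketched below in Python. The 2^16 limit presumably keeps the 16-bit 'IPv4Id' from wrapping within the TFE's reassembly timeout (cf. [RFC4963]). The class ReassWindow and its methods are hypothetical; the caller supplies timestamps in seconds and decides whether a counted unit is an outer packet or an outer fragment, and the all-or-nothing admission in try_admit is only one of the policies this section leaves to implementer discretion.

```python
from collections import deque
from typing import Deque


class ReassWindow:
    """Hypothetical per-TFE admission window (sketch of Section 6.6).

    Admits at most `limit` units within any interval of `reass_time`
    seconds, where `reass_time` is the TFE's advertised 'ReassTime'.
    """

    def __init__(self, reass_time: float, limit: int = 2 ** 16) -> None:
        self.reass_time = reass_time
        self.limit = limit
        # Admission times of units that are still inside the window.
        self._sent: Deque[float] = deque()

    def _expire(self, now: float) -> None:
        # Forget units admitted 'ReassTime' or more seconds ago.
        while self._sent and now - self._sent[0] >= self.reass_time:
            self._sent.popleft()

    def try_admit(self, count: int, now: float) -> bool:
        """Admit `count` units at time `now` if all fit; otherwise admit none."""
        self._expire(now)
        if len(self._sent) + count > self.limit:
            return False
        self._sent.extend([now] * count)
        return True
```

   For instance, a TE could call try_admit with the number of outer fragments produced from one inner packet and drop the whole packet when the call returns False.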
   When not all of the inner and outer fragments of the original packet can be admitted into the tunnel within the current window, TE implementations should use discretion, i.e., implementations are advised to determine when it is appropriate to admit some fragments versus drop all fragments.

7.  Receiving Packets

7.1.  Decapsulation

   TEs decapsulate each outer packet they receive exactly as specified in the appropriate IP/*/IPv4 document, except that when 'isQualified' is TRUE and the packet includes a non-zero trailing checksum, the TE first verifies the checksum in the outer packet as specified in Section 9. If the A and B results of the checksum calculation match the values stored in the trailing checksum, the TE decapsulates the packet; otherwise, it drops the packet.

   Note that the initial probe request/reply packets from a new TFE will be received before 'isQualified' is set to TRUE. The TE also decapsulates these packets, as specified in Section 8.

7.2.  Receiving Packet Too Big (PTB) Errors

   TEs may receive ICMPv4 PTB errors with Type=3 ("Destination Unreachable") and Code=4 ("fragmentation needed and DF set") that include a Next-Hop MTU value [RFC1191] in response to any packets that were admitted into the tunnel with DF=1 [RFC0792]. When a TE receives an ICMPv4 PTB with a Next-Hop MTU value smaller than 'MaxOuterPktLen', it SHOULD reduce 'MaxOuterPktLen' and/or actively probe to discover and confirm a new 'MaxOuterPktLen'. The TE SHOULD NOT send a translated PTB back to the inner source.

8.  Tunnel Qualification and Soft State Management

   TEs engage in a probing process to qualify new TFEs and to refresh per-TFE soft state for qualified TFEs thereafter. TEs discontinue the probing process and garbage-collect stale soft state for dormant tunnels and unqualified TFEs. TEs exchange probe requests, probe solicitations, and probe replies as specified in the following sections.

8.1.  Probe Requests

   TEs send and receive probe requests as specified below.

8.1.1.  Sending Probe Requests

8.1.1.1.  Basic Probing Strategy

   TEs send probe requests while data is actively flowing through the tunnel. The TE sends initial probe requests to qualify each new TFE, then sends periodic probe requests thereafter. The TE SHOULD limit the rate at which it sends probe requests to each TFE, but MUST probe frequently enough to refresh per-TFE conceptual variables. The TE retains a cache of recently sent probe requests and uses them to verify subsequent probe replies.

8.1.1.2.  MaxOuterPktLen Probing

   The TE SHOULD probe to detect larger 'MaxOuterPktLen' values by sending progressively larger probe requests padded to the desired probe size. When the TE receives sufficient evidence through probing that the forward path to the TFE supports the probed size, it advances 'MaxOuterPktLen' to the probe size. The TE SHOULD NOT send probe requests larger than ('MaxInnerPktLen' + ENCAPS). The TE MAY send a series of probes in parallel to mitigate 'MaxOuterPktLen' fluctuations in the case of multipath routes with diverse path MTUs.

8.1.1.3.  Generating and Sending Probe Requests

   TEs generate probe requests by creating a minimum-sized, unfragmentable IP echo request packet according to the inner IP protocol (e.g., an ICMPv6 echo request [RFC4443] when the inner IP protocol is IPv6). The echo request MUST include source and destination addresses that correspond to the TNE and TFE, respectively, and SHOULD include additional identifying information (e.g., sequence/identification numbers, nonce values, etc.) that the TFE will echo in its reply. The TE then encapsulates the echo request with padding added to create an outer probe request of the desired probe size, and sends the probe request into the tunnel as specified in Section 6.

8.1.2.  Receiving Probe Requests

   When a TE receives a potential probe request from a TFE (i.e., as indicated by examining the potential trailing footer), it first determines whether the packet includes a valid trailing checksum. If the packet does not include a valid trailing checksum, the TE discontinues probe request processing, decapsulates the packet as for ordinary data, and returns from processing. Otherwise, the TE generates a probe reply as specified in Section 8.3.

8.2.  Probe Solicitations

   TEs send and receive probe solicitations as specified below.

8.2.1.  Sending Probe Solicitations

   When a TE has new information to convey to a TFE but has not received recent probe requests from the TFE, it MAY send a probe solicitation to the TFE. The TE creates a NULL inner IP packet (e.g., an IPv6 header with "No Next Header" in the Next Header field) with source and destination addresses that correspond to the TNE and TFE, respectively. The TE then encapsulates the NULL packet as a probe solicitation and sends it into the tunnel as specified in Section 6.

8.2.2.  Receiving Probe Solicitations

   When a TE receives a potential probe solicitation from a TFE, it first determines whether the packet includes a valid trailing checksum. If the packet does not include a valid trailing checksum, the TE discontinues probe solicitation processing, decapsulates the inner packet as for ordinary data, and returns from processing. Otherwise, the TE SHOULD send an expedited probe request with DF=0 to the TFE as specified in Section 8.1 if it has not successfully probed the TFE recently. The TE then discards the probe solicitation.

8.3.  Probe Replies

   TEs send and receive probe replies as specified below.

8.3.1.  Sending Probe Replies

   TEs send probe replies in response to valid probe requests and use them as a mechanism for advertising 'MaxInnerPktLen' and 'ReassTime' values to the TFE.
   TEs also use probe replies to inform the TFE of the IPv4 address and protocol port number that they observed in the TFE's probe request.

   The TE creates an inner IP echo reply packet according to the inner IP protocol (e.g., an ICMPv6 echo reply [RFC4443] when the inner protocol is IPv6). The TE uses the destination address of the echo request as the source address of the echo reply, and the source address of the echo request as its destination address. The TE also includes in the echo reply any additional identifying information that the TFE included in its echo request. The TE then encapsulates the echo reply as specified in Section 6.3.

   For IP/*/IPv4 tunneling mechanisms that include a port number in the encapsulating * header, the TE encodes the 1's complement of the protocol source port number it observed in the TFE's probe request (e.g., the UDP source port number for IPv6/UDP/IPv4 encapsulation) in the 16-bit 'YourPort' field. (Otherwise, the TE encodes the value '0' in the 'YourPort' field.) The TE next encodes the 1's complement of the ip_id it observed in the outer IPv4 header of the TFE's probe request in the 16-bit 'YourId' field, and the 1's complement of the source address of the probe request in the 32-bit 'YourAddr' field.

   The TE next encodes in the 'MaxInnerPktLen' field a value that is less than or equal to an MRU appropriate for the interface on which the TFE's probe request arrived. The TE MAY choose to dynamically increase or decrease the 'MaxInnerPktLen' values it advertises to a TFE in successive probe replies, but if so, it SHOULD seek to converge to a stable value.

   The TE finally encodes in the 'ReassTime' field a reassembly timeout value appropriate for the interface on which the TFE's probe request arrived.
   The TE MAY choose to dynamically increase or decrease the 'ReassTime' value it advertises to a TFE in successive probe replies, but if so, it SHOULD seek to converge to a stable value.

   Following the encoding of the above trailing data, the TE appends the trailing checksum and sends the reply to the TFE.

8.3.2.  Receiving Probe Replies

8.3.2.1.  Probe Reply Verification

   When a TE receives a potential probe reply from a TFE, it first determines whether the packet includes a valid trailing checksum. The TE next verifies that the packet includes enough trailing data to contain a probe reply control block (see: Section 6.3.4), then examines the 'MaxInnerPktLen' and 'ReassTime' values in the potential control block. If the packet does not include a valid trailing checksum, does not include a control block, or carries a 'MaxInnerPktLen' or 'ReassTime' value outside of the acceptable ranges listed in Section 6.3.4, the TE discontinues probe reply processing, decapsulates the packet as for ordinary data, and returns from processing.

   Next, the TE verifies that the inner IP echo reply matches one of its cached probe requests by examining the inner IP source and destination addresses as well as any other identifying information in the inner packet. If the probe reply is valid, the TE sets 'isQualified' to TRUE for this TFE; otherwise, it discards the probe reply and returns from processing. If the TE receives an excessive number of invalid probe replies from a TFE, it resets 'isQualified' to FALSE and restores 'MaxOuterPktLen' and 'MaxInnerPktLen' to their default values.

8.3.2.2.  Probe Reply Processing

   The TE next examines the 'YourPort', 'YourId' and 'YourAddr' values encoded in the packet.
   If the values match the 1's complement of the probe request's protocol port, ip_id, and IPv4 source address, respectively, the TE sets 'isNAT' to FALSE; otherwise, it sets 'isNAT' to TRUE. (For encapsulating * headers that do not include port numbers, the TE ignores the 'YourPort' value in this check.)

   Next, the TE records the 'MaxInnerPktLen' and 'ReassTime' values in the corresponding conceptual variables for this TFE. If the new 'MaxInnerPktLen' is smaller than ('MaxOuterPktLen' - ENCAPS), the TE SHOULD reduce 'MaxOuterPktLen' to ('MaxInnerPktLen' + ENCAPS). If the 'MaxInnerPktLen' and 'ReassTime' values fluctuate significantly between successive probe replies, the TE SHOULD record the most conservative values received (e.g., a 'MaxInnerPktLen' of 16KB instead of 64KB, a 'ReassTime' of 90 seconds instead of 60 seconds, etc.).

   Following the above processing, the TE discards the probe reply.

9.  8-bit Fletcher Checksum Calculation

   The 8-bit Fletcher checksum is discussed in [RFC1146][STONE1][STONE2] and is used by this specification to provide an integrity check with different properties than those used by common link layers and upper layer protocols.

   The TE calculates the 8-bit Fletcher checksum over the first 64 bytes of the inner packet, beginning with the inner IP header, according to the algorithm of [RFC1146], which is reproduced below with an additional rule for representing zero results:

      The 8-bit Fletcher Checksum Algorithm is calculated over a sequence of data octets (call them D[1] through D[N]) by maintaining 2 unsigned 1's-complement 8-bit accumulators A and B whose contents are initially zero, and performing the following loop where i ranges from 1 to N:

         A := A + D[i]
         B := B + A

   If, at the end of the loop, either or both of the A and B accumulators encode the value 0x00, invert the value in the affected accumulator(s) to 0xff.
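   A minimal Python sketch of the calculation above follows. It implements the 1's-complement accumulators with an explicit end-around carry and applies the zero rule at the end. The helper build_footer is hypothetical: it assumes the Reserved field is sent as zero and that the checksum covers the first 64 bytes of the padded inner packet (Section 6.3.2), with the footer itself excluded.

```python
from typing import Tuple


def fletcher8(data: bytes) -> Tuple[int, int]:
    """8-bit Fletcher checksum (A, B) as described in Section 9 (sketch)."""

    def ones_add(x: int, y: int) -> int:
        # 8-bit 1's-complement addition: fold the carry out of bit 7
        # back into bit 0 (end-around carry).
        s = x + y
        if s > 0xFF:
            s = (s & 0xFF) + 1
        return s

    a = b = 0
    for octet in data:
        a = ones_add(a, octet)
        b = ones_add(b, a)
    # Zero rule: an accumulator holding 0x00 is inverted to 0xff.
    return (a or 0xFF, b or 0xFF)


def build_footer(padded_inner: bytes, pkt_type: int, version: int = 1) -> bytes:
    """Hypothetical helper: 4-byte footer laid out as in Figure 2."""
    a, b = fletcher8(padded_inner[:64])
    # Version occupies the high-order 4 bits, Type the low-order 4 bits.
    return bytes([(version << 4) | pkt_type, 0x00, a, b])
```

   Because 8-bit 1's-complement addition is addition modulo 255 (with 0xff as a second representation of zero), an implementation could equally use "% 255" arithmetic; once the zero rule is applied, both formulations yield the same A and B values.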
   Note that faster algorithms are possible and may be used instead of the algorithm above; see: [RFC1146] for citations of alternate algorithms.

10.  Updated Specifications

   This document updates the following specifications:

   o  RFC2003 (IP-in-IP)
   o  RFC2529 (6over4)
   o  RFC2661 (L2TP)
   o  RFC2784 (GRE)
   o  RFC3056 (6to4)
   o  RFC3378 (ETHERIP)
   o  RFC3884 (IPSec Transport Mode for Dynamic Routing)
   o  RFC4023 (MPLS-in-IP)
   o  RFC4213 (Basic IPv6 Transition Mechanisms)
   o  RFC4214 (ISATAP)
   o  RFC4301 (IPSec)
   o  RFC4302 (AH)
   o  RFC4303 (ESP)
   o  RFC4380 (TEREDO)
   o  LISP
   o  others....

11.  IANA Considerations

   The IANA is instructed to create a registry for the Version and Type values that occur in the footers of encapsulated packets per Section 6.3.1.

12.  Security Considerations

   A possible attack vector involves an off-path attacker sending probe requests and/or probe solicitations with spoofed source addresses. Legitimate probe requests and replies contain identifying information that is useful for defending against off-path attacks.

   Security considerations for specific IP/*/IPv4 tunneling mechanisms are given in the respective documents.

13.  Acknowledgments

   This work has benefited from discussions with Fred Baker, Iljitsch van Beijnum, Steve Casner, Gorry Fairhurst, John Heffner, Joe Macker, Matt Mathis, and Joe Touch. Dan Romascanu mentioned the IEEE 802.3as extension of the Ethernet frame size to 2048 bytes. Remi Denis-Courmont noted that trailers could be added using the innermost '*' protocol length field.

14.  References

14.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.
[RFC1122]  Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC1812]  Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

[RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998.

14.2.  Informative References

[RFC0905]  International Organization for Standardization (ISO), "ISO Transport Protocol specification ISO DP 8073", RFC 905, April 1984.

[RFC1146]  Zweig, J. and C. Partridge, "TCP alternate checksum options", RFC 1146, March 1990.

[RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

[RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923, September 2000.

[RFC3385]  Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna, "Internet Protocol Small Computer System Interface (iSCSI) Cyclic Redundancy Check (CRC)/Checksum Considerations", RFC 3385, September 2002.

[RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004.

[RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, October 2005.

[RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, March 2006.

[RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-Network Tunneling", RFC 4459, April 2006.

[RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007.

[RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly Errors at High Data Rates", RFC 4963, July 2007.
[STONE1]   Stone, J., "Checksums in the Internet (Stanford Doctoral Dissertation)", August 2001.

[STONE2]   Stone, J., Greenwald, M., Partridge, C., and J. Hughes, "Performance of Checksums and CRC's over Real Data", IEEE/ACM Transactions on Networking, Vol. 6, No. 5, October 1998.

Appendix A.  Discussion

Probing strategies for packetization layer protocols are specified in [RFC4821], Section 7, and also apply to the TE's 'MaxOuterPktLen' probing process. Further strategies for handling ICMPv4 PTB errors are specified in [RFC4821], Section 7, and likewise apply to the TE's 'MaxOuterPktLen' probing process.

Note that decapsulation automatically erases any padding that may have been inserted by the TE along with the trailing checksum.

Author's Address

   Fred L. Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fred.l.templin@boeing.com

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).