Network Working Group                                     I. van Beijnum
Internet-Draft                                                Consultant
Expires: December 29, 2007                                 June 29, 2007


                 IPv6 Extensions for Multi-MTU Subnets
                    draft-van-beijnum-multi-mtu-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 29, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   In the early days of the internet, many different link types with many different maximum packet sizes were in use. Some other link types (PPP, ATM, Packet over SONET) are still used for point-to-point and point-to-multipoint links, but shared subnets are almost exclusively implemented as ethernets. Even though the relevant standards mandate a 1500-byte maximum packet size for ethernet, more and more ethernet equipment is capable of handling packets bigger than 1500 bytes. However, since this capability isn't standardized, it's seldom used today, despite the potential performance benefits of using larger packets.
   This document specifies a mechanism to negotiate per-neighbor maximum packet sizes so that nodes on a shared subnet may use the maximum mutually supported packet size between them without being limited by nodes with smaller maximum sizes on the same subnet.

1 Introduction

   Some protocols inherently generate small packets. Examples are VoIP, where packets must be sent frequently, before much data can be gathered to fill them, and the DNS, where queries are inherently small and responses rarely fill a full 1500-byte packet. However, most data that is transferred across the internet and private networks is at least several kilobytes in size (often much larger) and requires segmentation by TCP or another transport protocol. These types of data transfer can benefit from larger packets in several ways:

   1.  A higher data-to-header ratio means fewer overhead bytes

   2.  Fewer packets mean fewer per-packet operations on the source and destination hosts

   3.  Fewer packets also mean fewer per-packet operations in routers and middleboxes

   4.  TCP performance tends to increase with larger packet sizes

   Even though the capability to use larger packets (often called jumbo frames) is present in a lot of ethernet hardware today, it isn't used, because IP assumes a common MTU size for all nodes connected to a link or subnet. In practice, this means that using a larger MTU requires manually configuring the non-standard MTU size on all hosts and routers, and possibly on switches. Also, the MTU size for a subnet is limited to that of the least capable router, host or switch.

   This document proposes to end this situation using several new IPv6 options and messages:

   1.  An additional router advertisement MTU option to limit higher maximum packet sizes

   2.  A new switch advertisement message, similar to a router advertisement message, so that switches can announce the maximum packet size they support

   3.  A neighbor discovery option that allows nodes to inform their neighbors of the maximum packet size they support

   4.  A new ICMPv6 message for confirming that packets with an increased maximum size can be transmitted and received successfully

   Nodes running IPv6 may take advantage of these mechanisms to send packets larger than the standard maximum size. Since IPv4 doesn't support equivalent mechanisms, support for IPv4 requires additional work that is best carried out after deployment experience with IPv6.

2 Terminology

   MTU: Maximum Transmission Unit. This is the maximum IP packet size in bytes supported on a link, towards a neighbor or towards a remote correspondent. In some cases, the term MRU (maximum receive unit) would be more appropriate, but for consistency, the term MTU is used throughout this document.

   Advised MTU: The MTU that is considered the best or safe choice at a given time on a given link.

   Allowed MTU: The maximum MTU allowed administratively.

   Local MTU: The maximum packet size considered usable on a node, based on the physical MTU, the allowed MTU and advised MTUs.

   Neighbor MTU: The maximum packet size that may be used towards a given on-link neighbor.

   Off-link MTU: The maximum packet size that is appropriate for communicating with off-link correspondents.

   Physical MTU: The MTU reported by the driver for an interface when operating at a given link speed.

   Tentative neighbor MTU: The maximum packet size advertised by a neighbor.

3 Disadvantages of larger packets

   Although often desirable, the use of larger packets isn't universally advantageous, for the following reasons:

   1.  Increased delay and jitter

   2.  Increased reliance on path MTU discovery

   3.  Increased packet loss through bit errors

   4.  Increased risk of undetected bit errors

3.1 Delay and jitter

   On low-bandwidth links, the additional time it takes to transmit larger packets may lead to unacceptable delays. For instance, transmitting a 9000-byte packet takes 7.23 milliseconds at 10 Mbps, while transmitting a 1500-byte packet takes only 1.23 ms. Once transmission of a packet has started, additional traffic must wait for the transmission to finish, so a larger maximum packet size immediately leads to a higher worst-case head-of-line blocking delay and, as such, to a bigger difference between the best and worst cases (jitter).

   The increase in average delay depends on the number of packets that are buffered, the average packet size and the queuing strategy in use. Buffer sizes vary greatly, but assuming 40 buffers (not uncommon), each holding a packet of the size shown, leads to the following results:

     +----------+-------+-------+--------+--------+--------+---------+
     | Speed    |   500 |  1500 |   4500 |   9000 |  16384 |   65535 |
     +----------+-------+-------+--------+--------+--------+---------+
     | 10 Mbps  | 17.22 | 49.22 | 145.22 | 289.22 | 525.50 | 2098.34 |
     | 100 Mbps |  1.72 |  4.92 |  14.52 |  28.92 |  52.55 |  209.83 |
     | 1 Gbps   |  0.17 |  0.49 |   1.45 |   2.89 |   5.26 |   20.98 |
     | 10 Gbps  |  0.02 |  0.05 |   0.15 |   0.29 |   0.53 |    2.10 |
     +----------+-------+-------+--------+--------+--------+---------+

   Delays are in milliseconds, packet sizes in bytes, counting 38 additional bytes of ethernet overhead per packet.

   If we assume that the delays involved with 1500-byte packets on 100 Mbps ethernet are acceptable for most, if not all, applications, then the conclusion must be that 9000-byte packets on 1 Gbps ethernet should also be acceptable. At 10 Gbps, much larger packet sizes could be accommodated without adverse impact on delay-sensitive applications. Below 100 Mbps, larger packet sizes are probably not advisable.

3.2 Path MTU Discovery problems

   PMTUD problems arise when a packet is too large to be forwarded over the next link and the router can't fragment it, because the DF bit is set or because the packet is IPv6, and the resulting ICMP "packet too big" messages from the router don't make it back to the sending host. This typically happens when there is an MTU bottleneck somewhere in the middle of the path.
   If the MTU bottleneck is located at either end, the TCP MSS (maximum segment size) option makes sure that TCP packets conform to the limited MTU. PMTUD problems are of course possible with non-TCP protocols, but they are rare in practice.

   Taking the delay and jitter issues to heart, maximum packet sizes should be larger for faster links. This means that in the majority of cases, the MTU bottleneck will tend to be at one of the ends of a path, rather than somewhere in the middle.

   A crucial difference between PMTUD problems that result from MTUs smaller than the standard 1500 bytes and PMTUD problems that result from MTUs larger than the standard 1500 bytes is that in the latter case, only a party that's actually using the non-standard MTU is affected. This puts potential problems and potential benefits in the same place, so it's always possible to revert to a 1500-byte MTU if PMTUD problems can't be resolved otherwise.

   Considering the above, and the work that's going on in the IETF to resolve PMTUD issues as they exist today, increasing MTUs where desired doesn't involve undue risks.

3.3 Packet loss through bit errors

   All transmission media are subject to bit errors. In many cases, a bit error leads to a CRC failure, after which the packet is lost. In other cases, packets are retransmitted a number of times, but if error conditions are severe, packets may still be lost because an error occurred at every try.

   Using larger packets increases the chance of a packet being lost due to errors, and when a packet is lost, more data has to be retransmitted. Both per-packet overhead and loss through errors reduce the amount of usable data transferred. The optimum tradeoff is reached when both types of loss are equal.
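   The tradeoff can be checked numerically. The Python sketch below is illustrative only: it assumes, as the next paragraph does, that the chance of losing a frame grows linearly with its length (8 * frame bytes * BER), computes the fraction of transmitted bytes that arrive as useful data, and compares a brute-force scan with the closed-form optimum. The function names are hypothetical.

```python
import math


def efficiency(frame_bytes: int, overhead_bytes: int, ber: float) -> float:
    """Share of transmitted frame bytes that end up as delivered payload.

    Assumes the chance of losing a frame is linear in its length
    (8 * frame_bytes * ber), which only holds while that product is << 1.
    """
    payload_share = 1 - overhead_bytes / frame_bytes   # lost to headers/FCS
    survival = 1 - 8 * frame_bytes * ber               # lost to bit errors
    return payload_share * survival


def optimum_frame_bytes(overhead_bytes: int, ber: float) -> float:
    """Closed-form optimum: the derivative of efficiency() is zero where
    overhead_bytes / s == 8 * s * ber, i.e. where both losses are equal."""
    return math.sqrt(overhead_bytes / (8 * ber))


def best_frame_by_scan(overhead_bytes: int, ber: float, limit: int) -> int:
    """Brute-force check: try every frame size below `limit` bytes."""
    sizes = range(overhead_bytes + 1, limit)
    return max(sizes, key=lambda s: efficiency(s, overhead_bytes, ber))


if __name__ == "__main__":
    # 78 bytes of overhead: ethernet header, IPv6 header, TCP header, CRC.
    ber = 78 / (8 * 1518 ** 2)          # the BER at which 1518 bytes is optimal
    print(f"BER {ber:.2e}: scan finds {best_frame_by_scan(78, ber, 20000)} bytes")
    print(f"BER 1e-10: optimum {optimum_frame_bytes(78, 1e-10):.0f} bytes")
```

   With 78 bytes of overhead, the scan settles on a 1518-byte frame at a BER of about 4.2 * 10^-6, and a BER of 10^-10 gives roughly 312250 bytes. At the optimum, the overhead loss (overhead / size) and the error loss (8 * size * BER) are equal.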
   If we make the simplifying assumption that the probability of losing a packet to bit errors grows linearly with packet size (the packet size in bits times the bit error rate), the optimum packet size is computed as follows:

      packet size (bits) = sqrt(overhead (bits) / bit error rate)

   or, with sizes expressed in bytes:

      packet size (bytes) = sqrt(overhead bytes / (8 * bit error rate))

   For IPv6 in ethernet framing, with 14 bytes of ethernet header, 40 bytes of IPv6 header, 20 bytes of TCP header and 32 bits of ethernet CRC, a full-size packet takes 1538 bytes on the wire (including 20 bytes of preamble and inter-frame gap), of which 1440 bytes are useful data. The preamble and inter-frame gap are not relevant for error rate purposes, so the relevant frame length is 1518 bytes with 78 bytes of overhead. With 78 bytes of overhead, a 1518-byte frame is optimal for a bit error rate of about 4.2 * 10^-6 (10^-5.37). Note that the worst-case BER specified for 1000BASE-T is 10^-10, which implies an optimum packet size of 312250 bytes. In practice, it's better to err on the side of smaller packets and lower packet loss to avoid triggering TCP congestion control mechanisms. However, it's obvious that current maximum packet sizes are far below the optimum with respect to throughput.

3.4 Undetected bit errors

   Nearly all link layers employ some kind of checksum to detect bit errors so that packets with errors can be discarded. In the case of ethernet, this is a frame check sequence in the form of a 32-bit CRC. The error detecting properties of the CRC are twofold: the minimum Hamming distance, and the statistical unlikelihood of two packets resulting in the same CRC.

   Depending on the size of the packet, there is a minimum Hamming distance between two possible packets that result in the same CRC. For ethernet packets between 376 and 11454 bytes long (inclusive), the Hamming distance is 3 [CRC]. So all packets where transmission errors resulted in one or two flipped bits are detected. If 3 or more bits are flipped, most errors are still caught, because only in very few cases does the new bit pattern result in the same CRC as the old bit pattern.
   In theory, the chance of two packets having the same CRC-32 is 1 in 2^32, but this assumes the CRC is as strong as it possibly could be.

   It has been suggested that increasing packet lengths reduces the effectiveness of the CRC-32. For the statistical aspect of the CRC, this isn't true. Again assuming that, for a given bit error rate, the likelihood of bit errors in a packet grows linearly with packet size, doubling the packet size doubles the chance of a given number of bit errors in the packet. In turn, this doubles the chance of a packet with bit errors going undetected by the CRC. However, because the packet is twice as long, only half the number of packets is required to transmit any given amount of data. These aspects cancel each other out, so the probability of undetected errors occurring in any given data transfer doesn't vary with packet size when only the statistical properties of the CRC are considered.

   Obviously, choosing a packet size that leads to a reduced Hamming distance greatly increases the risk of undetected bit errors. However, even choosing a larger packet size with the same Hamming distance of 3 leads to a reduction in error detection strength. The likelihood of a packet having enough bit errors to reach the Hamming distance H and then generating the same CRC (the packet error rate) is:

      PER = (packet length in bits * BER) ^ H / 2^32

   The likelihood of such a packet occurring in a transmission of a certain number of bits (the transmission error rate) is:

      TER = transmission length / packet length * PER

   In other words:

      TER = transmission length * packet length ^ (H - 1) * BER ^ H / 2^32

   (Hence the irrelevance of the packet length for a Hamming distance of 1.)
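   The two formulas can be evaluated directly. The following Python sketch is a minimal illustration: it computes TER as the number of packets in the transfer times PER, so it does not depend on the rearranged form, and it uses the same linear bit-error model as the text. The function names are hypothetical.

```python
def packet_error_rate(packet_bits: float, ber: float, h: int) -> float:
    """PER: chance that a packet picks up h bit errors (linear model) and the
    corrupted packet still produces the same CRC-32 (1 in 2^32)."""
    return (packet_bits * ber) ** h / 2 ** 32


def transmission_error_rate(transmission_bits: float, packet_bits: float,
                            ber: float, h: int) -> float:
    """TER: number of packets in the transfer times PER (for small values,
    approximately the chance of at least one undetected error)."""
    packets = transmission_bits / packet_bits
    return packets * packet_error_rate(packet_bits, ber, h)


if __name__ == "__main__":
    TRANSFER_BITS = 3.44e12   # 400 GB as in the text (400 * 2^30 bytes * 8)
    BER = 1e-10               # 1000BASE-T worst case
    for frame_bytes, h in [(1518, 3), (11454, 3), (1518, 1)]:
        ter = transmission_error_rate(TRANSFER_BITS, frame_bytes * 8, BER, h)
        print(f"{frame_bytes:>5}-byte frames, H = {h}: TER = {ter:.2e}")
```

   The loop evaluates the 1000BASE-T example worked out next (3.44*10^12 bits, BER 10^-10). Raising the frame length by a factor of about 7.5 raises the H = 3 result by roughly 57, the square of that factor.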
   For a 400 GB (400 * 2^30 bytes, or 3.44*10^12 bits; approximately one hour) transmission over 1000BASE-T with a BER of 10^-10 and a 1518-byte ethernet frame length, this means:

      TER = 3.44*10^12 * 12144 ^ 2 * (10^-10) ^ 3 / 2^32 = 1.18*10^-19

   For 11454-byte packets, this becomes:

      TER = 3.44*10^12 * 91632 ^ 2 * (10^-10) ^ 3 / 2^32 = 6.73*10^-18

   Please note that the first result is nearly 12 orders of magnitude better than the naive assumption of a Hamming distance of 1 suggests for standard 1518-byte ethernet frames:

      TER = 3.44*10^12 * 12144 ^ 0 * (10^-10) ^ 1 / 2^32 = 8.01*10^-8

   So the strength of the CRC, assuming a Hamming distance of 3, goes down with the square of the factor by which the packet length is increased, and with the third power of any increase of the bit error rate.

   However, this discussion is largely academic because it assumes that bit errors happen in isolation. For instance, 1000BASE-T transmits two bits per symbol over four wire pairs, so bit errors are much more likely to happen (at least) in pairs than in isolation. Also, it should be possible to implement stronger frame check sequences for newer versions of ethernet. Unlike the packet length, the FCS is something switches can change when interconnecting different types of ethernet without harming interoperability.

3.5 Conclusion

   Larger packets aren't universally desirable. The factors that bear on the decision to use larger packets include:

   -  A link's bit error rate

   -  The number of bits per symbol on a link and hence the likelihood of multiple bit errors in a single packet

   -  The strength of the frame check sequence

   -  The link speed

   -  The number of buffers

   -  The queuing strategy

   This means that choosing a good maximum packet size is, initially at least, the responsibility of hardware vendors. On top of that, robust mechanisms must be available to operators to further limit maximum packet sizes where appropriate.
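   The queuing-delay figures in section 3.1 follow from simple arithmetic: the time to drain a number of buffered packets is their total size on the wire divided by the link speed. The Python sketch below recomputes them. It assumes that the 38 bytes of per-packet overhead consist of the 8-byte preamble and start-of-frame delimiter, the 14-byte header, the 4-byte FCS and the 12-byte inter-frame gap, and that every buffer holds a packet of the size shown; the function name is hypothetical.

```python
ETHERNET_OVERHEAD = 38   # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12


def drain_delay_ms(packet_size: int, link_bps: float, buffers: int = 40) -> float:
    """Milliseconds needed to transmit `buffers` queued packets of
    `packet_size` bytes (IP packet size, excluding ethernet overhead)."""
    bits = (packet_size + ETHERNET_OVERHEAD) * 8 * buffers
    return bits / link_bps * 1000


SIZES = [500, 1500, 4500, 9000, 16384, 65535]
SPEEDS = [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]

if __name__ == "__main__":
    print(f"{'Speed':<10}" + "".join(f"{s:>9}" for s in SIZES))
    for name, bps in SPEEDS:
        print(f"{name:<10}" + "".join(f"{drain_delay_ms(s, bps):9.2f}" for s in SIZES))
    # With buffers=1 the function gives the single-packet serialization delay
    # quoted in section 3.1 for 9000 and 1500 bytes at 10 Mbps.
    print(f"{drain_delay_ms(9000, 10e6, 1):.2f} {drain_delay_ms(1500, 10e6, 1):.2f}")
```

   Varying the buffers argument and the link speed shows how two of the factors listed above trade off against packet size.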
4 The protocol mechanisms

   The basic idea is that nodes are free to negotiate larger MTUs with neighbors. However, to avoid problems, test packets are sent before larger packets are used for actual traffic, and routers and switches may inform nodes of MTU limitations that are advisable or mandatory to observe.

4.1 The variable MTU router advertisement option

   Routers use this option to inform hosts on connected subnets about the maximum allowed MTU for a given link speed and the off-link MTU that should be used towards off-link destinations.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Off-link MTU                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Reserved         | Pri |          Link speed           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Allowed MTU                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: TBD

   Length: 2 (in units of 8 octets)

   Reserved: 0 on transmission, ignored on reception.

   Off-link MTU: The maximum packet size that the router can forward to the other links it connects to. Hosts SHOULD base the TCP MSS option in all TCP sessions on this value and limit packets sent to off-link destinations to this maximum. The off-link MTU must be at least 1280. A value of 0 means the off-link MTU is undefined; in that case, hosts should use their physical MTU in TCP MSS options and limit packets sent to routers to the maximum MTU the router supports, as discovered through the neighbor discovery MTU option.

   Pri: Priority.
   Values have the following meaning:

      000: Vendor default
      001: Local override of 000
      010: Site default
      011: Local override of 010
      100: Subnet default
      101: Local override of 100
      110: Per-node setting
      111: Local override of 110

   Vendors may only use priority 000 in default configurations. Site-wide administrative settings may only use 000 and 010. Subnet-specific administrative settings may use 000, 010 or 110, but not 001, 011, 101 or 111.

   Link speed: Minimum link speed the option may apply to. Values from 0 to 49151 indicate a link speed in megabits per second. Values from 49152 to 65535 are reserved for future use, but imply a link speed of more than 49151 Mbps. Hosts MUST ignore all options with a link speed value that's higher than the current link speed of the interface the option is received over. For instance, if a host has an interface that supports 10, 100 and 1000 Mbps ethernet and currently operates at 100 Mbps, and the host receives options with link speed values of 100 and 1000 over that interface, the option with the link speed of 100 is processed and the option with the link speed of 1000 is ignored.

   Allowed MTU: The maximum packet size allowed on a link. Packets larger than this value MUST NOT be sent over the link in question. The allowed MTU MUST be at least 1500. A value of 0 means that the allowed MTU is undefined and no maximum MTU is enforced.

   The number of variable MTU options in a router advertisement is limited to a maximum of 4. Hosts are expected to recover the variable MTU options from the router advertisements of at least the router they select as a default router, and may (but are not required to) recover options from multiple routers. The same option, or data constituting the same information, may be learned from other sources, such as local configuration and/or DHCPv6.

   Hosts MUST only consider variable MTU options where the value of the link speed field doesn't exceed the current link speed of the associated interface.
   Any options (or equivalents) that satisfy this condition are ordered by the priority, link speed and allowed MTU fields, in that order. Hosts SHOULD copy the allowed MTU and off-link MTU information, if specified, from the option (or equivalent) with the largest value for the concatenation of these three fields.

4.2 Changes to the RA MTU option semantics

   Hosts are currently supposed to ignore an MTU of more than 1500 in the MTU option in router advertisements on ethernet links [RFC2464]. This makes it impossible to use an MTU larger than 1500 bytes for multicast packets. In order to lift this limitation, routers and hosts that implement variable MTU subnets may advertise and accept, respectively, an MTU option with an MTU larger than 1500. Hosts should use the minimum of the maximum feasible MTU and the MTU in the RA MTU option for the transmission of multicast packets. Note that advertising an MTU option larger than 1500 can only work on subnets where all the hosts implement variable MTU subnets.

4.3 The switch MTU advertisement message

   Switches and other layer 2 devices MAY advertise the maximum MTU they support in an ICMPv6 [RFC2463] message sent to multicast address TBD.
   The format of this ICMPv6 message is as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Number of MTUs                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                       Switch identifier                       +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |         Link speed 1          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Advised MTU 1                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |         Link speed 2          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Advised MTU 2                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |         Link speed N          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Advised MTU N                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: TBD (informational)

   Code: TBD

   Checksum: See [RFC2463].

   Number of MTUs: The number of times the Reserved/Link speed/Advised MTU fields are repeated for different link speed values. The minimum is 1, the maximum 4.

   Switch identifier: A 64-bit value that is unique to the switch.

   Reserved: 0 on transmission, ignored on reception.

   Link speed: Minimum link speed the option may apply to. Values from 0 to 49151 indicate a link speed in megabits per second. Values from 49152 to 65535 are reserved for future use, but imply a link speed of more than 49151 Mbps. Hosts MUST ignore all entries with a link speed value that's lower than the current link speed of the interface the message is received over. Note that this is the opposite of the behavior specified for the link speed in the RA variable MTU option.
   Advised MTU: The IPv6 MTU the switch supports on ports operating at the indicated link speed. In the case of ethernet, the IPv6 MTU is the maximum frame size after subtracting the size of the VLAN tag, the 14-byte Ethernet II header and the frame check sequence.

   Switch MTU advertisements should be sent out at 5-minute intervals. When a port transitions from an inactive or disconnected state to an active state, the interval MAY be reduced to 60 seconds: if the last switch MTU advertisement was sent out 60 seconds or longer ago, a switch MTU advertisement is sent out immediately.

   If the switch doesn't otherwise implement IPv6, or the IPv6 protocol is inactive, the IPv6 source address should be the unspecified address. Since all the information in the message is then known in advance, the entire message, including the checksum, may be pre-calculated without the need to implement IPv6 in the switch.

   Hosts SHOULD monitor switch MTU advertisement messages, using the switch identifier field to detect refreshes and duplicates, and retain all switch MTU advertisements for 10 minutes. When the switch MTU advertisement information changes (new advertisements, new information in previously known advertisements, or expired advertisements), hosts SHOULD select the minimum advised MTU value where the associated link speed is equal to or higher than the current link speed of the associated interface. If all switches implement the switch MTU advertisement mechanism, the advised MTU recovered this way for the link is the minimum of the MTUs supported by all the switches for this particular link speed.

4.4 The neighbor discovery MTU option

   A node that implements the variable MTU subnet capability SHOULD include an MTU option in both neighbor solicitation and neighbor advertisement messages [RFC2461].
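   The MTU a node places in this option is its local MTU (section 4.5), which depends on the variable MTU router advertisement options of section 4.1 and on switch MTU advertisements (section 4.3). The Python sketch below shows that host-side selection logic. It is illustrative only: it assumes the options and advertisements have already been parsed into the hypothetical VariableMtuOption and SwitchMtuEntry records, and it ignores expiry, duplicate detection and DHCPv6 input.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VariableMtuOption:
    """Parsed RA variable MTU option (section 4.1). 0 means 'undefined'."""
    priority: int       # Pri field, 0..7
    link_speed: int     # Mbps; option applies at this speed and above
    allowed_mtu: int    # 0: no administrative maximum
    off_link_mtu: int   # 0: undefined


@dataclass(frozen=True)
class SwitchMtuEntry:
    """One Link speed / Advised MTU pair from a switch MTU advertisement."""
    link_speed: int     # Mbps
    advised_mtu: int


def select_ra_option(options: Iterable[VariableMtuOption],
                     current_speed: int) -> Optional[VariableMtuOption]:
    # Only options whose link speed does not exceed the current speed count.
    usable = [o for o in options if o.link_speed <= current_speed]
    if not usable:
        return None
    # Comparing (priority, link speed, allowed MTU) tuples is equivalent to
    # comparing the concatenation of the three fields; the largest wins.
    return max(usable, key=lambda o: (o.priority, o.link_speed, o.allowed_mtu))


def select_advised_mtu(entries: Iterable[SwitchMtuEntry],
                       current_speed: int) -> Optional[int]:
    # Opposite test from the RA option: entries below the current speed
    # are ignored, and the smallest remaining advised MTU wins.
    mtus = [e.advised_mtu for e in entries if e.link_speed >= current_speed]
    return min(mtus) if mtus else None


def local_mtu(physical_mtu: int, allowed_mtu: int = 0,
              advised_mtu: Optional[int] = None) -> int:
    mtu = physical_mtu
    if allowed_mtu:                 # 0 means no maximum is enforced
        mtu = min(mtu, allowed_mtu)
    if advised_mtu is not None:     # switch advertisements may lower it further
        mtu = min(mtu, advised_mtu)
    return mtu


if __name__ == "__main__":
    options = [
        VariableMtuOption(priority=0, link_speed=100, allowed_mtu=4470, off_link_mtu=1500),
        VariableMtuOption(priority=2, link_speed=1000, allowed_mtu=9000, off_link_mtu=1500),
        VariableMtuOption(priority=2, link_speed=10000, allowed_mtu=16000, off_link_mtu=1500),
    ]
    switches = [SwitchMtuEntry(100, 4000), SwitchMtuEntry(1000, 9216),
                SwitchMtuEntry(10000, 9000), SwitchMtuEntry(1000, 8000)]
    chosen = select_ra_option(options, current_speed=1000)
    allowed = chosen.allowed_mtu if chosen else 0
    advised = select_advised_mtu(switches, current_speed=1000)
    print(local_mtu(9710, allowed, advised))
```

   With a physical MTU of 9710 bytes at 1000 Mbps, the example picks the priority-2 option (allowed MTU 9000) and the smallest qualifying switch entry (8000), for a local MTU of 8000.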
   A node MAY omit the option if the use of a larger MTU isn't desired at that time or if the MTU it would advertise is equal to or lower than the MTU that would otherwise be used. However, a node is not required to omit the option based on the values of the different MTU variables, as the receiver must implement the logic required to determine which MTU to use anyway.

   The format of the neighbor discovery MTU option is as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              MTU                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: TBD

   Length: 1 (in units of 8 octets)

   Reserved: Set to 0 on transmission, ignored on reception.

   MTU: The maximum packet size the node is prepared to send and receive, copied from the local MTU. The minimum valid value is 1280.

   Reception of a neighbor solicitation or a neighbor advertisement containing this option triggers the sending of an ICMPv6 MTU detection message.

The MTU detection message

   Since there may be layer 2 devices in the path between two nodes that don't implement the switch MTU advertisement message, it's necessary to make sure that it is indeed possible to send and receive packets larger than the standard MTU. This is what the ICMPv6 MTU detection message is for. It has the following format:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|                         Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Packet size                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Padding ...
   +-+-+-+-+-+-+-+-+-+-+

   Type: TBD (informational)

   Code: TBD

   Checksum: See [RFC2463].

   R (reply requested): 0: no reply requested; 1: reply requested.

   Reserved: 0 on transmission, ignored on reception.

   Packet size: Size of this packet, including the IPv6 header and any other headers. A value of 0 indicates that no padding is present and that the size of the packet shouldn't be checked.

   Padding: Zero or more zero bytes that bring the packet to the specified packet size.

   In order to avoid sending large numbers of packets that can't be handled properly by switches or other layer 2 devices, after a large MTU detection packet has been sent, no other maximum-size MTU detection packets may be transmitted on the same interface for 60 seconds or until a large MTU detection packet has been received, whichever happens first. In this context, "large" means larger than the standard MTU size for the link type, i.e., 1500 bytes for ethernet.

   When variable MTU subnet capability is detected for a neighbor by the presence of an MTU option in a neighbor solicitation or neighbor advertisement message, an MTU detection message is constructed as follows:

      R: Set to 0 if the neighbor MTU is known and confirmed, set to 1 otherwise.

      Packet size: Equal to the minimum of the local MTU and the (tentative) neighbor MTU.

   When an MTU detection packet is received, the size of the packet is checked against the value in the packet size field to detect truncation in transit. If the packet size and the packet size field don't match, or if the packet size is smaller than 1280 bytes, the message is silently discarded.

   If the received message has the R flag set to 1, a reply is constructed as follows:

      R: 0

      Packet size: Equal to the minimum of the local MTU and the neighbor MTU.

   The neighbor MTU overrides information in the TCP MSS option in TCP sessions towards that neighbor.
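   The construction and acceptance rules for the MTU detection message can be sketched in Python as follows. This is illustrative only: the Type value 200 and Code 0 are hypothetical placeholders for the TBD values, the checksum is left at zero (it covers the IPv6 pseudo-header, which is not modeled), and the IPv6 header is assumed to be exactly 40 bytes with no extension headers. The text does not say how a packet size field of 0 interacts with the 1280-byte minimum; the sketch assumes a value of 0 skips both checks. The 60-second limit on large probes is not modeled.

```python
import struct
from typing import Optional, Tuple

IPV6_HEADER_LEN = 40      # assumes no extension headers
FIXED_LEN = 12            # Type, Code, Checksum, R/Reserved, Packet size
MIN_MTU = 1280
R_FLAG = 0x80000000       # R is the most significant bit of the second word

# Hypothetical values: the draft leaves Type and Code as TBD.
MTU_DETECTION_TYPE = 200
MTU_DETECTION_CODE = 0


def build_probe(packet_size: int, reply_requested: bool) -> bytes:
    """Return the ICMPv6 part of an MTU detection message.

    packet_size counts the whole IPv6 packet; 0 means 'no padding'.
    The checksum stays 0 because the IPv6 pseudo-header is not modeled.
    """
    if packet_size and packet_size < MIN_MTU:
        raise ValueError("packet size must be 0 or at least 1280")
    flags = R_FLAG if reply_requested else 0
    header = struct.pack("!BBHII", MTU_DETECTION_TYPE, MTU_DETECTION_CODE,
                         0, flags, packet_size)
    padding = packet_size - IPV6_HEADER_LEN - FIXED_LEN if packet_size else 0
    return header + bytes(padding)


def parse_probe(icmp: bytes) -> Optional[Tuple[bool, int]]:
    """Return (reply_requested, packet_size), or None to silently discard."""
    if len(icmp) < FIXED_LEN:
        return None
    _type, _code, _cksum, flags, packet_size = struct.unpack_from("!BBHII", icmp)
    if packet_size:   # interpretation: a 0 field skips the size checks
        if packet_size != IPV6_HEADER_LEN + len(icmp) or packet_size < MIN_MTU:
            return None   # truncated in transit, or too small
    return bool(flags & R_FLAG), packet_size


def probe_for(local_mtu: int, neighbor_mtu: int, confirmed: bool) -> bytes:
    """Initial message: ask for a reply until the neighbor MTU is confirmed."""
    return build_probe(min(local_mtu, neighbor_mtu), reply_requested=not confirmed)


def reply_for(local_mtu: int, neighbor_mtu: int) -> bytes:
    """Reply to a message with R=1: R=0, sized to the smaller MTU."""
    return build_probe(min(local_mtu, neighbor_mtu), reply_requested=False)


if __name__ == "__main__":
    probe = probe_for(local_mtu=9000, neighbor_mtu=9000, confirmed=False)
    print(parse_probe(probe))          # (True, 9000)
    print(parse_probe(probe[:1452]))   # None: a truncated copy is discarded
```

   Because the packet size field is compared with the length actually received, a layer 2 device that silently truncates or drops large frames causes the probe to be discarded or never answered, so the larger neighbor MTU is never confirmed.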
   Neighbor MTU information expires along with link-layer addresses learned through neighbor discovery, and upon dead neighbor detection.

4.5 Determining the local MTU

   The local MTU is the value communicated to neighbors. It is the minimum of the physical MTU for an interface and the allowed MTU as advertised by a router or learned through other means. The local MTU may be further reduced by the reception of switch MTU advertisements.

5 References

5.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2461]  Narten, T., Nordmark, E., and W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461, December 1998.

   [RFC2462]  Thomson, S. and T. Narten, "IPv6 Stateless Address Autoconfiguration", RFC 2462, December 1998.

   [RFC2463]  Conta, A. and S. Deering, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 2463, December 1998.

   [RFC2464]  Crawford, M., "Transmission of IPv6 Packets over Ethernet Networks", RFC 2464, December 1998.

5.2 Informative References

   [CRC]  Jain, R., "Error Characteristics of Fiber Distributed Data Interface (FDDI)", IEEE Transactions on Communications, August 1990.

6 Document and Author Information

   This document expires December 29, 2007. The latest version will always be available at http://www.muada.com/drafts/.

   Please direct questions and comments to the ipv6 or int-area mailing lists or directly to the author:

   Iljitsch van Beijnum
   Email: iljitsch@muada.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).