Internet Engineering Task Force                                P. Savola
Internet-Draft                                                 CSC/FUNET
Expires: December 7, 2004                                   June 8, 2004

MTU and Fragmentation Issues with In-the-Network Tunneling
draft-savola-mtufrag-network-tunneling-00.txt

Status of this Memo

By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 7, 2004.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

Tunneling techniques such as IP-in-IP, when deployed in the middle of the network, typically between routers, have certain issues regarding how large packets can be handled: whether such packets would be fragmented and reassembled (and how), whether Path MTU Discovery would be used, or how this scenario could be operationally avoided. This memo explains why this is a common, non-trivial problem, and goes on to describe the different solutions and their characteristics at some length.

Savola                  Expires December 7, 2004                [Page 1]

Internet-Draft   Packet Size Issues in Network Tunneling       June 2004

Table of Contents

1.  Introduction
2.  Problem Statement
3.  Description of Solutions
    3.1  Fragmentation and Reassembly by the Tunnel Endpoints
    3.2  Signalling the Lower MTU to the Sources
    3.3  Encapsulate Only When there Is Free MTU
    3.4  Fragmentation of the Inner Packet
4.  Conclusions
5.  IANA Considerations
6.  Security Considerations
7.  Acknowledgements
8.  References
    8.1  Normative References
    8.2  Informative References
Author's Address
Intellectual Property and Copyright Statements

1. Introduction

A large number of ways to encapsulate datagrams in other packets, i.e., tunneling mechanisms, have been specified over the years: for example, IP-in-IP (e.g., [1]), GRE [2], L2TP [3], or IPsec [4] in tunnel mode -- any of which might run on top of IPv4, IPv6, or some other protocol and carry the same or a different protocol.

All of these can be run so that the endpoints of the inner protocol are co-located with the endpoints of the outer protocol; in a typical scenario, this would correspond to "host-to-host" tunneling. It is also possible to have one set of endpoints co-located, i.e., host-to-router or router-to-host tunneling. Finally, many of these mechanisms are also employed between routers for all or a part of the traffic that passes between them, resulting in router-to-router tunneling.

All these protocols and scenarios have one issue in common: how are packet sizes selected so that the packets fit in the network even when encapsulated and, if the packet sizes cannot be influenced, how can the packets be encapsulated anyway? The four main solutions, elaborated in Section 3, are:

   1. Fragment all encapsulated packets that are too big to fit in the paths, and reassemble them at the tunnel endpoints.

   2. Signal to all the sources whose traffic must be encapsulated, and is larger than what fits, to send smaller packets, e.g., using Path MTU Discovery [5][6].

   3. Ensure that, in the specific environment, the encapsulated packets will fit in all the paths in the network, e.g., by using an MTU bigger than 1500 bytes in the backbone used for encapsulation.

   4. Fragment the original, too-big packets so that their fragments will fit, even encapsulated, in the paths, and reassemble them at the destination nodes. Note that this is only available for IPv4 packets under very specific conditions.

The tunneling packet size issues are relatively straightforward in host-to-host or host-to-router tunneling, where Path MTU Discovery only needs to signal to one source node. The issues are significantly more difficult in router-to-router and certain router-to-host scenarios, which are the focus of this memo.

There are also known challenges in specifying and implementing a mechanism which would be used at the tunnel endpoint to obtain the most suitable packet size to use for encapsulation: if a static value is chosen, a lot of fragmentation might end up being performed; if PMTUD is used, the implementation would need to use or relay the received Packet Too Big messages, and assume that sufficient data has been piggybacked on the ICMP messages (beyond the required 64 bits for ICMPv4) to make this possible.
However, this problem is described elsewhere (e.g., in [2] and [1]) and is out of the scope of this memo.

Section 2 includes a problem statement, Section 3 describes the different solutions with their drawbacks and advantages, and Section 4 presents conclusions.

2. Problem Statement

It is worth considering why exactly this is a problem.

It is possible to fix all the packet size issues using solution 1: fragmenting the resulting encapsulated packet and having the tunnel endpoint reassemble it. However, this is considered problematic for at least three reasons, as described in Section 3.1. Therefore, it is desirable to avoid fragmentation and reassembly if possible.

On the other hand, the other solutions may not be practical either: especially in router-to-router or router-to-host tunneling, Path MTU Discovery might be very disadvantageous -- consider the case where a backbone router would send ICMP Packet Too Big messages to every source that tries to send packets through it. Fragmenting before encapsulation is not available in IPv6, and in IPv4 it is not available when the DF bit has been set (or the datagram has already been fragmented). Ensuring a high enough MTU so that encapsulation is always possible is of course a valid approach, but it requires careful operational planning and may not be a feasible assumption for implementors.

Hence there is no trivial solution to this problem, and the tradeoffs need to be explored further, as is done in this memo.

3. Description of Solutions

This section describes the potential solutions in a bit more detail.

3.1 Fragmentation and Reassembly by the Tunnel Endpoints

The seemingly simplest solution to tunneling packet size issues is fragmentation of the outer packet by the encapsulator, and reassembly by the decapsulator.
However, this is highly problematic for at least three reasons:

   o  Fragmentation causes overhead: every fragment requires its own IP header (20 bytes for IPv4 without options, or 40 bytes for IPv6), and with IPv6, an additional 8 bytes for the Fragment Header.

   o  Fragmentation and reassembly require computation: splitting datagrams into fragments is a non-trivial procedure, and so is their reassembly. For example, software router forwarding implementations may not be able to perform these operations at line rate.

   o  Reassembly requires buffers: fragments might get lost, reordered, or delayed; when that happens, the reassembly engine has to hold the partial packet for some time. When this has to be done at line rate, e.g., at 10 Gbit/s, the buffer space that reassembly might require, especially in the worst case, might be considerable.

When examining router-to-router tunneling, the third problem is likely the worst; certainly, the requirement to implement the computation in hardware would also be significant, but not all that difficult in the end -- and the link capacity wasted in the backbones by the additional overhead might not be a huge problem either.

So, if reassembly could be made to work sufficiently reliably, this would be one acceptable fallback solution.

3.2 Signalling the Lower MTU to the Sources

Another approach is to use techniques like Path MTU Discovery (or potentially a better-working future derivative [7]) to signal to the sources whose packets will be encapsulated in the network that they should send smaller packets, so that the packets can be encapsulated.

This approach presupposes that PMTUD works. While it currently works for IPv6, and is critical for its operation, there is ample evidence that in IPv4, PMTUD is far from reliable because, e.g., firewalls and other boxes are configured to inappropriately drop all ICMP packets.
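
As a rough illustration of what signalling a lower MTU involves, the sketch below computes the MTU value an encapsulator would need to report to a source so that the source's packets still fit after encapsulation. The function name, the header-size constants, and the 1500-byte tunnel path MTU are assumptions chosen for illustration and are not taken from any specification cited here; real encapsulations such as GRE, L2TP, or IPsec add further bytes on top of the outer IP header.

```python
from typing import Optional

# Illustrative outer-header sizes in bytes (no IPv4 options, no IPv6
# extension headers). These are assumptions for the example only.
OUTER_IPV4_HEADER = 20
OUTER_IPV6_HEADER = 40


def mtu_to_signal(inner_len: int, tunnel_path_mtu: int, overhead: int) -> Optional[int]:
    """Return the MTU to report to the source, or None if the packet fits.

    inner_len       -- size of the packet to be encapsulated
    tunnel_path_mtu -- path MTU between the tunnel endpoints
    overhead        -- bytes added by the encapsulation
    """
    if inner_len + overhead <= tunnel_path_mtu:
        return None  # fits once encapsulated; nothing to signal
    # The source must shrink its packets by the encapsulation overhead.
    return tunnel_path_mtu - overhead


print(mtu_to_signal(1500, 1500, OUTER_IPV4_HEADER))  # 1480
print(mtu_to_signal(1400, 1500, OUTER_IPV6_HEADER))  # None
```

The sketch only covers the arithmetic; whether the resulting message ever reaches the source, and whether the source acts on it, is exactly the reliability question raised above.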

Further, there are two scenarios where signalling from the network would be highly undesirable: when the encapsulation is done in such a prominent place in the network that millions of sources (or vastly more) would need to be signalled with this information (possibly multiple times, depending on how long they keep their PMTUD state), or when the encapsulation is done for passive monitoring purposes (network management, lawful interception, etc.), where it is critical that the sources whose traffic is being encapsulated are not aware of this happening.

A new approach to PMTUD is in the works [7], but it is uncertain whether it would fix these problems; at the least, it would not address the passive monitoring requirements.

3.3 Encapsulate Only When there Is Free MTU

The third approach is an operational one, and depends on the environment where encapsulation and decapsulation are performed. That is, if an ISP deploys tunneling in its backbone, which consists only of links supporting high MTUs (e.g., Gigabit Ethernet or SDH/SONET), while all its customers and peers have a lower MTU (e.g., 1500 bytes, or the backbone MTU minus the encapsulation overhead), then, provided the lower MTU plus the encapsulation overhead does not exceed the backbone MTU, all the encapsulated packets would always fit in the backbone links.

This approach depends heavily on assumptions about the deployment scenario.
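
To make the underlying assumption concrete, the sketch below checks whether a maximum-size packet from a customer or peer still fits in the backbone once encapsulated. The function names and the numeric values (a 1500-byte edge MTU, a 20-byte outer IPv4 header, a 9000-byte backbone MTU) are assumptions for illustration only.

```python
def encapsulated_packets_fit(edge_mtu: int, overhead: int, backbone_mtu: int) -> bool:
    """True if a maximum-size edge packet fits in the backbone after encapsulation."""
    return edge_mtu + overhead <= backbone_mtu


def largest_safe_edge_mtu(backbone_mtu: int, overhead: int) -> int:
    """Largest customer/peer MTU for which no encapsulated packet exceeds the backbone MTU."""
    return backbone_mtu - overhead


# Assumed example values, not recommendations.
print(encapsulated_packets_fit(1500, 20, 9000))  # True
print(encapsulated_packets_fit(1500, 20, 1500))  # False
print(largest_safe_edge_mtu(1500, 20))           # 1480
```

The check has to hold for every link that an encapsulated packet may traverse, which is why a single low-MTU link or a tunnel leaving the operator's own network is enough to break it.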
It may be desirable, for example, to build a tunnel to or from another ISP, where these assumptions might no longer hold; or there might be links in the network which cannot support the higher MTUs needed to satisfy the tunneling requirements; or customers themselves might try to tunnel fragmented packets to the ISP, requiring reassembly capability from the ISP's equipment (in this last case, it might be possible to get the MTU at the customer's end lowered, eliminating the fragmentation, but that might not always be an option).

Another, related approach might be to have the sources use only an MTU low enough to fit in all the physical MTUs; for example, IPv6 specifies a minimum MTU of 1280 bytes. If all the sources whose traffic would be encapsulated used this as the maximum packet size, there would probably always be enough free MTU for encapsulation in the network. However, this is not the case today, and it would be completely unrealistic to assume that this kind of approach could be made to work in general.

All in all, while in certain operational environments it might be possible to avoid any problems by deployment choices, or by limiting the MTU that sources use, this is probably not a sufficiently good general solution for equipment vendors, and other solutions must also be provided.

3.4 Fragmentation of the Inner Packet

A final possibility is fragmenting the inner packet, before encapsulation, in such a manner that the encapsulated packet fits in the path MTU. However, one should note that only IPv4 supports this "in-flight" fragmentation; further, it is not possible for packets which have already been fragmented, and it is not allowed for packets where the Don't Fragment (DF) bit has been set. Even if one could ignore IPv6 completely, so many IPv4 host stacks send packets with the DF bit set that this would seem infeasible.
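
The conditions listed above can be summarized as a simple predicate. This is a minimal sketch that follows the restrictions as stated in this section; the `InnerPacket` class, its field names, and the function name are hypothetical and exist only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerPacket:
    """Hypothetical summary of the inner packet's relevant header fields."""
    ip_version: int      # 4 or 6
    dont_fragment: bool  # IPv4 DF bit; not meaningful for IPv6
    is_fragment: bool    # True if the packet is itself already a fragment


def may_fragment_inner(pkt: InnerPacket) -> bool:
    """Apply the three restrictions described in the text above."""
    return (
        pkt.ip_version == 4        # only IPv4 allows in-flight fragmentation
        and not pkt.dont_fragment  # DF set: fragmentation is not allowed
        and not pkt.is_fragment    # already fragmented: ruled out in the text
    )


print(may_fragment_inner(InnerPacket(4, False, False)))  # True
print(may_fragment_inner(InnerPacket(4, True, False)))   # False
print(may_fragment_inner(InnerPacket(6, False, False)))  # False
```

Any packet for which the predicate is false has to be handled by one of the other approaches, which is why this option cannot serve as a general solution.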

It is interesting to note that at least one implementation provides a special knob to fragment the inner packet prior to encapsulation even if the DF bit has been set -- this is non-compliant behaviour, but has possibly been required in certain tightly controlled passive monitoring scenarios. Such a setup would not work for packets which have already been fragmented if they needed to be fragmented again, though.

In summary, this approach does not seem to be feasible in general either.

4. Conclusions

Fragmentation and reassembly by the tunnel endpoints is a clear solution to the problem, but hardware reassembly, when packets get lost, may face significant implementation challenges. Whether these challenges are practically insurmountable should be evaluated. However, this reassembly approach is probably not a problem for passive monitoring applications.

PMTUD techniques, at least at the moment and especially for IPv4, appear to be too unreliable or unscalable to be used in the backbones. It is an open question whether a future solution might work better in this respect.

It is clear that in some environments the operational approach to the problem, i.e., ensuring that fragmentation is never necessary by keeping higher MTUs in the networks that encapsulated packets traverse, is sufficient. But this is unlikely to be enough in general, especially for vendors, which may not be able to make assumptions about the operators' deployments.

Fragmentation of the inner packet does not work appropriately and should not be used; fragmentation of the outer packet seems a better option for passive monitoring.

It is worth noting explicitly that when tunneling is done in a high-speed backbone, one can typically make assumptions about the environment; however, when reassembly is performed outside such a network, it might be done in software or with lower performance requirements, and there one could use either a reassembly implementation, PMTUD, or a separate approach for passive monitoring -- so this might not be a real problem.

In consequence, the critical questions at this point appear to be 1) whether a higher MTU can be assumed in the high-speed networks that deploy tunneling, and 2) whether "slower-speed" networks could cope with software-based reassembly, less capable hardware-based reassembly, or the other workarounds.

XXX: More TBD?

5. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

6. Security Considerations

This document describes different issues with packet sizes and in-the-network tunneling; these do not raise security considerations on their own. However, different solutions might have characteristics which make them more susceptible to attacks -- for example, router-based fragment reassembly could easily lead to (reassembly) memory exhaustion if an attacker sends a sufficient number of partial fragments; such attacks have already been used against, e.g., firewalls and host stacks, and need to be taken into consideration in implementations.

7. Acknowledgements

While the topic is far from new, recent discussions with W. Mark Townsley on L2TP fragmentation issues caused the author to sit down and write up the issues more generally.

8. References

8.1 Normative References

[1] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2-02 (work in progress), February 2004.

[2] Farinacci, D., Li, T., Hanks, S., Meyer, D. and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

[3] Lau, J., Townsley, M. and I. Goyret, "Layer Two Tunneling Protocol (Version 3)", draft-ietf-l2tpext-l2tp-base-13 (work in progress), April 2004.

[4] Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[5] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[6] McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996.

8.2 Informative References

[7] Mathis, M., "Path MTU Discovery", draft-ietf-pmtud-method-01 (work in progress), February 2004.

Author's Address

Pekka Savola
CSC/FUNET
Espoo
Finland

EMail: psavola@funet.fi

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.