Internet Engineering Task Force                                A. Biswas
Internet-Draft                                              NetApp, Inc.
Intended status: Standards Track                            May 25, 2010
Expires: November 26, 2010

    Support for Stronger Error Detection Codes in TCP for Jumbo Frames
            draft-ietf-tcpm-anumita-tcp-stronger-checksum-00

Abstract

There is a class of data serving protocols and applications that cannot tolerate undetected data corruption on the wire. Data corruption can occur at the source in software, in the network interface card, on the link, in intermediate routers, or at the destination network interface card or node. The Ethernet CRC and the 16-bit checksum in the TCP/UDP headers are used to detect data errors. Most applications rely on these checksums to detect data corruption and do not perform any checksum or CRC checks of their own. Research has shown that the TCP/UDP checksums catch a significant number of errors; however, it also suggests that one packet in 10 billion will have an error that goes undetected for Ethernet MTU frames (MTU of 1500). Under certain situations, "bad" hosts can introduce undetected errors at a much higher frequency and order. With the use of Jumbo Frames on the rise, and therefore more data bits on the wire that could be corrupted, the current 16-bit TCP/UDP checksum and the 32-bit Ethernet CRC are not sufficient for detecting errors. This document proposes the use of stronger checksum algorithms for TCP Jumbo Frames on IPv4 and IPv6 networks. The Castagnoli CRC-32C algorithm used in iSCSI and SCTP is proposed as the error detection code of choice.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 26, 2010.

Biswas Expires November 26, 2010 [Page 1]
Internet-Draft Stronger TCP Error Detection May 2010

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. Conventions
2. Calculating the CRC-32C value
3. Negotiating the use of CRC-32C
4. IPv6 Considerations
5. Conclusions and Acknowledgements
6. IANA Considerations
7. Security Considerations
8. References
  8.1. Normative References
  8.2. Informative References
Author's Address

1. Introduction

There is a class of data serving applications that host business and financial data. Detecting and recovering from data corruption is paramount to the success of this class of applications. Data corruption can occur while data is in transit from the source to its destination. Data can be corrupted at the source due to software errors, within the network interface card, on the wire or link, in intermediate routers, and at the destination network interface or node. Link errors are detected using the Ethernet 32-bit CRC. Node or router errors are detected using the 16-bit checksum in the transport headers of TCP and UDP. Most applications do not have built-in error detection capability and typically rely on the checksums in the underlying networking layers.

Stone et al. [Stone] have recommended that applications employ their own checksums to detect errors that go undetected by lower levels. They made this recommendation for the standard Ethernet MTU, considering situations where a "bad" host can introduce undetected errors at a much higher frequency and order.

It must also be said that the physical layer already uses encodings with bit error rates (BER) of 10^-12 to 10^-14, and therefore the current checksum algorithms may be sufficient. However, stronger checksumming accounts for the cases where noisy hardware or bad cables introduce errors at a much higher frequency and order. It should also be noted that the increasing speed of the physical medium (to 40G and 100G) can lead to a higher BER.

Another trend, very much on the rise, is the use and deployment of Jumbo Frames. Jumbo Frames reduce per-packet overheads significantly and are a cheap way of improving the performance of bulk data applications. Combining the use of Jumbo Frames with a noisy physical medium increases the risk of undetected bit errors, as there simply are more bits that can get corrupted.
This is rather concerning, as business and financial data are typically transported over the network using file-access protocols such as NFS, CIFS, and HTTP over TCP.

The strength of the Ethernet CRC and the 16-bit transport checksum has been found to decrease for data segments that are larger than the standard Ethernet MTU. Koopman et al. [Koopman] have explored a number of CRC polynomials, including the polynomial used in the Ethernet CRC calculation. They measured the effectiveness of these CRC polynomials for different data word lengths, where a data word is a bit stream from 64 bits to 128 Kbits. These data word lengths cover lengths equivalent to Ethernet MTUs and Jumbo Frames, as well as frame lengths larger than Jumbo Frames. They found that the Castagnoli polynomial

x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^0

represented as the 32-bit code 0x8F6E37A0 (Koopman's notation, in which the x^0 term is implicit), outperforms other CRC polynomials for Jumbo Frames and larger segments. This polynomial has been adopted by the iSCSI and SCTP standards. Note that SCTP writes this polynomial as 0x11EDC6F41, with all 33 coefficients explicit, and follows the bit-ordering convention adopted at the transport level: when mapping SCTP messages to polynomials, bytes are taken most significant first, but within each byte, bits are taken least significant first.

Given the ubiquity of TCP, it is the layer where stronger error detection capability can be introduced without duplicating the effort in higher layers. TCP options provide an easy path to introduce a stronger checksum without hindering interoperability. TCP options allow a TCP stack that supports a given option to interoperate seamlessly with a TCP stack that does not (RFC 1122 [RFC1122] requires this interoperability in Section 4.2.2.5).
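The hexadecimal forms quoted above for the Castagnoli polynomial can be reconciled with a short calculation. The following Python sketch is an illustration only and is not part of the proposal; all names in it are hypothetical. It builds the polynomial from the exponents listed above and derives the 33-bit form with every term explicit, the form with the x^32 term implicit, the form with the x^0 term implicit, and the bit-reversed form that software implementations typically use when bits are processed least significant first.

```python
# Exponents of the Castagnoli polynomial, copied from the text above.
EXPONENTS = [32, 28, 27, 26, 25, 23, 22, 20, 19, 18,
             14, 13, 11, 10, 9, 8, 6, 0]


def poly_from_exponents(exponents):
    """Set one bit per polynomial term: x^e becomes bit e."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value


def reflect32(value):
    """Reverse the order of the low 32 bits of value."""
    out = 0
    for i in range(32):
        if value & (1 << i):
            out |= 1 << (31 - i)
    return out


FULL = poly_from_exponents(EXPONENTS)  # all 33 coefficients explicit
NORMAL = FULL & 0xFFFFFFFF             # x^32 term implicit
KOOPMAN = FULL >> 1                    # x^0 term implicit
REFLECTED = reflect32(NORMAL)          # NORMAL with its 32 bits reversed

print(hex(FULL), hex(NORMAL), hex(KOOPMAN), hex(REFLECTED))
```

All four values describe the same polynomial; they differ only in which term is left implicit and in bit order.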
This document proposes the use of the Castagnoli polynomial, also known as CRC-32C, as the "checksum" of choice for TCP. Summation-based checksum algorithms such as Fletcher's and Adler's were evaluated in RFC 3385 [RFC3385] and found to behave substantially worse than CRCs; hence they are not considered in this proposal.

By standardizing a stronger checksum at the TCP level, we can quickly drive the offloading of this checksum to NIC hardware, just as the 16-bit TCP checksum is offloaded by most NIC vendors today. Offloading the computation to hardware eliminates the in-software computation overhead of stronger checksum algorithms. Another positive effect of implementing strong TCP checksumming is that it will drive the rapid adoption of 9K Jumbo Frames and make it considerably easier to consider even larger Jumbo Frames.

1.1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Calculating the CRC-32C value

The 16-bit TCP checksum covers the TCP header and payload. It also includes the pseudo-header values of Source Address, Destination Address, Protocol, and TCP Length. Adding the bytes of a pseudo-header to a summation-based checksum is simpler than including them in a CRC computation, because a CRC computation assumes a contiguous bit stream when translating the bit stream to a polynomial for the polynomial division. The pseudo-header was added to the TCP checksum computation in order to detect errors introduced in one of the IP header fields that could cause the packet to be sent to an incorrect destination. These fields are also included in the IP header checksum.
The intent was to include them in two separate checksums for better data integrity.

One can question the need for including the pseudo-header fields twice. The pseudo-header fields are in fact covered three times today, since the Ethernet CRC is computed over the entire Ethernet frame and Ethernet is ubiquitous. So, for the purposes of this draft, all the fields used in the current TCP checksum except the pseudo-header must be included in the CRC-32C calculation. If this draft's proposal is accepted for standardization, the IETF may elect to add the pseudo-header back into the CRC-32C calculation or to add only a subset of its fields. Note that this proposal leaves room to consider such changes without disrupting current installations.

It may also be questioned whether the 16-bit TCP checksum needs to be computed at all if the new TCP checksum option is present. To avoid a chicken-and-egg problem, this document proposes that the 16-bit checksum field be zeroed out and included in the CRC-32C checksum as part of the TCP header bit stream. The standardization process may choose a different approach and decide to compute both the 16-bit TCP checksum and the CRC-32C checksum, in which case the order of checksumming and the fields covered by each checksum will need to be defined.

This document also recommends the use of CRC-32C when the negotiated Maximum Segment Size (MSS) value is equal to or greater than 8948 bytes (excluding frame and TCP/IP header bytes), the most common Jumbo Frame size, but does not explicitly recommend the use of CRC-32C for standard Ethernet MTU frames. CRC-32C MAY also be used for regular Ethernet MTU frames if the application desires stricter data integrity checking, since CRC-32C can detect more independent bit errors than the Ethernet CRC for Ethernet MTU sized packets.
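As an illustration of the field selection described above, the following Python sketch zeroes the 16-bit checksum field and computes a CRC-32C over the TCP header and payload, leaving out the pseudo-header. It is a sketch under stated assumptions, not a definitive implementation: it uses the common CRC-32C parameterization (initial value and final XOR of 0xFFFFFFFF, bits processed least significant first), which this document does not spell out; it assumes the header passed in does not yet contain the CRC value itself, since the treatment of the option's own value field is not defined here; and the function names are hypothetical.

```python
import struct

# Bit-reversed form of the Castagnoli polynomial 0x1EDC6F41, used because
# this sketch processes each byte least significant bit first.
CRC32C_POLY_REFLECTED = 0x82F63B78

# The checksum field occupies bytes 16 and 17 of the TCP header.
TCP_CHECKSUM_OFFSET = 16


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C. Assumed parameters: init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def segment_crc32c(tcp_header: bytes, payload: bytes) -> int:
    """CRC-32C over header and payload with the 16-bit checksum zeroed.

    The pseudo-header is deliberately not included.
    """
    header = bytearray(tcp_header)
    header[TCP_CHECKSUM_OFFSET:TCP_CHECKSUM_OFFSET + 2] = b"\x00\x00"
    return crc32c(bytes(header) + payload)


# Usage: a 20-byte header with no options; the field values are arbitrary.
example_header = struct.pack(
    "!HHIIBBHHH",
    1234,    # source port
    80,      # destination port
    1,       # sequence number
    0,       # acknowledgment number
    5 << 4,  # data offset: 5 words
    0x18,    # flags: PSH, ACK
    65535,   # window
    0xABCD,  # 16-bit checksum field, ignored by segment_crc32c
    0,       # urgent pointer
)
print(hex(segment_crc32c(example_header, b"example payload")))
```

A table-driven or hardware-assisted routine would replace the bitwise loop in practice; the loop is used here only because it is short enough to check by hand.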
The use of CRC-32C can be made settable by the application through a socket option. The provision for an application to enable or disable the new checksum option is left as an API detail of the particular TCP/IP socket layer implementation.

The following section describes two possible approaches to negotiating the proposed 32-bit TCP checksum. The common thread in the two approaches is the use of TCP options to negotiate the use of this checksum during the connection setup phase. Once the connection is set up, all subsequent packets sent during the connection transfer phase MUST carry the stronger checksum, except as described below.

It is also possible that Path MTU discovery causes a connection to reduce the negotiated MSS value after connection establishment. So, during connection establishment, an MSS equal to or greater than 9K might have been negotiated along with stronger TCP checksumming, and the MSS later reduced to match the discovered path MTU. If the reduced MSS value is equal to or less than an Ethernet MSS (typically 1460 without other TCP options), then the TCP endpoint that reduced its MSS may choose NOT to send the TCP checksum option in subsequent data packets. The peer must then rely on the 16-bit TCP checksum for end-to-end data integrity, which is acceptable since the Ethernet CRC has comparable data integrity checking capability for Ethernet-sized packets.

The method for computing the CRC-32C value is as follows. The CRC computation uses polynomial division. The TCP header and payload are mapped to a polynomial, and the CRC is calculated by dividing the bit stream by the CRC-32C polynomial. Stone et al., in Appendix B of RFC 4960 [RFC4960], describe a convention for mapping the bytes of the bit stream into the polynomial. The same convention MUST be adopted for TCP.

3. Negotiating the use of CRC-32C

There are two possible approaches to negotiating the proposed CRC-32C checksum during the TCP connection setup phase:

o  A new TCP option

o  Using the TCP Alternate Checksum Data Option

The first approach introduces a new TCP option to be negotiated by TCP endpoints during the connection setup phase. It has the same format as other defined TCP options, with Type, Length, and Value fields. A new type will be requested from IANA. The length field holds the total length of the new TCP checksum option, which is 6 bytes. The value field holds the 32-bit CRC-32C checksum. If either peer does not add this option to the TCP options list in its SYN segment, the CRC-32C checksum must not be used by the other peer. Most TCP implementations process the TCP options they recognize and ignore unknown options on SYN segments, so an endpoint that supports the new TCP option can interoperate with an endpoint that does not.

Since the 16-bit TCP checksum is insufficient for detecting multiple independent errors in Jumbo Frames, this proposal says that a peer supporting this option MUST send the new TCP checksum option if its link MTU is equal to or greater than 9K. However, if the remote peer does not recognize the new option, the initiating peer MUST NOT use this TCP extension for the connection transfer phase. If the remote peer recognizes the option and also has a Maximum Segment Size equal to the peer's advertised MSS or a minimum MSS of 9K, it MUST respond with the TCP checksum option. Every subsequent packet from both peers must include this option in the TCP header. The extra overhead of adding this option is minimal for Jumbo Frame sized segments, and the higher data integrity pays for itself.
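The negotiation rules for the first approach can be sketched as three small predicates. This Python sketch is one possible reading of the rules above and not a definitive implementation. In particular, it assumes that "9K" corresponds to the 8948-byte MSS mentioned in Section 2, and it reads the responder rule as "local MSS equals the peer's advertised MSS, or local MSS is at least the Jumbo threshold". All function and parameter names are hypothetical.

```python
# Assumption: "9K" in the text is taken to mean the 8948-byte MSS that
# Section 2 gives for the most common Jumbo Frame size.
JUMBO_MSS = 8948


def initiator_offers_option(local_mss: int, supported: bool = True) -> bool:
    """An endpoint that supports the option sends it in its SYN for Jumbo-sized MSS."""
    return supported and local_mss >= JUMBO_MSS


def responder_echoes_option(peer_offered: bool, peer_mss: int,
                            local_mss: int, supported: bool = True) -> bool:
    """The responder answers with the option only if it was offered and recognized.

    One reading of the MSS condition: the local MSS matches the peer's
    advertised MSS, or the local MSS is at least the Jumbo threshold.
    """
    if not (supported and peer_offered):
        return False
    return local_mss == peer_mss or local_mss >= JUMBO_MSS


def crc32c_in_use(local_sent: bool, peer_sent: bool) -> bool:
    """The stronger checksum is used only if both SYN segments carried the option."""
    return local_sent and peer_sent


# Usage: a Jumbo-capable initiator talking to a peer that ignores the option.
offered = initiator_offers_option(local_mss=8948)
echoed = responder_echoes_option(offered, peer_mss=8948, local_mss=8948,
                                 supported=False)
print(crc32c_in_use(offered, echoed))
```

In the usage example the peer does not support the option, so the connection falls back to the 16-bit TCP checksum.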
Note that all TCP control packets sent after successfully negotiating this TCP option may also carry it, although this draft does not mandate it.

TCP CRC Checksum Option

+----------+------------+----------------------------+
| Kind = X | Length = 6 | Value = 4 bytes of CRC-32C |
+----------+------------+----------------------------+

Figure 1

The second approach utilizes a pair of existing TCP options called the "TCP Alternate Checksum Options", specified in RFC 1146 [RFC1146]. The checksum types currently specified by that option are the TCP checksum, the 8-bit Fletcher's algorithm, and the 16-bit Fletcher's algorithm. A new checksum type can be added to this list for CRC-32C checksums. The negotiation rules for selecting the checksum type would follow the rules described in RFC 1146. That is, if both SYN segments carry the Alternate Checksum Request option, and both specify the same algorithm, that algorithm must be used for the remainder of the connection. Otherwise, the standard TCP checksum must be used for the entire connection. Once the CRC-32C checksum algorithm is negotiated, the TCP Alternate Checksum Data Option is sent, carrying 4 bytes of data for the CRC-32C checksum.

TCP Alternate Checksum Request Option

+-----------+------------+-----------------+
| Kind = 14 | Length = 3 | Value = CRC-32C |
+-----------+------------+-----------------+

Here the value for CRC-32C would need to be defined, and may possibly be the next undefined value '3', following the definitions for the 8-bit and 16-bit Fletcher's algorithms.

TCP Alternate Checksum Data Option

+-----------+------------+--------------------------------+
| Kind = 15 | Length = 6 | Value = CRC-32C computed value |
+-----------+------------+--------------------------------+

The TCP Alternate Checksum Data Option must be sent only during the connection transfer and tear down phase.
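The three option layouts shown above can be encoded in a few lines. The following Python sketch is illustrative only. The kind value 253 stands in for the unassigned "Kind = X" (it is one of the kinds reserved for experiments and is used here purely as a placeholder), the checksum type 3 is the tentative value suggested above rather than an assigned one, and carrying the CRC value in network byte order is an assumption. The function names are hypothetical. The padding helper uses the standard TCP No-Operation option (kind 1).

```python
import struct

KIND_NOP = 1                     # standard TCP No-Operation option
KIND_ALT_CHECKSUM_REQUEST = 14   # RFC 1146
KIND_ALT_CHECKSUM_DATA = 15      # RFC 1146
KIND_CRC32C_PLACEHOLDER = 253    # hypothetical stand-in for "Kind = X"
ALT_CHECKSUM_TYPE_CRC32C = 3     # assumed; the text only suggests this value


def new_crc_option(crc: int) -> bytes:
    """First approach: Kind = X, Length = 6, 4-byte CRC-32C value."""
    return struct.pack("!BBI", KIND_CRC32C_PLACEHOLDER, 6, crc)


def alt_checksum_request() -> bytes:
    """Second approach, SYN segments: Kind = 14, Length = 3, checksum type."""
    return struct.pack("!BBB", KIND_ALT_CHECKSUM_REQUEST, 3,
                       ALT_CHECKSUM_TYPE_CRC32C)


def alt_checksum_data(crc: int) -> bytes:
    """Second approach, data segments: Kind = 15, Length = 6, CRC-32C value."""
    return struct.pack("!BBI", KIND_ALT_CHECKSUM_DATA, 6, crc)


def pad_options(options: bytes, boundary: int = 4) -> bytes:
    """Append No-Operation bytes until the length is a multiple of boundary."""
    remainder = len(options) % boundary
    if remainder == 0:
        return options
    return options + bytes([KIND_NOP]) * (boundary - remainder)


# Usage: the request option padded to a 4-byte boundary.
print(pad_options(alt_checksum_request()).hex())
```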
Again, the 16-bit TCP checksum field must be zeroed out before computing the 32-bit CRC-32C code.

One or more padding bytes may be used when sending any of the above options to align to a 4-byte or 8-byte boundary for faster parsing on both 32-bit and 64-bit machines.

At this stage of draft development, the author is evaluating and seeking input on both approaches.

4. IPv6 Considerations

The TCP extension for CRC-32C can be applied equally to IPv4 and IPv6. The pseudo-header for IPv6 includes 128-bit source and destination addresses. This pseudo-header, the TCP header, and the payload MUST be included in the CRC-32C checksum of a TCP/IPv6 segment, as there is no IPv6 header checksum.

5. Conclusions and Acknowledgements

This document proposes the use of stronger error detection codes for TCP connections sending Jumbo Frames. It does not provide a solution for UDP-based applications.

I would like to thank Tom Kessler (kessler@netapp.com) for his review comments. He specifically pointed out his concerns about the safety of the TCP checksum plus Ethernet CRC at 40G and 100G speeds, even with 9K Jumbo Frames. He also provided information on the Intel instruction set that can be used to speed up CRC-32C computation. Special thanks to Janet Takami (jtakami@netapp.com) for her comments, as well as for pointing out that there is no IPv6 header checksum and so the pseudo-header must be included in the CRC-32C checksum.

6. IANA Considerations

This memo includes a request to IANA for a new Type Number for the new TCP Checksum Option if the TCP Alternate Checksum Option is not adopted. If the TCP Alternate Checksum Option is adopted, a new checksum type will need to be defined for CRC-32C, probably after the defined values for Fletcher's 8-bit and 16-bit algorithm types.

7. Security Considerations

The CRC-32C codes can detect unintentional changes to data such as those caused by noise.
If an attacker changes the data, the attacker can also change the error-detection code to match the changed data. Hence, these codes are not intended for security purposes.

8. References

8.1. Normative References

[RFC1122] IETF, "Requirements for Internet Hosts -- Communication Layers", RFC 1122, October 1989.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

[Koopman] Koopman, P., "32-Bit Cyclic Redundancy Codes for Internet Applications", 2002.

[Stone] Stone, J. and C. Partridge, "When the CRC and TCP Checksum Disagree".

[RFC1146] Zweig, J. and C. Partridge, "TCP Alternate Checksum Options", RFC 1146, March 1990.

[RFC3385] Sheinwald, D., et al., "Internet Protocol Small Computer System Interface (iSCSI) Cyclic Redundancy Check (CRC)/Checksum Considerations", RFC 3385, September 2002.

[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.

Author's Address

Anumita Biswas
NetApp, Inc.
495, E. Java Dr
Sunnyvale, CA 95054
USA

Phone: +14088223204
Email: anumita.biswas@netapp.com