Network Working Group                                         W. Leibzon
Internet-Draft                                             Elan Networks
Expires: January 11, 2006                                  July 10, 2005

                  Content-Digest and EDigest Header Fields
                  draft-leibzon-content-digest-edigest-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document defines the Content-Digest header field, which can be used to include a hash of MIME content body and header field data, and which supports several hash algorithms and canonicalization methods. The EDigest header field is also defined; it allows digest information to be specified for an external content part or for the hash of several content parts joined together.

Leibzon                 Expires January 11, 2006                [Page 1]

Internet-Draft         Content-Digest and EDigest              July 2005

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
   2.  Content-Digest Header Field
     2.1   Algorithm ("a") parameter
     2.2   Host Information ("i") parameter
     2.3   Header Field List ("h") parameter
     2.4   Canonicalization Method ("c") parameter
     2.5   Canonicalized Data Size ("s") parameter
     2.6   Time Stamp ID ("t") parameter
     2.7   Hash Data ("d") parameter
   3.  Creation of Content-Digest Header Field
     3.1   Header Fields Processing
     3.2   Content Body Data Processing
     3.3   Digest Hash Creation
   4.  Digest Hash Verification Procedure for Content-Digest
   5.  EDigest Header Field
     5.1   Content URL ("u") parameter
     5.2   Creation of EDigest Header Field
     5.3   Verification of EDigest Header Field
   6.  Examples
     6.1   Simple Content-Digest as Replacement for Content-MD5
     6.2   Content-Digest used in Email Message
     6.3   Content-Digest used in HTTP Transmission
     6.4   EDigest used in Email
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1   Normative References
     9.2   Informative References
       Author's Address
   A.  Collected Grammar
       Intellectual Property and Copyright Statements
1. Introduction

   When transmitting data it is often desirable to confirm the integrity of the data and to make certain that the entire data has been transmitted without modification. A common method used for this is to calculate a cryptographic hash digest of the data and send it along with the data. For data transmissions involving MIME encapsulated data, as used in SMTP, HTTP and other protocols, this can be accomplished with the Content-MD5 header field defined in [RFC1544]. However, the Content-MD5 header field is tied to the MD5 hash algorithm, and recent research summarized in [draft-hoffman-hash-attacks-04] indicates that MD5 is vulnerable to certain collision attacks; with fast computers a collision can be found in as little as 4 hours. Additionally, Content-MD5 hash creation covers only the message data, whereas important information regarding the message or MIME part can also be contained in its header, and the integrity of those header fields needs to be protected as well.

   This document defines the Content-Digest header field, which provides a universal syntax for including a hash of MIME or message data, and optionally of its header fields, with support for several hash algorithms and canonicalization methods and for other optional information regarding digest creation. Additionally, the EDigest header field is defined, which allows digest information to be included for an external MIME part or for the hash of several parts joined together.

2. Content-Digest Header Field

   Content-Digest is similar to Content-MD5 in that it is a unique per-MIME-part (or per-message-data) field specifying a hash digest for the entire MIME content and, optionally, for the MIME content header data.
   There can be only one Content-Digest header field in any MIME or message header, and it should be added by the originating user agent (for email routing, this means it is added by the MUA acting as MSA). Message relays and gateways are expressly forbidden from adding Content-Digest header fields to messages already in transit.

   The Content-Digest header field has a syntax similar to MIME fields and consists of multiple parameters with values, separated by ";" (e.g. "param1=val1 ; param2 =val2"); please refer to Appendix A for the exact ABNF syntax. The Content-Digest header field always starts with the version information parameter, which looks like this:

      Content-Digest: v=1.0 ; ...

   This document describes use of the Content-Digest header field with version "1.0". Software that understands version "1.0" of the Content-Digest header field SHOULD also attempt to interpret header fields that have the same major number "1" but a different minor number (e.g. "1.1"). Interpretation of a Content-Digest header field with a major number other than "1" is not defined in this document, and software that does not otherwise know how to interpret a header field with a different major number MUST NOT attempt to evaluate the Content-Digest header field; message processing should then be done as if no Content-Digest header field were present.

   After the version information, the header field contains a number of additional parameters. The hash data parameter ("d") is the only required one; all other parameters are either optional for processing or take a default value when absent. Each parameter is specified in a separate subsection of this document.

2.1 Algorithm ("a") parameter

   The algorithm for digest hash computation is specified by means of the "a" parameter.
   The possible values of the "a" parameter are listed below:

      md5     - MD5 algorithm producing 128-bit hash as defined in [RFC1321]

      sha1    - SHA1 algorithm producing 160-bit hash as defined in [RFC3174]

      sha224  - SHA algorithm producing 224-bit hash as defined in [RFC3874]

      sha256  - SHA algorithm producing 256-bit hash as defined in [FIPS180-2]

      sha384  - SHA algorithm producing 384-bit hash as defined in [FIPS180-2]

      sha512  - SHA algorithm producing 512-bit hash as defined in [FIPS180-2]

   If the "a" parameter is absent from the Content-Digest header field, the default SHA1 algorithm is assumed. An implementation conforming to this document MUST support the SHA1 algorithm, SHOULD support the MD5 and SHA224 algorithms, and MAY support SHA256, SHA384, SHA512 and other algorithms. New algorithms may be introduced by other documents and do not require a new version number for Content-Digest. If the algorithm specified in the "a" parameter of Content-Digest is unknown to the software evaluating the header, it MUST NOT attempt to evaluate the Content-Digest header field, and message processing should be done as if no such header field were present.

2.2 Host Information ("i") parameter

   An optional "i" parameter may be used to specify the hostname of the system adding the Content-Digest header field. This is a purely informational parameter and is not used in processing and evaluation of the digest header.

2.3 Header Field List ("h") parameter

   MIME header fields that are included with the content data for digest hash computation are listed by means of the "h" parameter of Content-Digest. The field names are listed one by one after "h=", separated by ",".
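The parameter rules above (version check, default and unknown algorithms, the comma-separated "h" list) can be illustrated with a small Python sketch. It is not the Appendix A grammar: the helper names `parse_params`, `usable_algorithm` and `header_list` are hypothetical, values are assumed never to contain ";", folding white space inside a value is simply dropped, and the mapping from "a" values to Python `hashlib` names is an assumption of this example.

```python
# Assumed mapping from section 2.1 "a" values to Python hashlib names.
ALGORITHMS = {"md5": "md5", "sha1": "sha1", "sha224": "sha224",
              "sha256": "sha256", "sha384": "sha384", "sha512": "sha512"}


def parse_params(value):
    """Split a field body like 'v=1.0 ; a=sha256 ; d="..."' into a dict.

    Simplifications (assumptions, not the Appendix A grammar): values never
    contain ';' and any white space inside a value (e.g. from folding) is
    dropped.
    """
    params = {}
    for part in value.split(";"):
        if not part.strip():
            continue
        name, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {part.strip()!r}")
        val = "".join(val.split())            # remove folding white space
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]                   # d="...=" is quoted (section 2.7)
        params[name.strip()] = val
    return params


def usable_algorithm(params):
    """Return the hashlib name to use, or None if the field must be ignored."""
    major = params.get("v", "").split(".")[0]
    # Unknown major version: behave as if no Content-Digest were present.
    # Treating a missing "d" the same way is an assumption of this sketch.
    if major != "1" or "d" not in params:
        return None
    # SHA1 is the default; an unknown algorithm yields None (field ignored).
    return ALGORITHMS.get(params.get("a", "sha1"))


def header_list(params):
    """Names or prefix patterns from "h"; empty means a body-only digest."""
    return [n.strip() for n in params.get("h", "").split(",") if n.strip()]
```

For example, `usable_algorithm(parse_params("v=1.1; d=abc"))` returns "sha1" because the minor version differs but the major number is "1" and "a" is absent.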
   A group of fields that share a common name prefix can also be specified with a trailing "*"; for example "h=Content-*" means all header fields whose names start with "Content-", such as Content-Type, Content-Transfer-Encoding, Content-ID and others. If "*" is not used, the name MUST match the full field name in the header up to ":", so "h=Content-Type" matches a "Content-Type:" header line but not a "Content-Type-Extra:" header line. If all fields are to be included, this is specified as "h=*", but careful consideration must be given to whether that is desirable, because in some cases new header fields are added to the message or to specific MIME parts while it is in transit.

   Field names are not case-sensitive: "h=Content-Type" matches even if the actual header field name in the MIME part is "CONTENT-TYPE", and similarly "content-type", "cOnTeNt-TyPe" or any other case variation of the same letters.

   Note that MIME fields matching the list in the "h" parameter are not tied to the position of the Content-Digest header field and may appear either below or above it in the MIME or message header. The Content-Digest header field itself is never included in its own digest, even if the name Content-Digest matches the list of header fields in "h=" (such as when "h=Content-*" or "h=*" is used).

   If there is no "h" parameter in the Content-Digest header field, then no header field data is included in the digest, and the digest is a hash of the content body only, as it is with the Content-MD5 header field.

2.4 Canonicalization Method ("c") parameter

   Canonicalization is a process of data transformation that puts the data into a format acceptable under the constraints imposed by additional data processing functionality.
   In the case of digest computation, canonicalization is the process of transforming data into a canonical form, and the actual hash computation is then done on the data in this canonical form. Canonicalization is most useful as a way to ensure that a data hash can still be verified even if some small data conversion is done while the message is being transmitted. For example, some intermediate message processing software interprets and corrects what it considers to be header field problems, such as case variations or too many white-space characters between the header field name and the field value; in other cases message processing software may remove trailing white-space characters on any line, or the first or last empty lines of the message. All such processing would normally make it impossible to verify the hash of the original message content, but some canonicalization methods can take this behavior into account and provide a consistent data format for digest verification.

   Note that doing canonicalization for digest computation does not mean that such canonicalized data is actually transmitted. Conversion and data transformation rules for data transmission are in fact covered by content-transfer-encoding as specified in section 6 of [RFC2045]. As it relates to canonicalization and digest computation, content-transfer-encoding conversion should be done on the original non-canonicalized data after the digest hash has been computed and the appropriate Content-Digest header field added. When the digest is being verified, canonicalization and digest computation are done after undoing any content-transfer-encoding.

   Similar to the support provided for multiple cryptographic algorithms, Content-Digest supports multiple canonicalization methods, with a small set of methods required to be supported by all implementations.
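The ordering rule above means a verifier first undoes the content-transfer-encoding and only then canonicalizes and hashes. A minimal Python sketch of that decoding step follows, assuming only the RFC 2045 encodings "base64", "quoted-printable", "7bit", "8bit" and "binary" need handling; the function name `undo_transfer_encoding` is hypothetical.

```python
import base64
import quopri


def undo_transfer_encoding(body, cte=None):
    """Return the body as it was before content-transfer-encoding.

    A verifier canonicalizes and hashes this decoded data; a CTOS hashes
    the data before encoding it for transmission.
    """
    cte = (cte or "7bit").strip().lower()     # absent header: 7bit (RFC 2045)
    if cte == "base64":
        return base64.b64decode(body)         # line breaks are discarded
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    if cte in ("7bit", "8bit", "binary"):
        return body                           # identity encodings
    raise ValueError(f"unsupported content-transfer-encoding: {cte}")
```

Raising an error for other encodings is a design choice of this sketch; a real verifier might instead skip digest evaluation for such parts.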
   Different canonicalization methods are used for header field processing and for the content body, so the "c" parameter value is composed of two parts, "a,b", where "a" specifies the method used for header field data canonicalization and "b" specifies the method used for body canonicalization. The canonicalization methods that MUST be supported are "bare", "simple" and "nofws" for header field processing, and "bare", "text" and "nofws", plus the special values "mimeform" and "none", for content body processing. If a canonicalization method specified in the "c" parameter of Content-Digest is unknown to the software evaluating the header field, it MUST NOT attempt to evaluate the Content-Digest header field, and message processing should be done as if no Content-Digest header field were present.

   If there is no "c" parameter in the Content-Digest header field, then the default value "simple,mimeform" is assumed. If the value of the "c" parameter is a single keyword, such as "c=nofws", then the default "simple" method is used for header field canonicalization and the method given as the value of "c" is used for body data. More information about canonicalization methods and the canonicalization process can be found in section 3 of this document.

2.5 Canonicalized Data Size ("s") parameter

   The number of bytes (octet count) in the canonicalized data (as used for computing the hash digest) can optionally be included in the "s" parameter. This is primarily an informational field and can be used during digest header verification as a way to determine whether the content has been modified. If the number in "s" does not match the number of bytes of the canonicalized data being verified, then the verifying system SHOULD abort the processing and can choose to report an extended error indicating that the content has been changed and its size does not match.
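A verifier might interpret the "c" and "s" parameters as described above roughly as in the following Python sketch. The helper names are hypothetical, and the method sets contain only the methods this document requires, so an implementation supporting additional methods would extend them.

```python
# Only the methods this document requires every implementation to support.
HEADER_METHODS = {"bare", "simple", "nofws"}
BODY_METHODS = {"bare", "text", "nofws", "mimeform", "none"}


def resolve_canonicalization(c=None):
    """Return (header_method, body_method), or None if the field is ignored."""
    if c is None:
        return ("simple", "mimeform")        # default when "c" is absent
    parts = [p.strip() for p in c.split(",")]
    if len(parts) == 1:
        parts = ["simple"] + parts           # a single keyword names the body method
    if len(parts) != 2:
        return None
    header, body = parts
    if header not in HEADER_METHODS or body not in BODY_METHODS:
        return None                          # unknown method: act as if no field
    return (header, body)


def size_matches(canonical, s=None):
    """When "s" is present it must equal the canonical octet count."""
    return s is None or int(s) == len(canonical)
```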
   There may also be some situations where being able to verify the majority of the data is sufficient. In such a case an application MAY use the size parameter: after doing canonicalization, if the result is larger than the size given in "s", the application truncates the result to exactly the number of bytes specified in "s" and then attempts to verify the digest. If this succeeds, the application should in some way report that only part of the content was successfully verified and may also choose to discard the unverified part of the message content data. More information about how the size parameter is used is found in section 4.

2.6 Time Stamp ID ("t") parameter

   The optional "t" parameter is used to provide time-stamp information on when the digest hash was created. In EDigest it is also used as a unique identifier (unlike Content-Digest, multiple EDigest header fields can exist in the same header).

   The value of the "t" parameter is based on the ISO8601 time format and consists of multiple digits of the form YYYYMMDDhhmmssxxxx, where YYYY is the 4-digit year, MM the 2-digit month, DD the 2-digit day, hh the 2-digit hour, mm the 2-digit minute, ss the 2-digit seconds, and xxxx are additional digits that may be milliseconds or some other unique number identifying the specific header field. The number may well be shorter than 18 digits (14 is much more common) and may, for example, contain only YYYYMMDD. The time and date data used should be UTC with no locale information. Some examples of the "t" parameter follow:

      t=20050704142754 - corresponds to RFC2822 date "Mon, 4 Jul 2005 14:27:54 +0000"

      t=20050503       - corresponds to May 3, 2005

   Note that the number specified in the "t" parameter is informational only and should not be assumed to always be a time-stamp or automatically interpreted as such by the application; automatic use of this number should be limited to providing a unique reference.
   However, the fact that this number usually contains a timestamp may be useful for email debugging and forensics.

2.7 Hash Data ("d") parameter

   The data parameter contains the actual digest hash data. The hash is calculated using the algorithm specified in the "a" parameter over the data from the content header fields (those that match the list in the "h" parameter) and the content body, after applying the canonicalization specified in the "c" parameter. The resulting hash data is converted into BASE64 encoding as specified in section 3 of [RFC3548], with the '=' pad symbol, and placed after "d=". If the BASE64 hash data ends with '=', then the data MUST also be enclosed in double quotes, i.e. d="...=".

   Hash data can be broken into multiple lines as specified in [RFC2822] section 2.2.3, but it is preferable that the entire data parameter (starting with "d=") stay on one line in the header. It is also preferable that the data parameter be the last parameter of the Content-Digest header field. Use and placement of the data parameter is illustrated in more detail in the examples in section 6 of this document.

3. Creation of Content-Digest Header Field

   The Content-Digest field is created by the originating user agent, which starts transmission of the content, and not by an intermediate content retransmission system. For email, the originating user agent is an MUA program or any other program acting as MSA, and as such is the originating agent in SMTP transmission. For HTTP, the originating agent is the HTTP server that serves the content from its data storage (where it has been placed by the user) or generates it on the fly (CGI or similar), but not any kind of caching HTTP system that does not actually generate the content but only retransmits content received from another web server.
   Other MIME transmission protocols can also use Content-Digest, applying criteria similar to the above in deciding which system involved in the transmission should add the Content-Digest header field.

   The content transmission origination system (hereafter CTOS) that wants to add a Content-Digest header field should proceed as follows:

3.1 Header Fields Processing

   First, a decision should be made on what data is to be used for the digest hash, based on local preferences and on how the digest hash is going to be used. Generally it is a good idea to include only content-specific header fields such as Content-Type, but not transmission header fields such as Connection in HTTP. This is because content-specific fields should not change during transmission, while other header fields may change if the content is retransmitted (such as by forwarding or other redirection systems in email, or by a caching proxy server in HTTP). The Content-Transfer-Encoding header field (which provides information on data transmission encoding) should thus be included in the list of header fields only if a change of transfer encoding by intermediate systems is not allowed (which is not always the case).

   After the list of header fields to be part of the digest hash data is ready, the entire "h" parameter can be created. Consideration should be given to whether use of "*" to combine several fields is appropriate, because if new fields with the same prefix are added by intermediate retransmission systems, digest verification will fail. As such, "h=*" should generally not be used unless message transmission is point-to-point and no retransmission systems are expected or allowed, and "h=Content-*" should be used only if Content-Transfer-Encoding is not specified or is not expected to change.

   Next, canonicalization should be applied to the header field data.
   There are three header canonicalization methods defined by this document: "bare", "simple" and "nofws". To show how they differ, assume that the content data header at the beginning was:

      Content-Type:  text/plain;
          charset="us-ascii"
      MIME-Version: 1.0
      Content-ID: <218F64C460.u314@example.com>
      Content-Transfer-Encoding: 7bit
      Content-Description: Collection  Footer

   Assume also that the header fields to be included are all of the above except Content-Transfer-Encoding, which is described with the parameter "h=Content-Type,Content-ID,Content-Description,MIME-Version". The canonical data form is then calculated as follows, depending on the canonicalization method used (for each method it is assumed that processing starts with an empty canonical header buffer):

   BARE - In this canonicalization method header data is largely used as is. The algorithm is: for each header field name listed in "h", in the order the fields are listed, find all instances of matching fields (the full field name up to ":" is the same, ignoring case, or, if "*" is used, the field name up to "*" is the same) and add each entire header field as is to the canonical buffer, starting with the field name itself and continuing up to (but not including) the first character of the next header field in the header, so that the end-of-line characters are included.
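The field selection rules of section 2.3 combined with the BARE method can be sketched in Python as follows. This is an illustration under assumptions: the header block is given as raw bytes, a line starting with SP or HTAB is treated as a continuation of the previous field, and the names `split_fields`, `field_name`, `name_matches`, `select_fields` and `canon_header_bare` are hypothetical.

```python
def split_fields(header):
    """Split a raw header block into fields, keeping folds and line ends."""
    fields = []
    for line in header.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += line               # continuation of a folded field
        else:
            fields.append(line)
    return fields


def field_name(field):
    """Lowercased field name: everything before the first ':'."""
    return field.split(b":", 1)[0].strip().decode("latin-1").lower()


def name_matches(name, pattern):
    """Case-insensitive match with an optional trailing '*' prefix pattern."""
    pattern = pattern.strip().lower()
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def select_fields(header, h):
    """Matching fields grouped in "h" order, each group in header order."""
    fields = split_fields(header)
    selected = []
    for pattern in h.split(","):
        for field in fields:
            name = field_name(field)
            # A Content-Digest field is never part of its own digest.
            if name != "content-digest" and name_matches(name, pattern):
                selected.append(field)
    return selected


def canon_header_bare(header, h):
    """BARE: selected fields copied as is, including end-of-line characters."""
    return b"".join(select_fields(header, h))
```

With the example header, `canon_header_bare(header, "Content-Type,Content-ID,Content-Description,MIME-Version")` returns the four selected fields in "h" order with their original folding and spacing.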
   For the example header above, the result of applying this method is the following canonical buffer data:

      Content-Type:  text/plain;
          charset="us-ascii"
      Content-ID: <218F64C460.u314@example.com>
      Content-Description: Collection  Footer
      MIME-Version: 1.0

   SIMPLE - In this canonicalization method, common transformations of header fields are accounted for, and the data is made consistent with the ABNF header field syntax defined in [RFC2822], except that 8-bit data is not touched (per RFC2822 there should not be any 8-bit data in message and MIME headers, but unfortunately it does occur). Header fields are chosen and ordered the same way as with "bare", but the header field data is not copied as is to the canonical buffer; instead the following is done for each header field:

      1. If the header field consists of multiple lines, the lines are unfolded (the procedure is described in section 2.2.3 of [RFC2822] and involves removal of the CRLF pair) to become one long field line. Any remaining single line break characters CR or LF are also removed, as well as any NUL (ASCII code 0) characters. In the above example the only header field affected is "Content-Type", which consists of data in two lines.

      2. Every sequence of multiple consecutive white-space characters (white-space is WSP as defined in [RFC2822] section 2.2.2 and includes SP and HTAB) is reduced to a single white-space character. In the above example this affects the double white space after "Content-Type:" and the double white space between "Collection" and "Footer".

      3. The header field name itself is made entirely lowercase. That means that in the header field name (from the start of the header field line to the first ":"), each octet with ASCII code a between 65 and 90 is replaced with the character with ASCII code a+32.

      4. If there is a sequence of one or more WSP at the end, it is removed.

      5. A new CRLF pair is added to the end of the newly converted header field line.

   The result of applying this method to the example above is the canonical data block:

      content-type: text/plain; charset="us-ascii"
      content-id: <218F64C460.u314@example.com>
      content-description: Collection Footer
      mime-version: 1.0

   NOFWS - In this canonicalization method (whose name is an abbreviation of "No Free White Space"), only the printable non-white-space characters of the data are used for the digest. This means that the core of the content text is preserved and verified, but there may be some problems with this method, as all spaces between words are lost. This method uses an algorithm similar to "simple", with the following steps for data transformation:

      1. If the header field consists of multiple lines, the lines are unfolded (the procedure is described in section 2.2.3 of [RFC2822] and involves removal of the CRLF pair) to become one long field line. Any remaining single line break characters CR or LF are also removed, as well as any NUL (ASCII code 0) characters. In the above example the only header field affected is "Content-Type", which consists of data in two lines.

      2. All octets with an ASCII value less than 33 or greater than 126 are removed from the header data.

      3. The header field name itself is made entirely lowercase. That means that in the header field name (from the start of the header field line to the first ":"), each octet with ASCII code a between 65 and 90 is replaced with the character with ASCII code a+32.
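The per-field SIMPLE and NOFWS steps can be sketched as follows in Python, operating on one selected field given as raw bytes including its line ends. One point is an interpretation rather than something the text states: step 2 of SIMPLE is taken to replace each run of two or more SP/HTAB characters with a single SP. The function names are hypothetical.

```python
import re

# Assumption: a run of two or more SP/HTAB characters becomes one SP.
WSP_RUN = re.compile(rb"[ \t]{2,}")


def lower_name(field):
    """Lowercase ASCII letters in the field name only (up to the first ':')."""
    name, sep, rest = field.partition(b":")
    return name.lower() + sep + rest


def canon_field_simple(field):
    f = field.replace(b"\r\n", b"")                  # 1. unfold
    f = f.replace(b"\r", b"").replace(b"\n", b"").replace(b"\x00", b"")
    f = WSP_RUN.sub(b" ", f)                         # 2. collapse WSP runs
    f = lower_name(f)                                # 3. lowercase the name
    f = f.rstrip(b" \t")                             # 4. drop trailing WSP
    return f + b"\r\n"                               # 5. terminate with CRLF


def canon_field_nofws(field):
    f = field.replace(b"\r\n", b"")                  # 1. unfold
    # 2. keep only octets 33..126 (this also drops CR, LF, NUL, SP, HTAB)
    f = bytes(c for c in f if 33 <= c <= 126)
    return lower_name(f)                             # 3. lowercase the name
```

Joining `canon_field_simple` over the four selected example fields gives the SIMPLE block shown above; `canon_field_nofws` gives a single line with no CRLF.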
   The result of applying the NOFWS method to the example above is the following canonical data block (note that " \" is used to indicate a line break for purposes of this document only; the result of this canonicalization method is one long line without breaks):

      content-type:text/plain;charset="us-ascii"content-id: \
      <218F64C460.u314@example.com>content-description:Coll \
      ectionFootermime-version:1.0

   It is RECOMMENDED that the default "simple" canonicalization method be used when content data is transmitted to an unknown recipient across the Internet. This method can deal with common header data transformations by intermediate systems and does not cause loss of content data. If it is important to make certain that data is received exactly as it was transmitted, with no modifications or reformatting of any kind, then "bare" canonicalization can be used, but this should normally be reserved for known and pre-arranged transmission paths where it is known to be safe. When data transmission goes through a series of relays and it has been noticed that the digest hash does not verify as a result, using "nofws" can be considered, but note that it only verifies the non-white-space characters and is not secure enough for full data integrity protection.

3.2 Content Body Data Processing

   Content body processing for digest hash creation may also involve transforming the data to a canonical format, depending on the chosen canonicalization method. There are five body data canonicalization methods defined by this document: "bare", "text", "mimeform", "nofws" and "none". As with header field canonicalization, a simple example of data before canonicalization is used to show how they differ (note that \cr and \lf in the example represent CR and LF characters):

      \cr\lf
      Happy 4th of July,\cr
      \cr
      Fireworks at pier 39 at 9:30pm, be there. \cr
      \cr
      Will

   The canonical body data is calculated as follows, depending on the canonicalization method used (for each method it is assumed that processing starts with an empty canonical body buffer):

   BARE - In this canonicalization method body data is unchanged and used 100% as is.

   TEXT - In this canonicalization method, some common problems encountered with transmission of text data are dealt with, and the data is made consistent with the canonical form of a MIME text/plain content-type object as described in section 4 of [RFC2049] and with the text message data format described in section 2 of [RFC2822]. This is done as follows:

      1. All NUL (ASCII code 0) characters are removed, and any single CR or single LF character is replaced with a CRLF pair (if CR is already followed by LF, then neither is changed).

      2. The data is examined to make certain that no line is longer than 998 octets (a line is defined as a continuous stream of characters terminated by CRLF and starting either at the beginning of the data or with the first character after the previous CRLF). If any line is longer than 998 characters, a CRLF pair is inserted after the 998th character and the procedure described in this step is repeated.

      3. Any sequence of one or more white-space characters (white-space is WSP as defined in [RFC2822] section 2.2.2 and includes SP and HTAB) that is immediately followed by CRLF is removed.

      4. If there is a sequence of one or more CRLF pairs at the start of the data content (as left after step 3), it is removed.
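The four TEXT steps translate fairly directly into code. The following Python sketch is one possible reading; `canon_body_text` is a hypothetical name, and the steps are applied strictly in the listed order, so white space left at the end of a line broken by step 2 is removed by step 3.

```python
import re


def canon_body_text(data):
    # 1. Remove NUL, then turn every lone CR or lone LF into CRLF
    #    (an existing CRLF matches the first alternative and is kept).
    data = data.replace(b"\x00", b"")
    data = re.sub(rb"\r\n|\r|\n", b"\r\n", data)
    # 2. Break every line longer than 998 octets after each 998th octet.
    lines = []
    for line in data.split(b"\r\n"):
        while len(line) > 998:
            lines.append(line[:998])
            line = line[998:]
        lines.append(line)
    data = b"\r\n".join(lines)
    # 3. Remove SP/HTAB runs immediately followed by CRLF.
    data = re.sub(rb"[ \t]+\r\n", b"\r\n", data)
    # 4. Remove CRLF pairs at the very start of the data.
    while data.startswith(b"\r\n"):
        data = data[2:]
    return data
```

Reading the example's \cr and \lf markers as the only line-break octets (and the space before the third \cr as a literal trailing space), this function produces the TEXT result given for the example.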
   The result of applying the TEXT method to the example above is the following canonical data block:

      Happy 4th of July,\cr\lf
      \cr\lf
      Fireworks at pier 39 at 9:30pm, be there.\cr\lf
      \cr\lf
      Will

   NOFWS - This is another canonicalization method primarily for text data; in this method only the non-white-space characters of the data are kept (this is less secure as far as data integrity is concerned, but the core information of the content text is still protected). The method is fairly simple and consists of one step:

      * All NUL (ASCII code 0), CR (ASCII code 13), LF (ASCII code 10), HTAB (ASCII code 9), VTAB (ASCII code 11), FF (ASCII code 12) and SP (ASCII code 32) characters are removed.

   The result of applying this method to the example above is a canonical data block consisting of one line:

      Happy4thofJuly,Fireworksatpier39at9:30pm,bethere.Will

   MIMEFORM - This is a special canonicalization method meant to convert data into MIME canonical form. As described in section 4 of [RFC2049], MIME canonical form depends on the type of data, which is given by Content-Type. For this canonicalization method, if the data is of the text media type (based on "Content-Type: text/..."), then the TEXT canonicalization method is used. For all other media types, the BARE canonicalization method is used. This is the default canonicalization method for content data.

   NONE - The special canonicalization value "none" specifies that body data is not part of the digest hash (i.e. the canonicalization process uses none of the data). This is used with EDigest (an extended form of the Content-Digest header field discussed further in section 5 of this document) when it is desirable to create a digest hash only for a group of specific header fields.
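The NOFWS body method and the MIMEFORM selection can be sketched briefly in Python. The function names are hypothetical, and treating a missing Content-Type as text/plain follows the RFC 2045 default, which this document does not state explicitly.

```python
def canon_body_nofws(data):
    """NOFWS: delete NUL, HTAB, LF, VTAB, FF, CR and SP; keep all other octets."""
    return data.translate(None, b"\x00\t\n\x0b\x0c\r ")


def mimeform_method(content_type=None):
    """MIMEFORM: 'text' for text/* media types, 'bare' for all others.

    Assumption: an absent Content-Type is treated as text/plain, the
    RFC 2045 default.
    """
    if content_type is None:
        return "text"
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "text" if media_type.startswith("text/") else "bare"
```

Note that NOFWS keeps punctuation and 8-bit octets; only the listed control and space characters are removed, which is why the example result still contains "," and ":".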
   For text content data, "text" canonicalization is flexible enough to take care of common alterations with no security risks. If it has been noticed that data transmission is likely to involve relays that make modifications such that the digest hash would no longer verify, then using "nofws" can also be considered. Where it is very important to make certain that data is received exactly as it was transmitted, with no modifications or reformatting of any kind, "bare" canonicalization can be used, but this should normally be reserved for known and pre-arranged transmission paths where it is known to be safe. If the content data is not text and is not going to be transmitted as text (i.e. with 7bit or quoted-printable content-transfer-encoding), then it is very unlikely to be touched by any intermediate system, and using the "bare" canonicalization method is appropriate.

   Based on the above, the CTOS should make certain to use appropriate canonicalization. It is important to understand that the default "mimeform" depends on the Content-Type header field value: it uses "text" for any text media type MIME content and "bare" otherwise, which works well in most cases. There may also be a number of other content types that are not identified as a text media type but that use text data, and in those cases the CTOS should explicitly select the "text" canonicalization method and specify it in the "c" parameter. Notice that multipart and message composite MIME types are also very often composed only of text components, and in such cases using "text" canonicalization may also be appropriate and will need to be specified in the "c" parameter.

3.3 Digest Hash Creation

   After processing of the content header and body as described above, the result is two data buffers: one containing the canonicalized form of the header fields and one containing the canonicalized body data.
To form the data actually used for digest hash creation, these two buffers are joined: the canonical header fields data goes first and the canonicalized body is appended to it. Note that even if 'bare' canonicalization is used for both the header fields data and the content body, the result of joining their canonical forms would not be identical to the original MIME content part, because it would be missing the empty line that separates the content header from the body.

For example, if the original MIME content was:

   Content-Type: text/plain; charset="us-ascii"
   MIME-Version: 1.0
   Content-ID: <218F64C460.u314@example.com>
   Content-Transfer-Encoding: 7bit

   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

then the result of the canonicalization and header processing described by "c=bare,bare; h=content-type,content-id,mime-version" would be the following data (183 octets), ready for hash creation:

   Content-Type: text/plain; charset="us-ascii"
   Content-ID: <218F64C460.u314@example.com>
   MIME-Version: 1.0
   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

whereas the result of the default "simple,mimeform" canonicalization with the same list of header fields (described by "h=content-type,content-id,mime-version; c=simple,mimeform" or simply "h=content-type,content-id,mime-version") would be:

   content-type: text/plain; charset="us-ascii"
   content-id: <218F64C460.u314@example.com>
   mime-version: 1.0
   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

The octet count of the above data is 177 (the count includes the CRLF line terminators, which add 2 octets for every line, including empty ones). Calculating the octet size of the canonicalized data is optional; it is done so that the size can be used as the value of the "s" parameter of Content-Digest. Once processing of the content is complete and the data is ready, the cryptographic hash algorithm is applied to that data.
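To make the joining step and the octet count concrete, here is a minimal Python sketch that rebuilds the "simple,mimeform" buffer from the example above. The helper names (simple_header_data, text_body_data, digest_input) are hypothetical, and the header and 'text' rules used here are approximations inferred from the example output, not the normative rules of Sections 3.1 and 3.2. The sketch reproduces the 177-octet count; it does not claim that the printed hash equals the "d" value shown for this example.

```python
import base64
import hashlib
from typing import List, Tuple


def simple_header_data(header_lines: List[str], h_list: List[str]) -> bytes:
    """Approximate 'simple' header canonicalization as it appears in the
    Section 3.3 example: the fields named in "h" are emitted in "h" order,
    names lowercased, values kept, each line ended with CRLF.
    Not handled here: folded lines, repeated fields, "content-*" wildcards.
    """
    fields = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return "".join(f"{name}: {fields[name]}\r\n"
                   for name in h_list if name in fields).encode("ascii")


def text_body_data(body: str) -> bytes:
    """Stand-in for 'text' body canonicalization (assumed rules: CRLF line
    endings, trailing empty lines dropped); Section 3.2 has the real ones."""
    lines = body.replace("\r\n", "\n").split("\n")
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines).encode("ascii")


def digest_input(header_data: bytes, body_data: bytes,
                 algorithm: str = "sha1") -> Tuple[bytes, int, str]:
    """Join header data and body (header first), then hash.
    Returns (data, octet count for "s", base64 hash for "d")."""
    data = header_data + body_data
    digest = hashlib.new(algorithm, data).digest()
    return data, len(data), base64.b64encode(digest).decode("ascii")


header = ['Content-Type: text/plain; charset="us-ascii"',
          "MIME-Version: 1.0",
          "Content-ID: <218F64C460.u314@example.com>",
          "Content-Transfer-Encoding: 7bit"]
body = "Happy 4th of July,\nFireworks at pier 39 at 9:30pm, be there.\nWill\n"
h = ["content-type", "content-id", "mime-version"]

data, size, d = digest_input(simple_header_data(header, h), text_body_data(body))
print(size)  # 177
print(f'Content-Digest: v=1.0; h={",".join(h)}; s={size}; d="{d}"')
```

Content-Transfer-Encoding is dropped simply because it is not listed in "h", and the header lines follow the order of "h" rather than their order in the original header, as in the example.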
The choice of hash algorithm should be based on system policy and on security considerations for the transmission. The default SHA1 algorithm is a good choice and offers sufficient security for most cases; it is NOT RECOMMENDED to use any algorithm that produces a hash shorter than 160 bits. The actual hash computation is described in other documents: see [RFC1321] for MD5, [RFC3174] for SHA1, and [FIPS180-2] for the other SHA algorithms, which produce hashes longer than 160 bits.

The result of the entire process described in this section would be the following Content-Digest header field:

   Content-Digest: v=1.0; h=content-type,content-id,mime-version;
           c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

Since "a=sha1" and "c=simple,mimeform" are the defaults, the above can be shortened to:

   Content-Digest: v=1.0; h=content-type,content-id,mime-version;
           d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

The Content-Digest header field is now ready to be added to the content part. It is best to add Content-Digest below the other MIME or message header fields (some of which were part of the data that went into the digest hash), although it may be placed elsewhere in the header as well. After Content-Digest is added, the above example becomes:

   Content-Type: text/plain; charset="us-ascii"
   MIME-Version: 1.0
   Content-ID: <218F64C460.u314@example.com>
   Content-Transfer-Encoding: 7bit
   Content-Digest: v=1.0; h=content-type,content-id,mime-version;
           c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

HTTP also has the concept of a "trailer" (data after the content body, consisting of the same kind of fields as the header). Content-Digest can be placed in the trailer if desired, but note that the list of fields in "h" refers only to fields in the header and NOT to fields in the trailer.

4. Digest Hash Verification Procedure for Content-Digest

Verification of the Content-Digest header field largely follows the same procedure as its creation:

   1. First, for the content part being verified, a canonical version of its header is produced following the procedure in Section 3.1, based on the list of header fields in the "h" parameter and the header canonicalization method in the "c" parameter. The canonical version must not include the Content-Digest header field itself, even if it would match the list in "h". If there is no "h" parameter in the Content-Digest header field, the result of this step is an empty string.

   2. Next, a canonical version of the content body data is produced as described in Section 3.2, based on the "c" parameter, and appended to the data produced in step 1.

   3. If the "s" parameter is present in the Content-Digest header field, the octet size of the data from step 2 is calculated. If this size does not match the value of "s", the verifying system has the following options:

      1. Abort further processing and return an error indicating that the Content-Digest cannot be verified and that the content has been changed.

      2. Remove octets from the end of the data from step 2 so that its size matches the value of "s" (which is only possible when the data is longer than that value), and proceed to step 4.

   4. A cryptographic hash of the data from step 2 (or step 3.2) is produced using the algorithm listed in the "a" parameter (sha1 if "a" is not present). This hash is compared against the value of the "d" parameter of Content-Digest.
      If they match, verification succeeds; otherwise it fails.

Removing data from the canonicalized content to match the size parameter, as in step 3.2, should be done only in a specific context where an intermediate system is believed to have appended extra data to the end of the content during transmission. This is the case, for example, for an email message that has passed through a mailing list (which often adds its own footer to the message): dropping the end of the message allows the original version to be verified. However, one must be aware of the dangers of doing so: only part of the message data is verified, and this is a serious security issue that can be exploited. It is therefore best that a verifying system that chooses to verify only part of the content consider changing the message so that it includes only the part that has been verified (optionally, rather than removing the unverified part, it may be moved into a separate attachment). Since mailing lists add their footers to text messages, this method should not be attempted if the content is of a type other than text; for binary data, Content-Digest verification should simply be considered to have failed (as in step 3.1 above) if the size of the canonicalized content does not match the value of the "s" parameter.

5. EDigest Header Field

The EDigest header field is very similar to Content-Digest (it can be considered an extended form of Content-Digest) and also carries a digest hash of the content, but unlike Content-Digest it does not have to be a unique field for a particular MIME part, nor does it have to be attached to that part.
As such, the EDigest header field provides the following additional functionality over Content-Digest:

   1. It can be added by intermediate transport agents (including message relays and gateways), not only at the transmission origin.

   2. It can be used in parts other than the content header itself, and so allows a digest to reference a MIME subpart of the message or an externally located MIME part.

   3. It can provide a digest hash that verifies the data of several MIME parts together.

The syntax of the EDigest field (for the full syntax, see Appendix A) is identical to that of Content-Digest and consists of all the same parameters plus one additional optional parameter, "u".

5.1 Content URL ("u") parameter

The value of the EDigest "u" parameter is URL data pointing to the content whose hash the header field carries. This URL data is a list of one or more URLs, each enclosed in "<" and ">" and separated by FWS; this is very similar to how message identifiers are listed in the References header field of an email header.

A common use of the "u" parameter is when the EDigest header field carries the hash of a MIME entity enclosed within another MIME entity or message, and it is desirable to provide that hash directly in the parent entity. In such a case a "cid" URL (Content-ID, as specified in [RFC2392]) is used to reference the unique id of the content as found in its Content-ID header field. An example of such use is as follows:

   Edigest: v=1.0; u=""; h=content-type,message-id,mime-version;
           a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
   From: will@example.com
   To: mary@example.net
   Subject: Fireworks
   Date: Mon, 4 Jul 2005 12:34:26 -0400
   Message-ID:
   Mime-Version: 1.1
   Content-Type: Multipart/Mixed; Boundary="NextPart"

   This message is in MIME format. The first part should be readable
   text, while the remaining parts are likely unreadable without
   MIME-aware tools.

   --NextPart
   Content-Type: text/plain; charset="US-ASCII"
   MIME-Version: 1.0
   Content-ID: <218C460.u314@example.com>
   Content-Transfer-Encoding: 7bit

   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

The URL scheme "cid" should be considered the default URL scheme, so specifying "cid:" is optional and the 'u=' parameter can also be expressed simply as 'u=<218C460.u314@example.com>'.

For a digest to be used with content parts, the value of the "u" parameter needs to reference a unique content part, so no two content parts in the message can have the same Content-ID, even if they are subparts of "Multipart/Alternative" ([RFC2392] specifies that in such a case content parts may share a common Content-ID for reference). Where an application needs a common Content-ID to reference one of the multiple parts within "Multipart/Alternative", that common reference id should be placed in the Content-ID header field of the Multipart/Alternative MIME part itself rather than in the Content-ID of its subparts.

The digest "u" parameter can also specify more than one content part. For example:

   Edigest: v=1.0; h=content-type,content-id,mime-version;
           u="<218C460.u314@example.com> <218C460.u315@example.com>";
           a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

specifies that the hash data in "d" is based on the content data of both <218C460.u314@example.com> and <218C460.u315@example.com>. This is used instead of adding multiple digest header fields for individual content parts when all these content parts are related and are not expected to change individually while the message is in transport.
In such situations a hash of the entire message could be an option, but such a hash would not verify if content parts are rearranged or a new content part is added to the message during transport, whereas the hash in an EDigest header field that lists multiple content parts in "u" is not affected by new content being added to the message or by existing parts being rearranged. Other uses include multiple external data components (such as content available on a web server) that are referenced from content parts in the message and that the client is expected to download as part of message verification and presentation to the user.

5.2 Creation of EDigest Header Field

The EDigest header field is created in a similar way to Content-Digest; the differences arise only when the "u" parameter is used. When EDigest references a single content part in the message, the procedures described in Section 3 are followed, except that the EDigest header field is not placed in the same content part. A Content-ID header field must be present in the header of the content part in the message that is referenced by a content-id URL in the "u" parameter.

It is also possible to reference static remote content located on an HTTP, FTP, or other server. If such content is MIME, then the "h" parameter MUST be present and include at least one header field (such as Content-Type). If the remote content is not MIME, then it is considered binary (even if it is only text), both header and body canonicalization are set to 'bare' (i.e., "c=bare,bare"), and the "h" parameter MUST NOT be present.

When multiple URLs are listed in the "u" parameter value, the hash is produced as follows:

   1. Follow the procedure described in Section 3.1 for the header of the first content referenced in the "u" parameter. This results in a buffer with canonicalized header fields data.

   2. Follow the procedure described in Section 3.2 for the body data of the first content referenced in the "u" parameter, and append the result (canonicalized body data) to the data from step 1.

   3. Follow the procedure described in Section 3.1 for the header of the second content referenced in the "u" parameter, and append it to the result of step 2.

   4. Follow the procedure described in Section 3.2 for the body data of the second content referenced in the "u" parameter, and append it to the result of step 3.

   5. ...

   6. Follow the procedure described in Section 3.2 for the body data of the last content referenced in the "u" parameter, and append it to the result of the previous steps.

The hash algorithm is then applied to the data from the last step (the concatenation of the canonicalized data from all content parts), and the result goes into the "d" parameter. Similarly, the "s" parameter may optionally be added; its value is the octet count of all the canonicalized data. Note that with multiple content parts in "u", the same list of header fields from the "h" parameter is used for every part; this list may therefore have to include names of header fields that are present in one content part but not in another, so that the hash covers all the necessary data.

5.3 Verification of EDigest Header Field

An EDigest header field is verified only if the data for all content parts referenced in "u" is available to the verifying agent. If it is not, verification should be aborted with an error message indicating that some of the referenced data is not available. If the data is unavailable due to a temporary DNS error resolving the domain name from one of the URLs in the "u" parameter, the verifying agent may choose to delay verification and attempt it again at a later time.

The hash is produced using the same procedure as described in Section 5.2 and is then compared to the hash in the "d" parameter.
If they match, the EDigest is successfully verified; if they do not, verification has failed.

6. Examples

6.1 Simple Content-Digest as Replacement for Content-MD5

In its simple form, without the "h" parameter, the Content-Digest header field can easily be used as a replacement for Content-MD5 [RFC1544], and since most Content-Digest parameters are optional or have default values, this does not require much more space.

For the following small text content with a Content-MD5 field:

   Content-Type: text/plain; format=flowed
   Content-MD5: vP5T2agfLQOCooDQF3lghA==

   Test Message

replacing Content-MD5 with Content-Digest using the md5 algorithm gives the following (notice that the hash data is the same):

   Content-Type: text/plain; format=flowed
   Content-Digest: v=1.0; a=md5; d="vP5T2agfLQOCooDQF3lghA=="

   Test Message

When MD5 is replaced with the more secure SHA1 algorithm (the default when "a" is not present), the data looks as follows:

   Content-Type: text/plain; format=flowed
   Content-Digest: v=1.0; d="yH0loJWEwEDzv8U7VwGZWR3rELo="

   Test Message

6.2 Content-Digest used in Email Message

The use of Content-Digest for an email message composed entirely of one content part is shown in Section 3.3. Here this is expanded into an example of a multipart MIME email message that uses the Content-Digest header field both for particular content parts and for the entire email message. In all cases the default "sha1" algorithm is used (so the "a" parameter is mostly omitted).
   From: will@example.com
   To: mary@example.net
   Subject: Fireworks
   Date: Mon, 4 Jul 2005 12:34:26 -0400
   Message-ID:
   Mime-Version: 1.1
   Content-Type: MULTIPART/signed; Boundary="NextPart";
           protocol="application/pkcs7-signature"; micalg=sha1
   Content-Transfer-Encoding: 7bit
   Content-Digest: v=1.0; i=mail.example.com; t=2005070412342601;
           h=content-type,mime-version,message-id,date; c=nofws;
           d="rNqZDKbZ4eFzs/6Z67ivfIA2JPs="

   This message is in MIME format. The first part should be readable
   text, while the remaining parts are likely unreadable without
   MIME-aware tools.

   --NextPart
   Content-Type: text/plain; charset="US-ASCII"
   MIME-Version: 1.0
   Content-ID: <218C460.u314@example.com>
   Content-Digest: v=1.0; h=content-type,content-id,mime-version;
           a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="

   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

   --NextPart
   Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
   Content-Transfer-Encoding: BASE64
   Content-Description: S/MIME Cryptographic Signature
   Content-Disposition: attachment; filename="smime.p7s"
   Content-ID: <218C460.u315@example.com>
   Content-Digest: v=1.0; h=content-*; t=2005070412341200;
           d="HlT99tyN/wczesmLuavpsr5qXbc="

   MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
   SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
   ....

Note that the Content-Transfer-Encoding header field (matched by "content-*") is included in the digest hash data for the last content part. While this may not be a good idea for text data, BASE64 is well known as the only transfer encoding used for S/MIME signatures and is not likely ever to be changed by intermediate transmission systems. The actual canonicalized data that goes into the hash computation ('bare' canonicalization applies by default, since no "c" parameter is given and the media type is not text) IS NOT BASE64 but binary 8-bit data, because the digest is computed over the original data before the content-transfer-encoding is applied. However, the hash for the Content-Digest in the mail message header itself (identified by t=2005070412342601) is computed over the encapsulated MIME parts within it with their content-transfer-encoding applied, so in that case the BASE64-encoded data is used (and the message content hash also covers the header fields of all message parts, including the Content-Digest field with t=2005070412341200).

6.3 Content-Digest used in HTTP Transmission

Content-Digest can be used in HTTP as a replacement for Content-MD5; it is used in the same way, and only when the entire content data is transmitted. Here is an example:

   Date: Sun, 10 Jul 2005 15:02:03 GMT
   Accept-Ranges: bytes
   ETag: "8088c-13bfe-42d137fd-windows-1251"
   Server: Apache/1.3.22 (Unix) mod_deflate/1.0.21 mod_accel/1.0.31
   Vary: accept-charset, user-agent
   Content-Length: 80894
   Content-Type: text/html; charset=windows-1251
   Content-Digest: v=1.0; i=www.example.com;
           h=Content-Type,Last-Modified,ETag; c=bare;
           d="MpUuKLUmoKUapc4q2kMyw3XzEUo="
   Last-Modified: Sun, 10 Jul 2005 15:00:13 GMT

   Hello World
   Hello World
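A receiver, such as an HTTP client handling the response above, first has to split the Content-Digest value into its parameters and apply the defaults. The following Python sketch is one simplified reading of the Appendix A grammar: parameters separated by ";" outside double quotes, case-insensitive names, quoted-string delimiters removed, folding whitespace collapsed. The function name parse_digest_params is hypothetical, and rejecting a repeated parameter is a choice made by this sketch, not a rule stated in this document.

```python
from typing import Dict

# Defaults stated in this document for absent "a" and "c" parameters.
DEFAULTS = {"a": "sha1", "c": "simple,mimeform"}


def parse_digest_params(value: str) -> Dict[str, str]:
    """Split a Content-Digest or EDigest field value into parameters."""
    items, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))

    params: Dict[str, str] = {}
    for item in items:
        item = " ".join(item.split())  # unfold and collapse whitespace
        if not item:
            continue
        name, sep, val = item.partition("=")  # split at the first "=" only
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        name, val = name.strip().lower(), val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]  # strip the quoted-string delimiters
        if name in params:
            raise ValueError(f"repeated parameter: {name}")
        params[name] = val

    if "v" not in params or "d" not in params:
        raise ValueError('"v" and "d" parameters are required')
    for name, default in DEFAULTS.items():
        params.setdefault(name, default)
    return params


field = ('v=1.0; i=www.example.com;\r\n h=Content-Type,Last-Modified,ETag;'
         ' c=bare; d="MpUuKLUmoKUapc4q2kMyw3XzEUo="')
p = parse_digest_params(field)
print(p["a"], p["c"], p["h"].lower().split(","))
# sha1 bare ['content-type', 'last-modified', 'etag']
```

Splitting each parameter at its first "=" keeps the base64 padding in "d" intact, and tracking quotes keeps a ";" inside a quoted "u" list from being treated as a separator.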
When partial content data is transmitted (transmission in chunks), an HTTP instance digest may be used for data integrity; see [RFC3230] regarding this complementary concept of a digest header field specific to each connection. To be able to verify the entire data (rather than a specific chunk), an EDigest with a "u" parameter pointing to the permanent location of the data can be included in the header of each chunk, with a Content-Location header field also present in the same header.

6.4 EDigest used in Email

Below is the example from Section 6.2, but with an EDigest (t=2005070510302601) used in the email header to provide a hash of particular MIME parts rather than of the entire message as a whole (as was done with Content-Digest in Section 6.2). After delivery, the message is manually resent to a list server, which adds an additional MIME part (the mailing list footer) and then adds a new EDigest field (t=2005070413063001). Note that in email, EDigest header fields are typically prepended to the message as trace data, unlike Content-Digest fields, which are added together with the other Content fields by the message originator and usually appear below them in the content header.
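Both EDigest fields in the following example list content parts in "u". Following the steps of Section 5.2, the hash input is the canonical header of the first referenced part, then its canonical body, then the same for the next part, and so on, with one "h" list applied to every part. The Python sketch below is illustrative only: edigest_over_parts is a hypothetical name, each part is supplied already canonicalized, and the byte strings in the usage example are toy data, not the real canonical forms of the example parts.

```python
import base64
import hashlib
from typing import Mapping, Sequence, Tuple


def edigest_over_parts(u_refs: Sequence[str],
                       parts: Mapping[str, Tuple[bytes, bytes]],
                       algorithm: str = "sha1") -> Tuple[int, str]:
    """Hash the parts named in "u", in "u" order (Section 5.2 steps).

    `parts` maps a Content-ID (without angle brackets) to the already
    canonicalized (header_data, body_data) of that part; producing those
    per Sections 3.1/3.2 is out of scope for this sketch.
    Returns (octet count for "s", base64 hash for "d").
    """
    buf = bytearray()
    for ref in u_refs:
        cid = ref.strip().strip("<>")
        if cid.lower().startswith("cid:"):
            cid = cid[4:]  # "cid" is the default scheme, so the prefix is optional
        if cid not in parts:
            # Section 5.3: without every referenced part, verification is aborted.
            raise LookupError(f"referenced content not available: {cid}")
        header_data, body_data = parts[cid]
        buf += header_data + body_data
    digest = hashlib.new(algorithm, bytes(buf)).digest()
    return len(buf), base64.b64encode(digest).decode("ascii")


# Toy canonical data (hypothetical), keyed by Content-IDs from the example.
parts = {
    "218C460.u314@example.com": (b"content-type: text/plain\r\n", b"Will\r\n"),
    "218C460.u315@example.com": (b"content-type: application/pkcs7-signature\r\n",
                                 b"\x30\x82"),
}
u_value = "<218C460.u314@example.com> <218C460.u315@example.com>"
print(edigest_over_parts(u_value.split(), parts))
```

Because the input depends only on the order of references in "u", the result does not change when parts are rearranged in the message or when an unreferenced part (such as a list footer) is added, which is the property Section 5.1 relies on.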
   EDigest: v=1.0; i=lserv.example.org; t=2005070413063001;
           u="<218C460.u314@example.com> <218C460.u315@example.com> ";
           h="content-type,mime-version,content-id,content-digest,
           content-originator"; d="MJkDZynIX7LCZ8LBO/KB2UGQmU0="
   Received: from box.example.net (box.example.net [10.0.2.10])
           by lserv.example.org (8.12.1/8.12.1) with ESMTP id 4d343d31
           for ; Mon, 04 July 2005 13:06:20
   Resent-From: mary@example.net
   Resent-To: family-list@example.org
   Resent-Date: Mon, 4 Jul 2005 13:04:10 -0400
   Received: from mail.example.com (mail.example.com [10.0.0.1])
           by box.example.net (8.12.1/8.12.1) with ESMTP id nmonpqrst1
           for ; Mon, 04 July 2005 10:33:04 +0100
   EDigest: v=1.0; i=mail.example.com; t=2005070510302601;
           u="<218C460.u314@example.com> <218C460.u315@example.com>";
           h=content-type,mime-version,content-id,content-digest;
           d="COb/tgPpFD4JNS2vYelZAkk4aHU="
   From: will@example.com
   To: mary@example.net
   Subject: Fireworks
   Date: Mon, 4 Jul 2005 10:29:15 -0400
   Message-ID:
   Mime-Version: 1.1
   Content-Type: MULTIPART/mixed; Boundary="NextPart"
   Content-Transfer-Encoding: 7bit

   This message is in MIME format. The first part should be readable
   text, while the remaining parts are likely unreadable without
   MIME-aware tools.

   --NextPart
   Content-Type: text/plain; charset="US-ASCII"
   MIME-Version: 1.0
   Content-ID: <218C460.u314@example.com>
   Content-Digest: v=1.0; h=content-type,content-id,mime-version;
           a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="

   Happy 4th of July,
   Fireworks at pier 39 at 9:30pm, be there.
   Will

   --NextPart
   Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
   Content-Transfer-Encoding: BASE64
   Content-Description: S/MIME Cryptographic Signature
   Content-Disposition: attachment; filename="smime.p7s"
   Content-ID: <218C460.u315@example.com>
   Content-Digest: v=1.0; h=content-*; t=2005070412341200;
           d="HlT99tyN/wczesmLuavpsr5qXbc="

   MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
   SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
   ....

   --NextPart
   Content-Type: text/plain; charset=US-ASCII; format=flowed
   Content-Originator: "Family List"
   Content-ID:
   Content-Digest: v=1.0; h=content-*; d="c4ZKJPGIqDAfn/SrjbF8jI5448k="

   _______________________________________________
   private family mailing list - family-list@example.org

7. IANA Considerations

Two header fields are to be registered as follows:

   ---------------------------------------------------------------------
   Header field name: Content-Digest
   Applicable protocol: MIME
   Status: provisional
   Author/Change controller: William Leibzon
   Specification document(s): This document
   Related information: none
   ---------------------------------------------------------------------

   ---------------------------------------------------------------------
   Header field name: EDigest
   Applicable protocol: MIME, mail
   Status: provisional
   Author/Change controller: William Leibzon
   Specification document(s): This document
   Related information: none
   ---------------------------------------------------------------------

Note to RFC Editor: this section may be removed on publication as an RFC.

8. Security Considerations

This document specifies a data integrity mechanism to protect MIME data (including the MIME header) from accidental modification while in transit from origin to destination. Data integrity with Content-Digest and EDigest is not a replacement for an end-to-end messaging security architecture such as S/MIME [RFC3851] or PGP [RFC3156], but may supplement them. The automated addition of EDigest by message transport agents may be used as the basis for building an automated email signing system.

9. References

9.1 Normative References

   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

   [RFC2049]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples", RFC 2049, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource Locators", RFC 2392, August 1998.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001.

   [RFC3548]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003.

   [RFC3874]  Housley, R., "A 224-bit One-way Hash Function: SHA-224", RFC 3874, September 2004.

9.2 Informative References

   [FIPS180-2]  "US Federal Information Processing Standards Publication 180-2", August 2002, .

   [RFC1421]  Linn, J., "Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures", RFC 1421, February 1993.

   [RFC1544]  Rose, M., "The Content-MD5 Header Field", RFC 1544, November 1993.
   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler, "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, January 2002.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.1 Message Specification", RFC 3851, July 2004.

   [draft-hoffman-hash-attacks-04]  Hoffman, P., "Attacks on Cryptographic Hashes in Internet Protocols", June 2005, .

Author's Address

   William Leibzon
   Elan Networks
   500 Laurelwood Rd, Suite 12
   Santa Clara, California 95054
   USA

   Email: william@elan.net

Appendix A. Collected Grammar

This appendix contains the complete ABNF grammar for the Content-Digest and EDigest header fields. For any grammar terms not specifically defined below (such as CFWS and FWS), refer to [RFC2822] and its ABNF grammar definitions.

The ABNF grammar of the Content-Digest header field is as follows:

   Content-Digest  = "Content-Digest" ":" FWS version parameters

   version         = "v=" version-number

   version-number  = "1.0" / unknown-version

   unknown-version = number-major "." number-minor

   number-major    = 1*(digit)

   number-minor    = 1*(digit)

   parameters      = *(CFWS ";" FWS parameter) CFWS data-parameter
                     *(CFWS ";" FWS parameter)

   data-parameter  = ";" FWS "d=" value

   parameter       = algorithm / headerfieldlist / canonicalization /
                     size / timestamp / hostinfo / undefined-parameter
                     ; Matching of parameter names is case-insensitive

   undefined-parameter = undefined-name "=" undefined-value

   undefined-name  = token

   undefined-value = value

   algorithm       = "a=" algorithm-name

   algorithm-name  = "md5" / "sha1" / "sha224" / "sha256" / "sha384" /
                     "sha512" / undefined-value
                     ; Matching of algorithm names is case-insensitive

   canonicalization = "c=" [header-canonicalization ","]
                     body-canonicalization
                     ; Matching of header and body canonicalization
                     ; is case-insensitive

   header-canonicalization = "bare" / "simple" / "nofws" /
                     undefined-value

   body-canonicalization = "bare" / "text" / "mimeform" / "nofws" /
                     "none" / undefined-value

   size            = "s=" 1*(digit)

   timestamp       = "t=" timestamp-value

   timestamp-value = 1*(digit) ["." 1*(digit)]

   hostinfo        = "i=" value

   headerfieldlist = "h=" headerfield *("," headerfield)

   headerfield     = field-name
                     ; Matching of header field names is
                     ; case-insensitive

   field-name      = 1*ftext [ "*" ]

   ftext           = %d33-57 /          ; Any character except
                     %d59-126           ;  controls, SP, and ":"

   digit           = %d48-57            ; Numeric Digit

   value           = token / quoted-string

   token           = 1*

   tspecials       = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
                     "\" / <"> / "/" / "[" / "]" / "?" / "="
                     ; Must be in quoted-string to use within
                     ; parameter values

The ABNF grammar of the EDigest header field is as follows (for anything not defined here, refer to the Content-Digest grammar):

   EDigest         = "EDigest" ":" FWS version edigest-parameters

   edigest-parameters = *(CFWS ";" FWS ed-parameter) CFWS data-parameter
                     *(CFWS ";" FWS ed-parameter)

   ed-parameter    = algorithm / headerfieldlist / canonicalization /
                     size / timestamp / hostinfo / urlinfo /
                     undefined-parameter
                     ; Matching of parameter names is case-insensitive

   urlinfo         = "u=" (quoted-url / content-id)
                     ; content-id is as defined in RFC2392

   quoted-url      = %d34 urldata %d34
                     ; quoted-url must be used if urldata contains
                     ; tspecials characters

   urldata         = oneurl 0*(FWS oneurl)

   oneurl          = "<" value ">"
                     ; value above is expected to be genericurl as
                     ; defined in RFC1738 syntax, but may also be
                     ; content-id as defined in RFC2392

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.