Network Working Group                                         W. Leibzon
Internet-Draft                                             Elan Networks
Expires: January 11, 2006                                  July 10, 2005


                Content-Digest and EDigest Header Fields
                draft-leibzon-content-digest-edigest-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document defines the Content-Digest header field, which carries
   a hash of the MIME content body and, optionally, of header field
   data, and which supports several hash algorithms and
   canonicalization methods.  It also defines the EDigest header field,
   which specifies digest information for an external content part or a
   hash of several content parts joined together.

Requirements Language


Leibzon                 Expires January 11, 2006                [Page 1]

Internet-Draft         Content-Digest and EDigest              July 2005


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3

   2.  Content-Digest Header Field  . . . . . . . . . . . . . . . . .  4
     2.1   Algorithm ("a") parameter  . . . . . . . . . . . . . . . .  4
     2.2   Host Information ("i") parameter . . . . . . . . . . . . .  5
     2.3   Header Field List ("h") parameter  . . . . . . . . . . . .  5
     2.4   Canonicalization Method ("c") parameter  . . . . . . . . .  6
     2.5   Canonicalized Data Size ("s") parameter  . . . . . . . . .  7
     2.6   Time Stamp ID ("t") parameter  . . . . . . . . . . . . . .  8
     2.7   Hash Data ("d") parameter  . . . . . . . . . . . . . . . .  8

   3.  Creation of Content-Digest Header Field  . . . . . . . . . . . 10
     3.1   Header Fields Processing . . . . . . . . . . . . . . . . . 10
     3.2   Content Body Data Processing . . . . . . . . . . . . . . . 13
     3.3   Digest Hash Creation . . . . . . . . . . . . . . . . . . . 16

   4.  Digest Hash Verification Procedure for Content-Digest  . . . . 19

   5.  EDigest Header Field . . . . . . . . . . . . . . . . . . . . . 21
     5.1   Content URL ("u") parameter  . . . . . . . . . . . . . . . 21
     5.2   Creation of EDigest Header Field . . . . . . . . . . . . . 23
     5.3   Verification of EDigest Header Field . . . . . . . . . . . 24

   6.  Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
     6.1   Simple Content-Digest as Replacement for Content-MD5 . . . 25
     6.2   Content-Digest used in Email Message . . . . . . . . . . . 25
     6.3   Content-Digest used in HTTP Transmission . . . . . . . . . 27
     6.4   EDigest used in Email  . . . . . . . . . . . . . . . . . . 27

   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 30

   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 31

   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 32
     9.1   Normative References . . . . . . . . . . . . . . . . . . . 32
     9.2   Informative References . . . . . . . . . . . . . . . . . . 32

       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 33

   A.  Collected Grammar  . . . . . . . . . . . . . . . . . . . . . . 34

       Intellectual Property and Copyright Statements . . . . . . . . 37


1.  Introduction

   When transmitting data it is often desirable to confirm the integrity
   of the data and to make certain that all of it has been transmitted
   without modification.  A common method for this is to calculate and
   send a cryptographic hash digest of the data.  For transmissions
   involving MIME encapsulated data, as used in SMTP, HTTP and other
   protocols, this can be accomplished with the Content-MD5 header field
   defined in [RFC1544].

   However, the Content-MD5 header field is tied to the MD5 hash
   algorithm, and recent research summarized in
   [draft-hoffman-hash-attacks-04] indicates that MD5 is vulnerable to
   certain collision attacks; with fast computers a collision can be
   found in as little as 4 hours.  Additionally, Content-MD5 hash
   creation involves only the message data, whereas important
   information regarding the message or MIME part can also be contained
   in its header, and the integrity of header fields needs to be
   protected as well.

   This document defines the Content-Digest header field, which provides
   a universal syntax for including a hash of MIME or message data and,
   optionally, its header fields, with support for several hash
   algorithms and canonicalization methods and for other optional
   information regarding digest creation.  The EDigest header field is
   also defined; it allows inclusion of digest information for an
   external MIME part or a hash of several parts joined together.


2.  Content-Digest Header Field

   Content-Digest is similar to Content-MD5 in that it is a unique field
   per MIME part (or message data) specifying a hash digest for the
   entire MIME content, optionally including MIME content header data.
   There can be only one Content-Digest header field in any MIME or
   message header, and it should be added by the originating user agent
   (as far as email routing is concerned, this means it is added by the
   MUA acting as MSA).  Message relays and gateways are expressly
   forbidden from generating Content-Digest header fields and adding
   them to messages already in transit.

   The Content-Digest header field has a syntax similar to MIME fields
   and consists of multiple parameters with values, separated by ";"
   (e.g. "param1=val1 ; param2 =val2").  Please refer to Appendix A for
   the exact ABNF syntax.

   The Content-Digest header field always starts with the version
   information parameter, which looks like this:

   Content-Digest: v=1.0 ; ...

   This document describes use of the Content-Digest header field with
   version "1.0".  Software that can understand version "1.0" of the
   Content-Digest header field SHOULD also attempt to interpret header
   fields that have the same major number "1" but a different minor
   number (e.g. "1.1").  Interpretation of a Content-Digest header field
   with a major number other than "1" is not defined in this document.
   Software that does not otherwise know how to interpret a header field
   with a different major number MUST NOT attempt to evaluate the
   Content-Digest header field, and message processing should be done as
   if no Content-Digest header field was present.
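
   The version rule above can be sketched as follows.  This is a
   minimal illustration only, not the grammar of Appendix A: the
   function names are hypothetical, the field value is assumed to be
   already unfolded, parameter names are assumed to be case-insensitive,
   and a ";" inside a quoted value is not handled.

```python
def parse_digest_params(value):
    """Split an unfolded Content-Digest field value into a dict.

    Simplified sketch: splits on ";" and on the first "=", strips
    surrounding white space and one pair of double quotes.
    """
    params = {}
    for item in value.split(";"):
        if "=" not in item:
            continue
        name, _, val = item.partition("=")
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]
        # Assumption: parameter names are treated as case-insensitive.
        params[name.strip().lower()] = val
    return params


def is_supported_version(params):
    """Return True if the major version number is "1" (1.0, 1.1, ...)."""
    major = params.get("v", "").split(".")[0]
    return major == "1"
```

   A verifier built this way would skip the header field entirely when
   is_supported_version() returns False, processing the message as if
   no Content-Digest header field was present.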

   After the version information, the header field contains a number of
   additional parameters.  The data parameter ("d") is the only one that
   is required; all other parameters are either optional for processing
   or have a default value that is assumed if the parameter is not
   present.  Each parameter is specified in a separate subsection below:

2.1  Algorithm ("a") parameter

   The algorithm for digest hash computation is specified by means of
   the "a" parameter.  The algorithms and the corresponding values of
   the "a" parameter are listed below:

      md5 - MD5 algorithm producing 128-bit hash as defined in [RFC1321]


      sha1 - SHA1 algorithm producing 160-bit hash as defined in
      [RFC3174]

      sha224 - SHA algorithm producing 224-bit hash as defined in
      [RFC3874]

      sha256 - SHA algorithm producing 256-bit hash as defined in
      [FIPS180-2]

      sha384 - SHA algorithm producing 384-bit hash as defined in
      [FIPS180-2]

      sha512 - SHA algorithm producing 512-bit hash as defined in
      [FIPS180-2]

   If no algorithm is specified, that is, if the "a" parameter is absent
   from the Content-Digest header field, then the default SHA1 algorithm
   should be assumed.

   An implementation conforming to this document MUST support the SHA1
   algorithm, SHOULD support the MD5 and SHA224 algorithms, and MAY
   support SHA256, SHA384, SHA512 and other algorithms.  New algorithms
   may be introduced by other documents and do not require introduction
   of a new version number for Content-Digest.

   If the algorithm specified in the "a" parameter of Content-Digest is
   unknown to the software evaluating the header, it MUST NOT attempt to
   evaluate the Content-Digest header field, and message processing
   should be done as if no header field was present.
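
   As an illustration, algorithm selection with the SHA1 default might
   look like the sketch below.  It uses Python's standard hashlib
   module; the function name and the choice to return None for an
   unknown algorithm are assumptions made for this example, as is
   treating the value of "a" as case-insensitive.

```python
import hashlib

# Values of the "a" parameter listed above, mapped to hashlib names.
ALGORITHMS = {
    "md5": "md5",
    "sha1": "sha1",
    "sha224": "sha224",
    "sha256": "sha256",
    "sha384": "sha384",
    "sha512": "sha512",
}


def new_hash(params):
    """Return a hash object for the "a" parameter, or None if unknown.

    `params` is a dict of parameter names to values.  A missing "a"
    parameter selects the default SHA1 algorithm.
    """
    name = params.get("a", "sha1").lower()
    if name not in ALGORITHMS:
        # Unknown algorithm: the caller should act as if no
        # Content-Digest header field was present.
        return None
    return hashlib.new(ALGORITHMS[name])
```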

2.2  Host Information ("i") parameter

   An optional "i" parameter may be used to specify the hostname of the
   system adding the Content-Digest header field.  This parameter is
   purely informational and is not used in the processing and evaluation
   of the digest header.

2.3  Header Field List ("h") parameter

   MIME header fields that are included with content data for digest
   hash computation are listed by means of the "h" parameter of
   Content-Digest.  The field names are listed after "h=" one by one,
   separated by ",".  A group of fields that have a common starting name
   can also be specified by using a "*" ending; for example,
   "h=Content-*" means all header field names that start with "Content-"
   such as Content-Type, Content-Transfer-Encoding, Content-ID and
   others.  If "*" is not used, then the name MUST match the full field
   name specified in the header up to ":", so "h=Content-Type" would
   match a "Content-Type:" header line but would not match a
   "Content-Type-Extra:" header line.  If all field names are to be
   included, this is specified as "h=*", but careful consideration must
   be given to whether that is desirable, as in some cases new header
   fields are added to the message or to specific MIME parts while it is
   in transit.

   The field names are not case-sensitive and "h=Content-Type" means
   that even if the actual header field name in the MIME part is
   "CONTENT-TYPE" it would be a match and similarly for "content-type"
   or "cOnTeNt-TyPe" or any other variation of the same letters in
   different case.

   Note that the MIME fields matched against the list in the "h"
   parameter are not selected relative to the position of the
   Content-Digest header field; they may appear both below and above it
   in the MIME and message header.  The Content-Digest header field
   itself is never included as part of its own digest, even if the
   Content-Digest name matches the list of header fields in "h=" (such
   as when "h=Content-*" or "h=*" is used).

   If there is no "h" parameter in the Content-Digest header field, then
   no header field data is included in the digest and the digest data is
   a hash of the content body only, as it is with the Content-MD5 header
   field.
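
   The matching rules of this section can be sketched as a small
   predicate.  The function name is hypothetical; the behavior shown
   (case-insensitive comparison, a trailing "*" as prefix wildcard, and
   Content-Digest never matching) follows the text above.

```python
def field_matches(pattern, field_name):
    """Return True if `field_name` is selected by one "h" list entry."""
    pattern = pattern.strip().lower()
    name = field_name.strip().lower()
    # The Content-Digest field is never part of its own digest.
    if name == "content-digest":
        return False
    if pattern.endswith("*"):
        # "Content-*" matches every name starting with "content-";
        # a bare "*" therefore matches every name.
        return name.startswith(pattern[:-1])
    return name == pattern
```

   For example, field_matches("Content-*", "Content-ID") is true, while
   field_matches("Content-Type", "Content-Type-Extra") is false.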

2.4  Canonicalization Method ("c") parameter

   Canonicalization is a process of data transformation that makes
   format of the data acceptable based on constraints imposed by
   additional data processing functionality.  In the case of digest
   computation this describes the process of transforming data into
   canonical form and actual hash computation is then done on the data
   in this canonical form.

   Canonicalization is most useful as a way to ensure that the data hash
   can be verified even if some small data conversion is done while the
   message is being transmitted.  For example, some intermediate message
   processing software interprets and corrects what it considers to be
   header field problems, such as case variations or too many
   white-space characters between the header field name and the value of
   the field; in other cases message processing software may remove
   trailing white-space characters on any line, or the first or last
   empty lines in the message.  All such processing would normally make
   it impossible to verify the hash of the original message content, but
   some canonicalization methods can take this behavior into account and
   provide a consistent format of data for digest verification.

   Note that doing canonicalization for digest computation does not mean
   that such canonicalized data is actually transmitted.  Conversion and
   data transformation rules for data transmission are in fact covered
   by content-transfer-encoding as specified in section 6 of [RFC2045].
   As it relates to canonicalization and digest computation, the
   content-transfer-encoding conversion should be done on the original
   non-canonicalized data after the digest hash has been computed and
   the appropriate Content-Digest header field added.  When the digest
   is being verified, canonicalization and digest computation are done
   after undoing any content-transfer-encoding.

   Similar to the support provided for multiple cryptographic
   algorithms, Content-Digest supports multiple canonicalization
   processing methods, with a small set of methods being required to be
   supported by all implementations.  The canonicalization methods used
   for header field processing and for the content body are different,
   so the "c" parameter value is composed of two separate parts, "a,b",
   where 'a' specifies the method used for header field data
   canonicalization and 'b' specifies the method used for body
   canonicalization.  The canonicalization methods that MUST be
   supported are "bare", "simple" and "nofws" for header field
   processing and "bare", "text" and "nofws", plus the special values
   "mimeform" and "none", for content body processing.  If a
   canonicalization method specified in the "c" parameter of
   Content-Digest is unknown to the software evaluating the header
   field, it MUST NOT attempt to evaluate the Content-Digest header
   field, and message processing should be done as if no Content-Digest
   header field was present.

   If there is no "c" parameter specified in the Content-Digest header
   field, then the default value "simple,mimeform" is assumed.  If the
   value of the "c" parameter is a single keyword, such as "c=nofws",
   then the default "simple" method is to be used for header field
   canonicalization and the method specified as the value of "c" is to
   be used for body data canonicalization.
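
   The defaulting rules for the "c" parameter can be sketched as
   follows.  The function name and the use of None to signal an unknown
   method are assumptions of this example, as is treating method names
   as case-insensitive.

```python
HEADER_METHODS = {"bare", "simple", "nofws"}
BODY_METHODS = {"bare", "text", "nofws", "mimeform", "none"}


def parse_canonicalization(c_value=None):
    """Return (header_method, body_method), or None if unknown.

    A missing "c" parameter means "simple,mimeform"; a single keyword
    names the body method and leaves the header method at "simple".
    """
    if c_value is None:
        return ("simple", "mimeform")
    parts = [part.strip().lower() for part in c_value.split(",")]
    if len(parts) == 1:
        header_method, body_method = "simple", parts[0]
    elif len(parts) == 2:
        header_method, body_method = parts
    else:
        return None
    if header_method not in HEADER_METHODS:
        return None
    if body_method not in BODY_METHODS:
        return None
    return (header_method, body_method)
```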

   More information about canonicalization methods and canonicalization
   process can be found in section 3 of this document.

2.5  Canonicalized Data Size ("s") parameter

   The number of bytes (octet count) in the canonicalized data (as used
   for computing the hash digest) can optionally be included in the "s"
   parameter.  This is primarily an informational field and can be used
   during digest header verification as a way to determine if the
   content has been modified.  If the number in "s" does not match the
   number of bytes of the canonicalized data being verified, then the
   verifying system SHOULD abort the processing and can choose to report
   an extended error indicating that the content has been changed and
   the size does not match.


   There may also be some situations where being able to verify the
   majority of the data is sufficient.  In such a case an application
   MAY try to use the size parameter: if, after canonicalization, the
   result is larger than the size given in "s", the application cuts the
   result to exactly the number of bytes specified in "s" and then
   attempts to verify the digest.  If this succeeds, the application
   should in some way report that only part of the content was
   successfully verified and may also optionally choose to discard the
   unverified part of the message content data.

   More information about how the size parameter is used is found in
   section 4.
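
   A verifier's handling of the "s" parameter might be sketched as
   below.  The function name, the status strings and the allow_partial
   flag are hypothetical; the flag models the optional partial
   verification described above and is off by default.

```python
def apply_size(canonical, s_value, allow_partial=False):
    """Check canonicalized data against the "s" parameter.

    Returns (data_to_hash, status).  status is "full" when all of the
    data is to be hashed, "partial" when the data was cut to the size
    given in "s", and "mismatch" when processing should be aborted.
    """
    if s_value is None:
        return canonical, "full"
    size = int(s_value)
    if len(canonical) == size:
        return canonical, "full"
    if allow_partial and len(canonical) > size:
        # Only the first `size` octets can be verified.
        return canonical[:size], "partial"
    return None, "mismatch"
```

   When the status is "partial" and the digest then verifies, the
   application should report that only part of the content was
   verified.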

2.6  Time Stamp ID ("t") parameter

   The optional "t" parameter is used for providing time-stamp
   information on when the digest hash was created.  In EDigest this is
   also used as a unique identifier (unlike Content-Digest, multiple
   EDigest header fields can exist in the same header).

   The value of the "t" parameter is based on the ISO8601 time format
   and consists of multiple digits of the form YYYYMMDDhhmmssxxxx, where
   YYYY is the 4-digit year, MM is the 2-digit month, DD is the 2-digit
   day, hh is the 2-digit hour, mm is the 2-digit minute, ss is the
   2-digit seconds and xxxx are additional digits that may be
   milliseconds or some other unique number identifying the specific
   header field.  The number may well be less than 18 digits (14 is a
   lot more common) and, for example, may only contain YYYYMMDD.  The
   time and date data used should be UTC with no locale information.
   Some examples of the "t" parameter follow:
    t=20050704142754  - corresponds to RFC2822 date
                   "Mon, 4 Jul 2005 14:27:54 +0000"
    t=20050503 - corresponds to May 3, 2005

   Note that the number specified in the "t" parameter is informational
   only and should not be assumed to always be a time-stamp or be
   automatically interpreted as such by the application; automatic use
   of this number should be limited to providing a unique reference.
   However, the fact that this number usually contains a time-stamp may
   be of use for purposes of email debugging and forensics.
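
   Generating a "t" value could look like the following sketch, which
   converts a time to UTC before formatting.  The function name and the
   extra_digits argument (standing in for the optional "xxxx" digits)
   are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone


def make_t_value(moment, extra_digits=""):
    """Format a timezone-aware datetime as YYYYMMDDhhmmss[xxxx] in UTC."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y%m%d%H%M%S") + extra_digits


# A local time five hours behind UTC is converted before formatting.
local = datetime(2005, 7, 4, 9, 27, 54, tzinfo=timezone(timedelta(hours=-5)))
print(make_t_value(local))  # prints 20050704142754
```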

2.7  Hash Data ("d") parameter

   The data parameter contains the actual digest hash data.  The hash is
   calculated using the algorithm specified in the "a" parameter, based
   on data from content header fields (those that match the listing in
   the "h" parameter) and the content body after applying the
   appropriate canonicalization as specified in the "c" parameter.  The
   resulting hash data is converted into BASE64 encoding as specified in
   section 3 of [RFC3548] with the '=' pad symbol and placed after 'd='.
   If the BASE64 hash data ends with '=' then the data MUST also be
   enclosed in double quotes, i.e. d="...="

   Hash data can be broken into multiple lines as specified in [RFC2822]
   section 2.2.3, but it is preferable that the entire data parameter
   (starting with 'd=') stay on one line in the header.  It is also
   preferable that the data parameter be the last parameter of the
   Content-Digest header.  Use and placement of data is illustrated in
   more detail in the examples contained in section 6 of this document.
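
   The encoding and quoting rule for the "d" parameter can be sketched
   as follows, using Python's standard base64 module.  The function
   name is hypothetical.

```python
import base64


def format_d_parameter(digest_bytes):
    """Return the "d" parameter text for raw digest octets."""
    encoded = base64.b64encode(digest_bytes).decode("ascii")
    if encoded.endswith("="):
        # BASE64 data ending in the pad symbol must be quoted.
        return 'd="%s"' % encoded
    return "d=" + encoded
```

   Note that a 20-octet SHA1 digest always encodes to 28 BASE64
   characters ending in one '=' pad symbol, so with the default
   algorithm the value is always quoted.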


3.  Creation of Content-Digest Header Field

   The Content-Digest field is created by the originating user agent
   that starts transmission of the content, not by an intermediate
   content retransmission system.  For email, the originating user agent
   is an MUA program or any other program acting as MSA and, as such, is
   the originating agent in the SMTP transmission.  For HTTP, the
   originating agent is an HTTP server that serves the content from its
   data storage, where it has been placed by the user, or generates it
   on the fly (CGI or similar), but not any kind of caching HTTP system
   that does not actually generate the content but only retransmits
   content received from another web server.  Other MIME transmission
   protocols can also use Content-Digest, using criteria similar to the
   above in deciding which system involved in the transmission should
   add the Content-Digest header field.

   The content transmission origination system (hereafter CTOS) that
   wants to add a Content-Digest header field should proceed as follows:

3.1  Header Fields Processing

   First, a decision should be made on what data is to be used for the
   digest hash, based on local preferences and on how the digest hash is
   going to be used.  Generally it is a good idea to include only
   content-specific header fields such as Content-Type, but not
   transmission header fields such as Connection in HTTP.  This is
   because content-specific fields should not change during
   transmission, while the other header fields may change if the content
   is retransmitted (such as by a forwarding or other redirection system
   in email or by a caching proxy server in HTTP).  The
   Content-Transfer-Encoding header field (which provides information on
   data transmission encoding) should thus be included in the list of
   header fields only if a change of transit encoding by intermediate
   systems is not allowed (which is not always true).

   After the list of header fields that are to be part of the digest
   hash data is ready, the entire "h" parameter can be created.
   Consideration should be given to whether use of "*" to combine
   several fields is appropriate, because new fields with the same
   prefix added by intermediate retransmission systems would cause
   digest verification to fail.  As such, "h=*" should generally not be
   used unless message transmission is point-to-point and no
   retransmission systems are expected or allowed, and use of
   "h=Content-*" is possible only if Content-Transfer-Encoding is not
   specified or is not expected to change.

   Next, canonicalization should be applied to the header field data.
   There are 3 header canonicalization processing methods defined by
   this document: 'bare', 'simple' and 'nofws'.  To show how they
   differ, the following content data header is used as an example:

     Content-Type:  text/plain;
       charset="us-ascii"
     MIME-Version: 1.0
     Content-ID: <218F64C460.u314@example.com>
     Content-Transfer-Encoding: 7bit
     Content-Description: Collection  Footer

   For this example, assume that the header fields to be included are
   all of the above except Content-Transfer-Encoding, which is described
   with the parameter
   "h=Content-Type,Content-ID,Content-Description,MIME-Version".

   Now the following is how canonical data form is calculated depending
   on which canonicalization method is used (for each method it is
   assumed that we start with empty canonical header form buffer):

   BARE - In this canonicalization method header data is largely used as
      is.  The algorithm is: for each header field name listed in "h",
      in the order the names are listed, find all instances of the
      matching field (the full name is the same as listed up to ":" or,
      if "*" is used, the field name is the same up to the "*").  For
      each match, the entire header field as is, starting with the field
      name itself and up to (but not including) the first character of
      the next header field in the header (and so including end of line
      characters), is added to the canonical data form buffer.  For the
      example above, the result of applying this method is canonical
      buffer data as follows:

     Content-Type:  text/plain;
       charset="us-ascii"
     Content-ID: <218F64C460.u314@example.com>
     Content-Description: Collection  Footer
     MIME-Version: 1.0

   SIMPLE -  In this canonicalization method, common problems that are
      encountered with transformation of the header fields are accounted
      for and the data is made consistent with what is defined in the
      ABNF header field syntax in [RFC2822], except that 8-bit data is
      not touched (according to RFC2822 there should not be any 8-bit
      data in the message and MIME header, but unfortunately it does
      happen).  The system for choosing header fields and their order is
      the same as with 'bare', but header field data is not copied as-is
      to the canonical data form buffer; instead the following is done
      for each header field:


      1.  If the header field consists of multiple lines, the lines are
          unfolded (the procedure is described in section 2.2.3 of
          [RFC2822] and involves removal of the CRLF pair) to become one
          long field line.  If there are any single line break
          characters CR or LF, they are also removed, as are any NULL
          (ASCII code 0) characters.  In the above example the only
          header field that is affected is "Content-Type", which
          consists of data on two lines.

      2.  Every sequence of multiple consecutive white-space characters
          (white-space is WSP as defined in [RFC2822] section 2.2.2 and
          includes SP and HTAB) is reduced to a single white-space
          character.  In the above example this affects the double white
          space after "Content-Type:" and the double white space between
          "Collection" and "Footer".

      3.  The header field name itself is made entirely lowercase.  That
          means that in the header field name (from the start of the
          header field line to the first ":") each octet character with
          ASCII code 'a' between 65 and 90 is replaced with the
          character with ASCII code a+32.

      4.  If there is a sequence of one or more WSP at the end, it is
          removed.

      5.  A new CRLF pair is added to the end of the newly converted
          header field line.

      The result of applying this method to the example given above is
      canonical data block:

     content-type: text/plain; charset="us-ascii"
     content-id: <218F64C460.u314@example.com>
     content-description: Collection Footer
     mime-version: 1.0

   NOFWS -  In this canonicalization method (whose name is an
      abbreviation for "No Free White Space"), only the printable,
      non-white-space characters of the data are used for the digest.
      While that means that the core of the content text is preserved
      and verified, there may be some problems with this system as all
      spaces between words are lost.  This canonicalization method uses
      an algorithm similar to 'simple', with the following steps for
      data transformation:

      1.  If the header field consists of multiple lines, the lines are
          unfolded (the procedure is described in section 2.2.3 of
          [RFC2822] and involves removal of the CRLF pair) to become one
          long field line.  If there are any single line break
          characters CR or LF, they are also removed, as are any NULL
          (ASCII code 0) characters.  In the above example the only
          header field that is affected is "Content-Type", which
          consists of data on two lines.

      2.  All octet characters with ASCII value less than 33 or greater
          than 126 are removed from the header data.

      3.  The header field name itself is made entirely lowercase.  That
          means that in the header field name (from the start of the
          header field line to the first ":") each octet character with
          ASCII code 'a' between 65 and 90 is replaced with the
          character with ASCII code a+32.

      The result of applying this method to the example above is the
      following canonical data block (note that " \" is used to indicate
      a line break for purposes of this document only, since the result
      of using the above canonicalization method is one long line
      without breaks):

     content-type:text/plain;charset="us-ascii"content-id: \
     <218F64C460.u314@example.com>content-description:Coll \
     ectionFootermime-version:1.0
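
   The three header canonicalization methods described above can be
   sketched together as follows.  This is an illustration under stated
   assumptions rather than a reference implementation: all function
   names are hypothetical, the header is taken as raw octets with
   folded continuation lines starting with SP or HTAB, and 'simple'
   replaces a run of white-space characters with a single SP.

```python
import re


def split_fields(header):
    """Split raw header octets into fields, keeping folded lines."""
    fields = []
    for line in header.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += line  # folded continuation line
        else:
            fields.append(line)
    return fields


def select_fields(header, h_value):
    """Return raw fields matching "h", in the order names are listed."""
    selected = []
    for pattern in h_value.lower().split(","):
        pattern = pattern.strip()
        for raw in split_fields(header):
            name = raw.split(b":", 1)[0].strip().lower().decode("latin-1")
            if name == "content-digest":
                continue  # never part of its own digest
            if pattern.endswith("*"):
                hit = name.startswith(pattern[:-1])
            else:
                hit = name == pattern
            if hit:
                selected.append(raw)
    return selected


def unfold(raw):
    """Step 1 of 'simple' and 'nofws': drop CR, LF and NULL octets."""
    return raw.replace(b"\r", b"").replace(b"\n", b"").replace(b"\x00", b"")


def lower_name(field):
    """Lowercase the field name (everything before the first ":")."""
    name, colon, rest = field.partition(b":")
    return name.lower() + colon + rest


def bare(raw):
    return raw


def simple(raw):
    field = re.sub(rb"[ \t]{2,}", b" ", unfold(raw))
    return lower_name(field).rstrip(b" \t") + b"\r\n"


def nofws(raw):
    field = bytes(octet for octet in unfold(raw) if 33 <= octet <= 126)
    return lower_name(field)


def canonicalize_header(header, h_value, method):
    """Build the canonical header buffer using bare, simple or nofws."""
    return b"".join(method(raw) for raw in select_fields(header, h_value))
```

   Calling canonicalize_header() on the example header with the "h"
   value given above and each of the three method functions reproduces
   the three canonical data blocks shown in this section.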

   It is RECOMMENDED that the default 'simple' canonicalization method
   be used when content data is being transmitted to an unknown
   recipient across the Internet.  This canonicalization method can deal
   with common header data transformations by intermediate systems and
   does not cause loss of content data.  If it is important to make
   certain that data is received exactly as it was transmitted, with no
   modifications or reformatting of any kind, then 'bare'
   canonicalization can be used, but this should normally be reserved
   for known and pre-arranged data transmission paths where it is known
   to be safe.  When data transmission goes through a series of relays
   and it has been noticed that the digest hash does not verify as a
   result, using 'nofws' can be considered, but it should be noted that
   it only provides verification of the text symbols and is not secure
   enough for full data integrity protection.

3.2  Content Body Data Processing

   Content body processing for digest hash creation may also involve
   data transformation to a canonical format, depending on the chosen
   canonicalization method.  There are 5 body data canonicalization
   processing methods defined by this document: 'bare', 'text',
   'mimeform', 'nofws' and 'none'.  As with header field
   canonicalization, a simple example of data before canonicalization is
   used to show how they differ (note that "\cr" and "\lf" in the
   example represent CR and LF characters):


   \cr\lf
   Happy 4th of July,\cr
   \cr
   Fireworks at pier 39 at 9:30pm, be there. \cr
   \cr
   Will

   Now the following is how canonical body data is calculated depending
   on which canonicalization method is used (for each method it is
   assumed that we start with empty canonical body buffer):

   BARE - In this canonicalization method body data is unchanged and
      used 100% as-is.

   TEXT - In this canonicalization method, some problems that are
      encountered with transmission of text data are dealt with and it
      is made certain that the data is consistent with the canonical
      form of a MIME text/plain content-type object as described in
      section 4 of [RFC2049] and with the text message data format as
      described in section 2 of [RFC2822].  This is done as follows:

      1.  All NULL (ASCII code 0) characters are removed and any single
          CR or single LF character is replaced with a CRLF pair (if CR
          is already followed by LF, then neither is changed).

      2.  The data is examined to make certain that no line is longer
          than 998 octet characters (a line is defined as a continuous
          stream of characters terminated by CRLF and starting either at
          the beginning of the data or with the first character after
          the previous CRLF).  If any line is longer than 998
          characters, then a CRLF pair is inserted after the 998th
          character and the procedure described in this step is
          repeated.

      3.  Any sequence of one or more white-space characters
          (white-space is WSP as defined in [RFC2822] section 2.2.2 and
          includes SP and HTAB) that is immediately followed by CRLF is
          removed.

      4.  If there is a sequence of one or more CRLF pairs at the start
          of the data content (as left after step 3), it is removed.

      The result of applying this method to example above is canonical
      data block as follows:


   Happy 4th of July,\cr\lf
   \cr\lf
   Fireworks at pier 39 at 9:30pm, be there.\cr\lf
   \cr\lf
   Will
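
   As an illustration only, the four TEXT steps above could be sketched
   in Python as follows.  The names canonicalize_text and MAX_LINE are
   assumptions made for this sketch and are not defined by this
   document.

```python
import re

MAX_LINE = 998  # hypothetical name; the 998 octet limit is from step 2


def canonicalize_text(data: bytes) -> bytes:
    """Sketch of the TEXT body canonicalization (steps 1 to 4)."""
    # Step 1: remove NUL octets, then turn every lone CR or lone LF
    # into CRLF (an existing CRLF pair matches first and is kept).
    data = data.replace(b"\x00", b"")
    data = re.sub(rb"\r\n|\r|\n", b"\r\n", data)

    # Step 2: break any line longer than 998 octets after octet 998.
    lines = []
    for line in data.split(b"\r\n"):
        while len(line) > MAX_LINE:
            lines.append(line[:MAX_LINE])
            line = line[MAX_LINE:]
        lines.append(line)

    # Step 3: remove SP/HTAB that immediately precede a CRLF.  The last
    # element is not terminated by CRLF, so it is left untouched.
    lines = [item.rstrip(b" \t") for item in lines[:-1]] + lines[-1:]

    # Step 4: remove any CRLF pairs left at the start of the data.
    out = b"\r\n".join(lines)
    while out.startswith(b"\r\n"):
        out = out[2:]
    return out
```

   Applied to the example data shown earlier in this section, this
   sketch yields the canonical data block given above.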

   NOFWS -  This is another canonicalization intended primarily for text
      data.  In this method all white-space and line-break characters
      are removed and only the remaining characters of the data are
      left (this is less secure as far as data integrity is concerned,
      but the core information of the content text is still protected).
      The method is fairly simple and consists of one step as follows:

      *  All NULL (ASCII code 0), CR (ASCII code 13), LF (ASCII code
         10), HTAB (ASCII code 9), VTAB (ASCII code 11), FF (ASCII code
         12) and SP (ASCII code 32) characters are removed.

      The result of applying this method to example above is canonical
      data block consisting of one line as follows:

   Happy4thofJuly,Fireworksatpier39at9:30pm,bethere.Will
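
   The single NOFWS step can be sketched in Python as shown below.  The
   function name canonicalize_nofws is an assumption made for this
   illustration.

```python
# Octets removed by NOFWS: NUL, CR, LF, HTAB, VTAB, FF and SP.
NOFWS_DELETE = b"\x00\r\n\t\x0b\x0c "


def canonicalize_nofws(data: bytes) -> bytes:
    """Sketch of the NOFWS body canonicalization."""
    # bytes.translate with a table of None only deletes the listed
    # octets and leaves every other octet unchanged.
    return data.translate(None, NOFWS_DELETE)
```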

   MIMEFORM  -  This is a special canonicalization method which is meant
      to have data converted into MIME canonical form.  As described in
      section 4 of [RFC2049], MIME canonical form depends on the type of
      data, which is determined by Content-Type.  For this
      canonicalization method, if the data is of a text media type
      (Content-Type: text/*), the TEXT canonicalization method is
      used.  For all other media types, the BARE canonicalization method
      is used.  This is the default canonicalization method for content
      data.
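
   The MIMEFORM selection rule can be sketched as a small Python
   helper.  The function name mimeform_method is hypothetical, and the
   Content-Type value is parsed here in a simplified way (everything
   before the first ";" is taken as the media type).

```python
def mimeform_method(content_type: str) -> str:
    """Return the method MIMEFORM resolves to: 'text' or 'bare'."""
    # Take the media type before any parameters and compare the
    # top-level type case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    top_level = media_type.split("/", 1)[0]
    return "text" if top_level == "text" else "bare"
```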

   NONE -  The special canonicalization value "none" specifies that
      body data is not part of the digest hash (i.e. the
      canonicalization process uses none of the data).  This is used
      with EDigest (an extended form of the Content-Digest header field,
      discussed further in section 5 of this document) when it is
      desirable to create a digest hash only for a group of specific
      header fields.

   For text content data the 'text' canonicalization is flexible enough
   to take care of common alterations with no security risks.  If it
   has been noticed that data transmission is likely to involve relays
   that make modifications after which the digest hash would no longer
   verify, then using 'nofws' can also be considered.  In cases
   where it is very important to make certain data is received exactly
   the same as it was transmitted with no modifications or reformatting
   of any kind, 'bare' canonicalization can be used but this should
   normally be reserved only for known and pre-arranged data path
   transmission where it is known to be safe.  If the content data is
   not text and is not going to be transmitted as text (i.e. with 7bit
   or quoted-printable content-transfer-encoding), then it is very
   unlikely to be touched by any intermediate system and using 'bare'
   canonicalization method is appropriate.

   Based on the above, the CTOS should make certain to use an
   appropriate canonicalization.  It is important to understand that
   the default 'mimeform' depends on the Content-Type header field
   value: it resolves to 'text' for any text media type MIME content
   and to 'bare' otherwise, which works well for most cases.  There may
   also be a number of other content types which are not specifically
   identified as text media types but that use text data; in those
   cases the CTOS should explicitly select the 'text' canonicalization
   method and specify it in the "c" parameter.  Notice that Multipart
   and Message complex MIME types are also very often composed only of
   text components; in such cases using 'text' canonicalization may
   also be appropriate and will need to be specified in the "c"
   parameter.

3.3  Digest Hash Creation

   After processing of the content header and body as described above,
   the result is two data buffers, one containing the canonicalized
   form of the header fields and one containing the canonicalized data
   body.  For the actual data used for digest hash creation, these are
   joined together, with the canonical header fields data going first
   and the canonicalized data body appended to it.

   Note that even if 'bare' canonicalization is used for both header
   fields data and content body processing, the result of joining
   their canonical forms would not be the same as the original MIME
   content part, as it would be missing the line separating content
   header and body.  So for example if the original MIME content was:

     Content-Type:  text/plain;
       charset="us-ascii"
     MIME-Version: 1.0
     Content-ID: <218F64C460.u314@example.com>
     Content-Transfer-Encoding: 7bit


     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will

   The result of using canonicalization and header processing described
   by "c=bare,bare; h=content-type,content-id,mime-version" would be the
   following data (with octet count 183) ready for hash creation:


     Content-Type:  text/plain;
       charset="us-ascii"
     Content-ID: <218F64C460.u314@example.com>
     MIME-Version: 1.0

     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will

   Whereas the result of using the default "simple,mimeform"
   canonicalization with the same list of header fields (described by
   "h=content-type,content-id,mime-version; c=simple,mimeform;" or just
   "h=content-type,content-id,mime-version") would be:

     content-type: text/plain; charset="us-ascii"
     content-id: <218F64C460.u314@example.com>
     mime-version: 1.0
     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will

   The octet count of the above data is 177 (the size calculation
   includes the CRLF line terminators, which add 2 octets for every
   line, even the empty ones).  Calculation of the octet size of
   canonicalized data is optional and is done in order to be used as
   the value of the "s" parameter of Content-Digest.

   Once processing of content is complete and data is ready, the
   cryptographic hash algorithm is applied to that data.  The choice of
   the hash algorithm should be made based on system policies and
   security considerations in regards to the transmission.  The default
   SHA1 algorithm is a good choice and offers sufficient security for
   most cases, and it is NOT RECOMMENDED that anything less secure, not
   resulting in at least a 160-bit hash, be used.  The actual hash
   creation is described in other documents; please see [RFC1321] on
   how to create an MD5 hash, [RFC3174] for the SHA1 hash and
   [FIPS180-2] for other versions of the SHA algorithm that produce
   more than 160 bits of hash data.
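
   The joining and hashing described in this section can be sketched in
   Python with the standard hashlib and base64 modules.  The function
   name make_digest is hypothetical, and base64 encoding of the "d"
   value is an assumption inferred from the examples in this document.

```python
import base64
import hashlib


def make_digest(canon_header: bytes, canon_body: bytes,
                algorithm: str = "sha1"):
    """Return (value for "d", octet count for "s")."""
    # Canonical header fields data goes first, then the body data.
    data = canon_header + canon_body
    raw_hash = hashlib.new(algorithm, data).digest()
    # Assumption: "d" carries the base64 form of the raw hash octets.
    return base64.b64encode(raw_hash).decode("ascii"), len(data)
```

   The second returned value is only needed when the optional "s"
   parameter is to be included.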

   The result of the entire process as described in this section would
   be the following Content-Digest header field:

     Content-Digest: v=1.0; h=content-type,content-id,mime-version;
       c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

   Note that since "a=sha1" and "c=simple,mimeform" are defaults, the
   above can be shortened to:


     Content-Digest: v=1.0; h=content-type,content-id,mime-version;
       d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
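
   A sketch of assembling the field with default values omitted is
   given below.  The function name format_content_digest and its
   argument names are assumptions for illustration, and the result is
   returned as a single unfolded line.

```python
def format_content_digest(digest: str, fields=(),
                          canon: str = "simple,mimeform",
                          algorithm: str = "sha1") -> str:
    """Build a Content-Digest field, leaving out default parameters."""
    params = ["v=1.0"]
    if fields:
        params.append("h=" + ",".join(fields))
    if canon != "simple,mimeform":   # default canonicalization
        params.append("c=" + canon)
    if algorithm != "sha1":          # default hash algorithm
        params.append("a=" + algorithm)
    params.append('d="%s"' % digest)
    return "Content-Digest: " + "; ".join(params)
```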

   Now the Content-Digest header field is ready to be added to the
   content part.  It is best to add Content-Digest below other MIME or
   message header fields (some of which would have been part of the
   data that went into the digest hash), but Content-Digest could be
   added in other parts of the header as well.  The above example,
   after Content-Digest is added, becomes:

     Content-Type:  text/plain;
       charset="us-ascii"
     MIME-Version: 1.0
     Content-ID: <218F64C460.u314@example.com>
     Content-Transfer-Encoding: 7bit
     Content-Digest: v=1.0; h=content-type,content-id,mime-version;
       c=simple,mimeform; a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="

     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will
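
   A receiving system would typically begin by splitting such a field
   value into its parameters.  The following Python sketch is one
   simplified way to do so; the function name parse_digest_params is
   hypothetical, the full syntax is the one given in Appendix A, and
   the sketch assumes that no ";" appears inside a quoted value.

```python
def parse_digest_params(value: str) -> dict:
    """Split a Content-Digest or EDigest field value into parameters."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, since base64 data in "d" may
        # itself end with "=" padding.
        name, _, val = item.partition("=")
        # Collapse folding white-space and drop surrounding quotes.
        val = " ".join(val.split()).strip('"')
        params[name.strip().lower()] = val
    return params
```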

   HTTP also has the concept of a "trailer" (data that follows the
   content body and consists of the same types of fields as the
   header).  Content-Digest can be placed in the trailer if desired, but
   note that the list of fields in "h" only represents fields in the
   header and NOT in the trailer.


4.  Digest Hash Verification Procedure for Content-Digest

   Verification procedure for Content-Digest header follows largely the
   same procedures as for creation of the header field.  This is done as
   follows:

   1.  First for content part being verified, the header is taken and
       canonical version of that is produced following procedure
       outlined in section 3.1 and based on header fields list found in
       "h" parameter and header canonicalization method listed in "c"
       parameter (note that canonical version must not include the
       actual Content-Digest header field even if it would match based
       on list in "h").  If there was no "h" parameter in the Content-
       Digest header, then the result of this step is empty string.

   2.  Next, the canonical version of the content body data is produced
       as described in section 3.2 based on the "c" parameter.  This is
       appended to the data produced as a result of step 1.

   3.  If parameter "s" is present in Content-Digest header field, then
       the octet size of the data from step 2 is calculated.  If this
       size does not match value in parameter "s", then verifying system
       has the following options:

       1.  Abort further processing and return an error indicating that
           Content-Digest can not be verified and content has been
           changed.

       2.  Remove bytes from the end of the data from step 2 so that
           its size matches the number of bytes in "s" before
           proceeding to step 4.

   4.  A cryptographic hash is produced using the algorithm listed in
       parameter "a" (sha1 if "a" is not present) based on data from
       step 2 (or step 3.2).  This cryptographic hash data is compared
       against data in the "d" parameter of Content-Digest.  If they
       match, the result of verification is success; otherwise it is a
       failure.
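
   The verification steps above can be sketched in Python as follows.
   The function and argument names are assumptions for illustration:
   params is a dictionary of already parsed field parameters, the two
   buffers are the results of steps 1 and 2, and the "d" value is
   assumed to be base64 encoded as in the examples of this document.

```python
import base64
import hashlib
import hmac


def verify_content_digest(params: dict, canon_header: bytes,
                          canon_body: bytes,
                          allow_truncate: bool = False) -> bool:
    """Sketch of steps 2 to 4 of the verification procedure."""
    data = canon_header + canon_body          # result of steps 1 and 2
    if "s" in params:                         # step 3
        size = int(params["s"])
        if len(data) != size:
            if not allow_truncate or len(data) < size:
                return False                  # option 3.1: abort
            data = data[:size]                # option 3.2: cut the end
    algorithm = params.get("a", "sha1")       # step 4, sha1 by default
    actual = hashlib.new(algorithm, data).digest()
    expected = base64.b64decode(params["d"])
    return hmac.compare_digest(actual, expected)
```

   Option 3.2 is only taken when the caller explicitly asks for it,
   since it verifies only part of the received data.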

   In regards to removing data from canonicalized content as indicated
   in step 3.2 to match size parameter in Content-Digest header field,
   this is something that should be done only in specific context where
   it is believed that an intermediate system may exist that has added
   extra data to the end of content during transmission.  This can be
   the case for an email message that came through a mailing list (which
   often adds its own footer to the message), so dropping the end of
   the email message would allow the original version to be verified.
   However, one must be aware of the dangers of doing so as it means
   only part of the message data is verified and this is a serious
   security issue that can be exploited.  It is therefore best that if
   the verifying system chooses to verify only part of the content, that
   it consider changing the entire message to only include part that has
   been verified (optionally this may involve not removing the
   unverified content part, but instead moving it into separate
   attachment content data).  It should be noted that since mail lists
   add their footer to text messages, this method should not be
   attempted if the data content is of a type other than text; for
   binary data the Content-Digest verification should simply be
   considered to have failed (as in step 3.1 above) if the size of the
   canonicalized content does not match the value of the "s" parameter
   of Content-Digest.


5.  EDigest Header Field

   EDigest header field is very similar to Content-Digest (it can be
   considered an extended form of Content-Digest) and also includes hash
   digest data of the content, but unlike Content-Digest, it does not
   have to be unique field for particular MIME part or attached to it.
   As such the EDigest header field provides the following additional
   functionality over Content-Digest:

   1.  It can be added by intermediate transport agents (including
       message relays and gateways) and not only at the transmission
       origin.

   2.  It can be used in places other than the content header itself
       and as such allows a digest reference for a MIME subpart of the
       message and for an externally located MIME part.

   3.  It can provide digest hash that can be used to verify data of
       several MIME parts together.

   The syntax of the EDigest field (for full syntax please refer to
   Appendix A) is identical to Content-Digest and consists of all the
   same parameters plus one more optional parameter "u".

5.1  Content URL ("u") parameter

   The value of the EDigest header field "u" parameter is URL data
   pointing to the content whose hash the digest header field carries.
   This URL data is a list of one or more URLs, with each URL enclosed
   in "<" and ">" and separated by FWS.  This is very similar to how
   message identifiers are listed in the References header field of an
   email header.

   A common use of the "u" parameter is when the EDigest header field
   specifies the hash of a MIME entity which is enclosed within another
   MIME entity or message and it is desirable to provide the hash of
   the content directly in this parent entity.  In such a case a "cid"
   URL (Content-ID URL as specified in [RFC2392]) is used and specifies
   a reference to the unique id of the content as found in its
   Content-ID header field.  An example of such use is as follows:


     Edigest: v=1.0; u="<cid:218C460.u314@example.com>";
       h=content-type,message-id,mime-version;
       a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="
     From: will@example.com
     To: mary@example.net
     Subject: Fireworks
     Date: Mon, 4 Jul 2005 12:34:26 -0400
     Message-ID: <will.123456789@example.com>
     Mime-Version: 1.1
     Content-Type: Multipart/Mixed; Boundary="NextPart"

     This message is in MIME format. The first part should be
     readable text, while the remaining parts are likely
     unreadable without MIME-aware tools.

     --NextPart
     Content-Type:  text/plain;
       charset="US-ASCII"
     MIME-Version: 1.0
     Content-ID: <218C460.u314@example.com>
     Content-Transfer-Encoding: 7bit

     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will

   The URL scheme "cid" should be considered the default URL scheme, so
   entering "cid:" is optional and the parameter
   'u=<cid:218C460.u314@example.com>' can also be expressed simply as
   'u=<218C460.u314@example.com>'.
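
   Reading a "u" value, including the default "cid" scheme, can be
   sketched in Python as follows.  The function name parse_u is
   hypothetical, and the scheme check is a simplification: a bare
   Content-ID that happens to begin with letters followed by ":" would
   be mistaken for a URL with a scheme.

```python
import re

# A URL scheme: a letter, then letters, digits, "+", "." or "-", then ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def parse_u(value: str) -> list:
    """Return the URLs listed in a "u" parameter value, in order."""
    urls = []
    for ref in re.findall(r"<([^<>]+)>", value):
        # A reference without a scheme is treated as a "cid" URL.
        if not _SCHEME.match(ref):
            ref = "cid:" + ref
        urls.append(ref)
    return urls
```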

   The value of the "u" parameter needs to be a reference to a unique
   content part, so for a digest to be used with those parts no two
   content parts in the message can have the same Content-ID, even if
   they are subparts of "Multipart/Alternative" ([RFC2392] specifies
   that in such a case content parts may share a common content-id for
   reference).  Where an application needs a common content-id for
   referencing one of the multiple parts within
   "Multipart/Alternative", that common reference id should be the
   Content-ID header field of the actual Multipart/Alternative MIME
   part rather than the Content-ID of its subpart.

   With the digest "u" parameter it is also possible to specify more
   than one content part, for example:

     Edigest: v=1.0; h=content-type,content-id,mime-version;
       u="<218C460.u314@example.com> <218C460.u315@example.com>";
       a=sha1; d="0ZOMSM79tU+ujUVmjaOkRBmad8k="


   specifies that hash data in "d" is based on content data in both
   <218C460.u314@example.com> and <218C460.u315@example.com>.

   This is used as a replacement for having multiple digest header
   fields added for individual content parts when all these content
   parts are related and are not expected to change individually
   during transport.  In such situations a hash for the entire message
   could be an option, but such a hash would not verify if content
   parts are rearranged or a new content part is added to the message
   during transport, whereas hash data in an EDigest header field with
   multiple content parts listed in "u" would not be affected if new
   content is added to the message or if existing parts are rearranged
   in any way.

   Other uses of this could involve multiple external data components
   (such as data content parts available on web server) which are
   referenced from content parts in the message and which client is
   expected to have downloaded as part of message verification and
   presentation to the user.

5.2  Creation of EDigest Header Field

   The EDigest header field is created in a similar way to
   Content-Digest; the differences are present only when the "u"
   parameter is used.  When EDigest is created referencing a single
   content part in the message, the same procedures as described in
   section 3 are followed, except that the EDigest header field is not
   placed in the same content part.  A Content-ID header field must be
   present in the header of the content part in the message that is
   referenced in the "u" parameter with a content-id URL.

   It is also possible to reference stationary remote content located
   on http, ftp or some other service.  If such content is MIME, then
   the "h" parameter MUST be present and include at least one header
   field (such as Content-Type).  If the remote content is not MIME,
   then it is considered binary (even if it is only text), both header
   and body canonicalization are to be set to 'bare' (i.e.
   "c=bare,bare") and the "h" parameter MUST NOT be present.

   When multiple URLs are listed as 'u' parameter value, then the
   procedure to produce hash is as follows:

   1.  Follow the procedure described in section 3.1 for the header of
       the 1st content referenced in the 'u' parameter.  This will
       result in a buffer with canonicalized header fields data.

   2.  Follow procedure described in section 3.2 for content body data
       of the 1st content referenced in 'u' parameter and the result
       (canonicalized body content data) is added to data from step 1.


   3.  Follow the procedure described in section 3.1 for the header of
       the 2nd content referenced in the 'u' parameter.  Add it to the
       end of the result from step 2.

   4.  Follow the procedure described in section 3.2 for the content
       body data of the 2nd content referenced in the 'u' parameter.
       Add it to the end of the result from step 3.

   5.  ...

   6.  Follow the procedure described in section 3.2 for the content
       body data of the last content referenced in the 'u' parameter.
       Add it to the end of the result from the previous steps.

   Hash algorithm is then used on the data from last step (the result of
   adding canonicalized data from all content parts) and the result goes
   into "d" parameter.  Similarly "s" parameter is optionally added and
   is octet count of all canonicalized data.
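
   The procedure for multiple referenced parts can be sketched in
   Python as follows.  The function name edigest_value is hypothetical;
   parts is assumed to be a list of (canonical header, canonical body)
   pairs in the same order as the URLs in "u", and the "d" value is
   assumed to be base64 encoded as in the examples of this document.

```python
import base64
import hashlib


def edigest_value(parts, algorithm: str = "sha1"):
    """Return (value for "d", octet count for "s") for several parts."""
    buf = bytearray()
    for canon_header, canon_body in parts:
        # Header fields data of a part goes first, then its body data,
        # and each part is appended after the previous one.
        buf += canon_header
        buf += canon_body
    raw_hash = hashlib.new(algorithm, bytes(buf)).digest()
    return base64.b64encode(raw_hash).decode("ascii"), len(buf)
```

   Because the parts are joined in the order given in "u", listing the
   same parts in a different order produces a different hash.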

   Note that with multiple content parts in "u", the same list of header
   fields from "h" parameter is used and as such this list may have to
   include names of header fields that are present in one content part
   but not in another one in order to produce appropriate hash that
   includes all necessary data.

5.3  Verification of EDigest Header Field

   Verification of EDigest header field is done only if data for all
   content parts referenced in "u" are available to verifying agent.  If
   that is not so, verification should be aborted with error message
   indicating that some of the referenced data is not available.  If
   the data is not available due to a temporary DNS error resolving a
   domain name from one of the URLs in the "u" parameter, then the
   verifying agent may choose to delay verification and attempt it
   again at a later time.

   The procedure for verification of the EDigest header field is the
   same as described in section 5.2 to produce the hash, which is then
   compared to the hash in the "d" parameter.  If they match, then
   EDigest is successfully verified; if they do not, then verification
   has failed.


6.  Examples


6.1  Simple Content-Digest as Replacement for Content-MD5

   In simple form without using the "h" parameter, the Content-Digest
   header field can easily be used as a replacement for Content-MD5
   and, as the majority of Content-Digest field parameters are optional
   or have default values, this does not require much more space.

   For the following small text content with Content-MD5 field:

     Content-Type: text/plain; format=flowed
     Content-MD5: vP5T2agfLQOCooDQF3lghA==

     Test Message

   The replacement of Content-MD5 with Content-Digest with md5 algorithm
   would be as follows (notice that hash data is the same):

     Content-Type: text/plain; format=flowed
     Content-Digest: v=1.0; a=md5; d="vP5T2agfLQOCooDQF3lghA=="

     Test Message

   When MD5 algorithm is replaced with more secure SHA1 (default when
   "a" is not present), the data would look as follows:

     Content-Type: text/plain; format=flowed
     Content-Digest: v=1.0; d="yH0loJWEwEDzv8U7VwGZWR3rELo="

     Test Message

6.2  Content-Digest used in Email Message

   The use of Content-Digest for an email message which is composed
   entirely of one content part is shown in section 3.3.  Here this is
   expanded to show an example of a MIME multi-part email message with
   use of the Content-Digest header both for particular email content
   parts and for the entire email message.  In all cases the default
   "sha1" algorithm is used (so the "a" parameter may be omitted).


     From: will@example.com
     To: mary@example.net
     Subject: Fireworks
     Date: Mon, 4 Jul 2005 12:34:26 -0400
     Message-ID: <will.123456789@example.com>
     Mime-Version: 1.1
     Content-Type: MULTIPART/signed; Boundary="NextPart";
       protocol="application/pkcs7-signature"; micalg=sha1
     Content-Transfer-Encoding: 7bit
     Content-Digest: v=1.0; i=mail.example.com; t=2005070412342601;
       h=content-type,mime-version,message-id,date;
       c=nofws; d="rNqZDKbZ4eFzs/6Z67ivfIA2JPs="

     This message is in MIME format. The first part should be
     readable text, while the remaining parts are likely
     unreadable without MIME-aware tools.

     --NextPart
     Content-Type:  text/plain; charset="US-ASCII"
     MIME-Version: 1.0
     Content-ID: <218C460.u314@example.com>
     Content-Digest: v=1.0; h=content-type,content-id,mime-version;
       a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="

     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will

     --NextPart
     Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
     Content-Transfer-Encoding: BASE64
     Content-Description: S/MIME Cryptographic Signature
     Content-Disposition: attachment; filename="smime.p7s"
     Content-ID: <218C460.u315@example.com>
     Content-Digest: v=1.0; h=content-*; t=2005070412341200;
       d="HlT99tyN/wczesmLuavpsr5qXbc="

     MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
     SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
     ....

   Note that the Content-Transfer-Encoding header field is included in
   the digest hash data for the last content part.  While this may not
   be good for text data, BASE64 is well known as the only transfer
   encoding used for S/MIME signatures and is not likely to ever be
   changed by intermediate systems.  The actual canonicalized data
   ('bare' canonicalization is assumed by default since not specified)
   that goes into hash digest computation IS NOT BASE64, but binary
   8-bit data (since digest data is added based on original data before
   applying of content-transfer-encoding rules).  However the data used
   for hash computation of Content-Digest in the mail message header
   itself (identified by t=2005070412342601) would be based on
   encapsulated and encoded MIME parts within it with content-transfer-
   encoding applied and so in that case BASE64 encoded data is used (and
   mail message content hash also includes data from header fields of
   all message parts, including Content-Digest field with
   t=2005070412341200).

6.3  Content-Digest used in HTTP Transmission

   Content-Digest can be used as replacement for Content-MD5 for HTTP
   and is used in the same way and only when entire content part data is
   transmitted.  Here is an example:

   Date: Sun, 10 Jul 2005 15:02:03 GMT
   Accept-Ranges: bytes
   ETag: "8088c-13bfe-42d137fd-windows-1251"
   Server: Apache/1.3.22 (Unix) mod_deflate/1.0.21 mod_accel/1.0.31
   Vary: accept-charset, user-agent
   Content-Length: 80894
   Content-Type: text/html; charset=windows-1251
   Content-Digest: v=1.0; i=www.example.com;
     h=Content-Type,Last-Modified,ETag; c=bare;
     d="MpUuKLUmoKUapc4q2kMyw3XzEUo="
   Last-Modified: Sun, 10 Jul 2005 15:00:13 GMT

   <html><head><title>Hello World</title></head>
   <body><h2>Hello World</h2></body>

   In cases when partial content data is transmitted (transmission in
   chunks) an HTTP instance digest may be used for data integrity;
   please see [RFC3230] regarding this complementary concept of a
   digest header field specific to each connection.  To be able to
   verify the entire data (rather than a specific chunk), EDigest with
   a "u" parameter pointing to the permanent location of the data can
   be included in the header of each chunk, with a Content-Location
   header field also present in the same header.

6.4  EDigest used in Email

   Below is the example from section 6.2, but with EDigest (with
   t=2005070510302601) being used in the email header to provide a hash
   of particular MIME parts rather than of the entire message as a
   whole (as it was with Content-Digest in section 6.2).  The message,
   after being delivered, is then manually resent to a list server,
   which adds an additional MIME part (mail list footer) and then adds
   a new EDigest field (with t=2005070413063001).  Note that in email
   EDigest header fields are typically prepended to the message as
   trace data, which is different from Content-Digest fields, which are
   added together with other Content fields by the message originator
   and usually appear below them in the content header.

     EDigest: v=1.0; i=lserv.example.org; t=2005070413063001;
       u="<218C460.u314@example.com> <218C460.u315@example.com>
          <fl0332.k1@example.org>";
       h="content-type,mime-version,content-id,content-digest,
          content-originator"; d="MJkDZynIX7LCZ8LBO/KB2UGQmU0="
     Received: from box.example.net (box.example.net [10.0.2.10])
       by lserv.example.org (8.12.1/8.12.1)
       with ESMTP id 4d343d31 for <family-list@example.org> ;
       Mon, 04 July 2005 13:06:20
     Resent-From: mary@example.net
     Resent-To: family-list@example.org
     Resent-Date: Mon, 4 Jul 2005 13:04:10 -0400
     Received: from mail.example.com (mail.example.com [10.0.0.1])
       by box.example.net (8.12.1/8.12.1)
       with ESMTP id nmonpqrst1 for <mary@example.net> ;
       Mon, 04 July 2005 10:33:04 +0100
     EDigest: v=1.0; i=mail.example.com; t=2005070510302601;
       u="<218C460.u314@example.com> <218C460.u315@example.com>";
       h=content-type,mime-version,content-id,content-digest;
       d="COb/tgPpFD4JNS2vYelZAkk4aHU="
     From: will@example.com
     To: mary@example.net
     Subject: Fireworks
     Date: Mon, 4 Jul 2005 10:29:15 -0400
     Message-ID: <will.123456789@example.com>
     Mime-Version: 1.1
     Content-Type: MULTIPART/mixed; Boundary="NextPart"
     Content-Transfer-Encoding: 7bit

     This message is in MIME format. The first part should be
     readable text, while the remaining parts are likely
     unreadable without MIME-aware tools.

     --NextPart
     Content-Type:  text/plain; charset="US-ASCII"
     MIME-Version: 1.0
     Content-ID: <218C460.u314@example.com>
     Content-Digest: v=1.0; h=content-type,content-id,mime-version;
       a=sha1; d="MSU3X80gRiNX1r2sjRzV4thQ5cs="

     Happy 4th of July,
       Fireworks at pier 39 at 9:30pm, be there.
     Will


     --NextPart
     Content-Type: APPLICATION/pkcs7-signature; name="smime.p7s"
     Content-Transfer-Encoding: BASE64
     Content-Description: S/MIME Cryptographic Signature
     Content-Disposition: attachment; filename="smime.p7s"
     Content-ID: <218C460.u315@example.com>
     Content-Digest: v=1.0; h=content-*; t=2005070412341200;
       d="HlT99tyN/wczesmLuavpsr5qXbc="

     MIIEWgYJKoZIhvcNAQcCoIIESzCCBEcCAQExCzAJBgUrDgMCGgUAMAsGCSqG
     SIb3DQEHAaCCAl8wggJbMIIBxKADAgECAgMMcrUwDQYJKoZIhvcNAQEEBQAw
     ....

     --NextPart
     Content-Type: text/plain; charset=US-ASCII; format=flowed
     Content-Originator: "Family List" <family-list@example.org>
     Content-ID: <fl0332.k1@example.org>
     Content-Digest: v=1.0; h=content-*;
       d="c4ZKJPGIqDAfn/SrjbF8jI5448k="

     _______________________________________________
     private family mailing list - family-list@example.org


7.  IANA Considerations

   Two header fields are to be registered as follows:
   ---------------------------------------------------------------------
    Header field name:
      Content-Digest

    Applicable protocol:
      MIME

    Status:
      provisional

    Author/Change controller:
      William Leibzon <william@elan.net>

    Specification document(s):
      This document

    Related information:
      none
   ---------------------------------------------------------------------

   ---------------------------------------------------------------------
    Header field name:
      EDigest

    Applicable protocol:
      MIME, mail

    Status:
      provisional

    Author/Change controller:
      William Leibzon <william@elan.net>

    Specification document(s):
      This document

    Related information:
      none
   ---------------------------------------------------------------------

   Note to RFC Editor: this section may be removed on publication as an
   RFC.


8.  Security Considerations

   This document specifies a data integrity mechanism that protects MIME
   data (including MIME header fields) from accidental modification
   while in transit from origin to destination.  Data integrity with
   Content-Digest and EDigest is not a replacement for an end-to-end
   messaging security architecture such as S/MIME [RFC3851] or PGP
   [RFC3156], but may supplement them.  Automated addition of EDigest
   by message transport agents may be used as a basis for building an
   automated email signing system.
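
   As an illustration of the integrity check described above, the
   following Python fragment is a minimal sketch of how a recipient
   might recompute a digest and compare it with the d= value.  It
   assumes that the data has already been canonicalized, that the d=
   value is the base64 encoding of the raw hash output, and that
   "sha1" applies when no a= parameter is present.  The function names
   are hypothetical.

```python
import base64
import hashlib
import hmac

# Algorithm names from the a= parameter mapped to hashlib constructors.
ALGORITHMS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def compute_digest(data: bytes, algorithm: str = "sha1") -> str:
    """Return the base64 text of the hash of already-canonicalized data."""
    hasher = ALGORITHMS[algorithm.lower()]  # names are case-insensitive
    return base64.b64encode(hasher(data).digest()).decode("ascii")


def verify_digest(data: bytes, expected: str, algorithm: str = "sha1") -> bool:
    """Compare a recomputed digest with the value carried in d=."""
    return hmac.compare_digest(compute_digest(data, algorithm), expected)
```

   A mismatch only shows that the data or the digest changed in
   transit; because the digest is unkeyed, a match by itself does not
   authenticate the sender.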


9.  References

9.1  Normative References

   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
              April 1992.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2049]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Five: Conformance Criteria and
              Examples", RFC 2049, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource
              Locators", RFC 2392, August 1998.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822,
              April 2001.

   [RFC3174]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
              (SHA1)", RFC 3174, September 2001.

   [RFC3548]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 3548, July 2003.

   [RFC3874]  Housley, R., "A 224-bit One-way Hash Function: SHA-224",
              RFC 3874, September 2004.

9.2  Informative References

   [FIPS180-2]
              National Institute of Standards and Technology, "Secure
              Hash Standard", FIPS PUB 180-2, August 2002,
              <http://csrc.nist.gov/publications/fips/fips180-2/
              fips180-2.pdf>.

   [RFC1421]  Linn, J., "Privacy Enhancement for Internet Electronic
              Mail: Part I: Message Encryption and Authentication
              Procedures", RFC 1421, February 1993.

   [RFC1544]  Rose, M., "The Content-MD5 Header Field", RFC 1544,
              November 1993.

   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3156]  Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
              "MIME Security with OpenPGP", RFC 3156, August 2001.

   [RFC3230]  Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
              RFC 3230, January 2002.

   [RFC3851]  Ramsdell, B., "Secure/Multipurpose Internet Mail
              Extensions (S/MIME) Version 3.1 Message Specification",
              RFC 3851, July 2004.

   [draft-hoffman-hash-attacks-04]
              Hoffman, P., "Attacks on Cryptographic Hashes in Internet
              Protocols", June 2005, <http://www.ietf.org/
              internet-drafts/draft-hoffman-hash-attacks-04.txt>.


Author's Address

   William Leibzon
   Elan Networks
   500 Laurelwood Rd, Suite 12
   Santa Clara, California  95054
   USA

   Email: william@elan.net


Appendix A.  Collected Grammar

   This appendix contains the complete ABNF grammar for the Content-
   Digest and EDigest header fields.  Grammar terms that are not
   defined below (such as CFWS and FWS) are defined in [RFC2822].

   The ABNF grammar of the Content-Digest header field is as follows:

      Content-Digest = "Content-Digest" ":" FWS version parameters

      version = "v=" version-number CFWS ";"

      version-number = "1.0" / unknown-version

      unknown-version = number-major "." number-minor

      number-major = 1*(digit)

      number-minor = 1*(digit)

      parameters = *(CFWS ";" FWS parameter) CFWS data-parameter
        *(CFWS ";" FWS parameter)

      data-parameter = ";" FWS "d=" value

      parameter = algorithm / headerfieldlist / canonicalization /
        size / timestamp / hostinfo / undefined-parameter
      ; Matching of parameter names is case-insensitive

      undefined-parameter = undefined-name "=" undefined-value

      undefined-name = token

      undefined-value = value

      algorithm  = "a=" algorithm-name

      algorithm-name =  "md5" / "sha1" / "sha224" / "sha256" /
        "sha384" / "sha512" / undefined-value
      ; Matching of algorithm names is case-insensitive

      canonicalization = "c=" [header-canonicalization ","]
        body-canonicalization
      ; Matching of header and body canonicalization is case-insensitive

      header-canonicalization = "bare" / "simple" /
        "nofws" / undefined-value


      body-canonicalization = "bare" / "text" /
        "mimeform" / "nofws" / "none" / undefined-value

      size       = "l=" 1*(digit)

      timestamp  = "t=" timestamp-value

      timestamp-value = 1*(digit) ["." 1*(digit)]

      hostinfo   = "i=" value

      headerfieldlist = "h=" headerfield *("," headerfield)

      headerfield = field-name
      ; Matching of header field names is case-insensitive

      field-name    =       1*ftext [ "*" ]

      ftext         =       %d33-57 /      ; Any character except
                            %d59-126       ; controls, SP, and ":"

      digit         =       %d48-57        ; Numeric Digit

      value = token / quoted-string

      token = 1*<any ASCII CHAR except SPACE, CTLs, or tspecials>

      tspecials =  "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
                   "\" / <"> / "/" / "[" / "]" / "?" / "="
      ; Must be in quoted-string to use within parameter values
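
   The grammar above can be exercised with a small parser.  The
   following Python fragment is a minimal sketch rather than a complete
   implementation: it splits the field body on ";" outside of quoted
   strings and returns the parameters as a mapping.  It does not
   handle RFC 2822 comments inside CFWS and does not unescape quoted
   pairs.  The function names are hypothetical.

```python
def split_parameters(value: str) -> list[str]:
    """Split a field body on ';' while leaving quoted-strings intact."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True  # keep the next character as-is
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_content_digest(value: str) -> dict[str, str]:
    """Parse the text after 'Content-Digest:' into a name -> value mapping."""
    # Unfold the field: continuation lines are joined with a single space.
    unfolded = " ".join(line.strip() for line in value.splitlines())
    params = {}
    for item in split_parameters(unfolded):
        name, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"parameter without '=': {item!r}")
        name = name.strip().lower()  # parameter names are case-insensitive
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]  # drop the quotes of a quoted-string
        params[name] = raw
    if "v" not in params or "d" not in params:
        raise ValueError("both v= and d= are required")
    return params
```

   Applied to a field body such as 'v=1.0; a=sha1; d="..."', the
   sketch yields a mapping with the keys "v", "a" and "d"; the list in
   an h= parameter can then be split on ",".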

   The ABNF grammar of the EDigest header field is as follows (for any
   term not defined here, refer to the Content-Digest grammar above):


      EDigest = "EDigest" ":" FWS version edigest-parameters

      edigest-parameters = *(CFWS ";" FWS ed-parameter) CFWS data-parameter
        *(CFWS ";" FWS ed-parameter)

      ed-parameter = algorithm / headerfieldlist / canonicalization /
        size / timestamp / hostinfo / urlinfo / undefined-parameter
      ; Matching of parameter names is case-insensitive

      urlinfo = "u=" (quoted-url / content-id)
      ; content-id is as defined in RFC2392

      quoted-url = %d34 urldata %d34
      ; quoted-url must be used if urldata contains tspecials characters

      urldata = oneurl 0*(FWS oneurl)

      oneurl  = "<" value ">"
      ; value above is expected to be genericurl as defined in RFC1738 syntax
      ; but may also be content-id as defined in RFC2392
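
   The urlinfo rules above can be illustrated with the following
   Python fragment, a minimal sketch that extracts the individual
   "<" value ">" items from the value of a u= parameter.  It assumes
   that the items contain no whitespace or angle brackets and does not
   validate them against RFC1738 or RFC2392.  The function name and
   the sample URLs in the note below are hypothetical.

```python
import re

# One "<" value ">" item; the value is assumed to hold no spaces or brackets.
_ONEURL = re.compile(r"<([^<>\s]+)>")


def parse_urlinfo(raw: str) -> list[str]:
    """Return the URLs or content-ids listed in a u= parameter value."""
    text = raw.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]  # strip the quotes of a quoted-url
    urls = _ONEURL.findall(text)
    # Anything left outside the <...> items means the value is malformed.
    if not urls or _ONEURL.sub("", text).strip():
        raise ValueError(f"malformed urlinfo value: {raw!r}")
    return urls
```

   For instance, a quoted value holding "<http://example.org/a.txt>"
   followed by "<cid:fl0332.k1@example.org>" yields a list with those
   two strings, in order and without the angle brackets.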


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

