Network Working Group M. Kucherawy Internet-Draft April 5, 2015 Intended status: Experimental Expires: October 7, 2015 A List-safe Canonicalization for DomainKeys Identified Mail (DKIM) draft-kucherawy-dkim-list-canon-01 Abstract DomainKeys Identified Mail (DKIM) introduced a mechanism whereby a mail operator can affix a signature to a message that validates at the level of the signer's domain name. It specified two possible ways of converting the message body to a canonical form, one intolerant of changes and the other tolerant of simple changes to whitespace within the message body. The provided canonicalization schemes do not tolerate changes in a structured message such as conversion between transfer encodings or addition of new message parts. It is useful to have these capabilities to allow for transport through gateways, and also for transport through handlers (such as mailing list services) that might add content that would invalidate a signature generated using the existing canonicalization schemes. This document presents a mechanism for generating a canonicalization that can allows easy detection of modified content while still being valid for the content it originally signed. It also presents a use profile of DKIM that takes advantage of this capability. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 7, 2015. Kucherawy Expires October 7, 2015 [Page 1] Internet-Draft DKIM List Canonicalization April 2015 Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. The 'list' Canonicalization Description . . . . . . . . . . . 3 3.1. Preparing Content . . . . . . . . . . . . . . . . . . . . 4 4. 'The 'lh=' Signature Tag . . . . . . . . . . . . . . . . . . . 5 5. Use Profile . . . . . . . . . . . . . . . . . . . . . . . . . 6 6. Security Considerations . . . . . . . . . . . . . . . . . . . 6 6.1. Imported from DKIM . . . . . . . . . . . . . . . . . . . . 6 6.2. Added Content May Not Be Safe . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 7.1. DKIM-Signature Canonicalization Body Registry . . . . . . 7 7.2. DKIM-Signature Tag Specifications Registry . . . . . . . . 7 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 8.1. Normative References . . . . . . . . . . . . . . . . . . . 7 8.2. Informative References . . . . . . . . . . . . . . . . . . 7 Appendix A. Example . . . . . . . . . . . . . . . . . . . . . . . 8 Appendix B. To-Do . . . . . . . . . . . . . . . . . . . . . . . . 10 Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 10 Kucherawy Expires October 7, 2015 [Page 2] Internet-Draft DKIM List Canonicalization April 2015 1. Background DomainKeys Identified Mail [RFC6376] (DKIM) defines a mechanism whereby a verified domain name can be attached to a message, or portion of a message, using a cryptographic signature. It presents two possible schemes for converting the header block to a canonical form, and similarly two schemes for canonicalizing the body. In each case, one scheme permits no changes whatsoever, and the other permits limited changes restricted to areas such as whitespace munging, case changing, and header field wrapping. Some agents deliberately, but innocently, modify content in transit. A prime example of this is mailing lists, which might add a prefix to the Subject field of a message, add list-specific information to the header (in the form of new header fields), or append administrivia to the body of messages before they are re-mailed to the list subscribers. Use of mailing lists with respect to DKIM, and a discussion of related challenges, can be found in [RFC6377]. There is a desire to have DKIM signatures survive transit through lists. One way to do this is to make use of DKIM's "l=" tag which limits the portion of the body that is signed. This exposes an attack vector, however, since one can simply append any content to a partly-signed message and the signature will continue to verify. (See Section 8.2 of [RFC6376].) This document defines a new body canonicalization for DKIM that includes a partial signature for each message part in a message structured using Multipurpose Internet Mail Extensions (MIME; see [RFC2045]). This allows a clear delineation between the author- generated content (which would be signed by the author) and content added downstream (which would be signed by the other actor). A DKIM verifier can then determine whether the author-generated content is intact, and then identify and verify the content that was added later. The utility of this mechanism is predicated on the notion that agents that modify signed messages will do so in ways compatible with MIME. 2. Definitions Numerous terms used here, especially "Author", are defined in [RFC5598]. 3. The 'list' Canonicalization Description This section defines the 'list' body canonicalization algorithm. Kucherawy Expires October 7, 2015 [Page 3] Internet-Draft DKIM List Canonicalization April 2015 Put simply, the list canonicalization constructs a hash tree of the MIME structure of the message after each part has been decoded (for those with a Content-Transfer-Encoding field). The hash used is implied by the signature algorithm to be used (see the DKIM "a=" tag). Each of the hashes can be made a part of the signature to allow for more precise part validation, and identification of added content. 3.1. Preparing Content A message is prepared for canonicalization by applying the following steps in order: 1. Create an empty tree. Each node of the tree includes the following components: A. The MIME type and subtype of the part, expressed as would be found in a Content-Type header field, with no whitespace or comments; B. The unencoded content represented by the MIME part at this node; C. A series of octets that will contain a hash of the content; D. A series of zero or more pointers to other (child) nodes. 2. If the message is not encoded using MIME, insert a node at the root of the tree using a type/subtype of "text/plain" and the full body content. The hash is not initialized. 3. If the message is encoded using MIME, then the tree is populated in a way that mirrors the MIME structure of the message. In particular, the outermost MIME object will appear at the root node, and the only nodes that have children are those with a MIME type of "multipart". The hashes are not initialized. 4. For each leaf node, compute a hash of the content of that node. Store the hash in the node. 5. For each non-leaf node, if all of its child nodes now have computed hashes, concatenate the hashes (with order preserved), and compute and store a hash of the concatenation. 6. Repeat the previous step until all hashes in the tree have been populated. When this canonicalization is in use, the "bh=" tag will contain the Kucherawy Expires October 7, 2015 [Page 4] Internet-Draft DKIM List Canonicalization April 2015 hash stored at the root of the tree. The processes for signing and verification are otherwise unchanged. 4. 'The 'lh=' Signature Tag A signer can include an "lh=" tag, defined here, to make more than just the root hash information available to verifying agents. This permits identification of the specific part of the MIME structure that was modified, added or removed by an intermediary. The "lh=" tag is constructed by performing an in-order traversal of the canonicalization tree described in Section 3.1. At each node, each of the following is output, separated by a colon character (ASCII 0x3A): 1. A base64 expression of the hash at that node; 2. The MIME type of that node; 3. An integer expression of the number of children at that node. Between each node's output, a comma character (ASCII 0x2C) is output. Reconstruction of the MIME tree can be accomplished by the following steps: 1. Create a tree "T" containing a single empty node. 2. Create an empty node queue, "Q". 3. Create an information queue "I", containing the sequence of node information fields found in the "lh=" tag. 4. Select the root node of the tree. Call this node "N". 5. Extract the first batch of node information ("B") from the "lh=" tag. 6. Store the hash and MIME type from "B" into "N". 7. Enqueue the specified number of empty nodes into "Q", and attach them all as children of "N". 8. If "I" and "Q" are both empty, terminate. If one is empty and the other is not, an error has occurred. 9. Extract the next batch of node information from "I", as "B". Kucherawy Expires October 7, 2015 [Page 5] Internet-Draft DKIM List Canonicalization April 2015 10. Dequeue the next node from "Q", as "N". 11. Return to step 6. By comparing the hashes in and structure of this tree to those in the canonicalized tree, a receiver can identify parts of the tree (or entire subtrees) that have been modified. Parts not covered by the signature can also be identified. 5. Use Profile The intended use of this mechanism is to affix two DKIM signatures to a message. The first signature is added by the Author, and canonicalizes the original message in its entirety. The second signature is added by a modifying intermediary, such as a mailing list manager (MLM). When verifying, the Author signature on an unmodified message would pass verification. For a modified message, in the typical case, the verification step would observe that the Author signature failed but the intermediary's signature verified. When the "lh=" tag is present, it is possible to reconstruct the MIME structure of the signed message and compare it to that of the received message, including hashes of the content seen by each party. By comparing hash values at each node of the MIME structures, it is possible to determine in which MIME parts changes were made and/or new parts added or removed by the intermediary. The verifying agent can then determine whether those changes are acceptable before allowing the message to continue toward delivery. It is also possible to determine which agents in the handling chain took responsibility for which parts of the content. For example, while a Mediator's signature might indicate that the mediator is responsible for the entire (rewritten) message, it might also be possible to determine that the Author takes responsibility for all but one part of the message as well. The excluded part would be the part added by the Mediator, and can be handled separately from the Author's content. 6. Security Considerations 6.1. Imported from DKIM Section 8 of [RFC6376] discusses numerous security considerations relevant to DKIM. Of particular interest here is Section 8.2, which discusses concerns regarding signatures that sill verify in the presence of added message content. Kucherawy Expires October 7, 2015 [Page 6] Internet-Draft DKIM List Canonicalization April 2015 6.2. Added Content May Not Be Safe When the use profile described in Section 3 is applied, it is important to note that the added content was not signed by the Author domain, but only by the domain of the intermediary. Operators that might grant preferential handling based on valid DKIM signatures from favorable domains; assuming that appended content in the presence of such signatures does not mean the appended content is necessarily safe. 7. IANA Considerations 7.1. DKIM-Signature Canonicalization Body Registry IANA is requested to add the following entry to the DKIM-Signature Canonicalization Body Registry: Type: list Reference: [this document] Status: active 7.2. DKIM-Signature Tag Specifications Registry IANA is requested to add the following entry to the DKIM-Signature Tag Specifications Registry: Type: lh Reference: [this document] Status: active 8. References 8.1. Normative References [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [RFC6376] Crocker, D., Hansen, T., and M. Kucherawy, "DomainKeys Identified Mail (DKIM) Signatures", STD 76, RFC 6376, September 2011. 8.2. Informative References [RFC5598] Crocker, D., "Internet Mail Architecture", RFC 5598, July 2009. [RFC6377] Kucherawy, M., "DomainKeys Identified Mail (DKIM) and Kucherawy Expires October 7, 2015 [Page 7] Internet-Draft DKIM List Canonicalization April 2015 Mailing Lists", BCP 167, RFC 6377, September 2011. Appendix A. Example To illustrate the use of this addition to DKIM, consider a message whose header and content are as follows: From: sender@example.com To: recipient@example.net Date: Mon, 23 Mar 2015 11:21:33 -0700 Subject: test message MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="foobar" --foobar Content-Type: text/plain Text part #1 --foobar Content-Type: text/plain Text part #2 --foobar-- Figure 1: Example Message The MIME structure in this message can be represented as a tree. A node with media type "multipart" has a set of one or more children nodes, each of which starts with the corresponding boundary. A node of any other type contains actual content, and has no descendents, but has siblings under the same parent node. Thus, as a tree, the example message might be represented thus: +-----------+ | multipart | | mixed |--->// +-----------+ | | V +-----------+ +-----------+ | text | | text | | plain |--->| plain |--->// +-----------+ +-----------+ Figure 2: MIME structure Kucherawy Expires October 7, 2015 [Page 8] Internet-Draft DKIM List Canonicalization April 2015 Continuing with this illustration, a Mediator receives the message, and adds its desired "footer" content by appending a third text/plain MIME part after the existing content. This results in the following MIME structure: +-----------+ | multipart | | mixed |--->// +-----------+ | | V +-----------+ +-----------+ +-----------+ | text | | text | | text | | plain |--->| plain |--->| plain |--->// +-----------+ +-----------+ +-----------+ Figure 3: Augmented MIME structure Applying the signatures as described in Section 3 at both the Author and the Mediator, the final Verifier will see signatures that cover content as follows: +--------------------------------------------------------+ |+--------------------------------+ | || +-----------+ | | || | multipart | | | || | mixed |---// | | || +-----------+ | | || | | | || | | | || V | | || +-----------+ +-----------+ | +-----------+ | || | text | | text | | | text | | || | plain |--->| plain |-|->| plain |--->// | || +-----------+ +-----------+ | +-----------+ | |+--------------------------------+ | | A u t h o r s i g n a t u r e | +--------------------------------------------------------+ M e d i a t o r s i g n a t u r e Figure 4: Signature coverage of content With the additional information provided using this mechanism, it is now possible to verify both signatures, and also ascribe responsibility for different parts of the content to two different signature-generating entities. Kucherawy Expires October 7, 2015 [Page 9] Internet-Draft DKIM List Canonicalization April 2015 Appendix B. To-Do Explain how this works when the input message is not already a MIME message. Probably just canonicalize it as a multipart/mixed with a single text/plain in it. Handle more complex MIME structures from the author, such as something that's already multipart/mixed with some non-trivial structure to it. Appendix C. Acknowledgements The original idea was proposed by Ned Freed. The authors wish to acknowledge (names) for their comments during the development of this document. Author's Address Murray S. Kucherawy EMail: superuser@gmail.com Kucherawy Expires October 7, 2015 [Page 10]