Network Working Group M. Kucherawy Internet-Draft April 25, 2014 Intended status: BCP Expires: October 27, 2014 Architectural Approaches for Enhancing Email draft-kucherawy-email-caps-02 Abstract This document provides guidance regarding architectural decisions made when developing enhancements to the Internet message service ("email"). Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 27, 2014. Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Kucherawy Expires October 27, 2014 [Page 1] Internet-Draft Enhancing Email April 2014 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Architectural Guidance . . . . . . . . . . . . . . . . . . . . 3 3. Enhancement History . . . . . . . . . . . . . . . . . . . . . 3 4. The Protocol . . . . . . . . . . . . . . . . . . . . . . . . . 4 5. The Message . . . . . . . . . . . . . . . . . . . . . . . . . 5 6. Header vs. Envelope . . . . . . . . . . . . . . . . . . . . . 5 7. Deployment Observations and Results . . . . . . . . . . . . . 7 8. Consequences of Faulty Design . . . . . . . . . . . . . . . . 8 9. Security Considerations . . . . . . . . . . . . . . . . . . . 9 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 11.1. Normative References . . . . . . . . . . . . . . . . . . 9 11.2. Informative References . . . . . . . . . . . . . . . . . 10 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 10 Kucherawy Expires October 27, 2014 [Page 2] Internet-Draft Enhancing Email April 2014 1. Introduction The email service is fully described in [RFC5598]. It has two core components: The message payload, and the transfer protocol that conveys it. For various reasons discussed later in this document, it is common for enahncements to the service to be made in undesirable ways. This document first presents some basic architectural recommendations to be considered when enhancing the service, and then describes why these recommendations are apt and provides some history for them. 2. Architectural Guidance When enhancing the email service, it is critical to identify the precise nature of the enhancement. Specifically, an enhancement will affect either the message payload or the way the payload is transferred, but rarely both. Simply put: o An enhancement that affects the content of the message in some way (such as meta-data about how to display the payload, a digital signature ensuring payload integrity, indications of the handling history of the payload, etc.) is best implemented in ways that alter the content somehow, such as addition of header fields or addition of Multipurpose Internet Mail Extensions (MIME) metadata (see [RFC2045]). o An enhancement that purely affects payload transport, and is not meant to be recorded beyond delivery of the message to a mailbox, is best implemented in a way that extend the delivery protocol itself and not in a way that alters the payload. Enhancements that affect both transport and content are rare, and require special attention to this important boundary. 3. Enhancement History As stated above, the email service is primarily composed of two specifications: The format of the payload, and the method by which the payload is transferred from one handling agent to the next. The message format was originally fully specified in [RFC0733], though it has some antecedents in the RFC archive. The current format specificaton is [RFC5322]. This format describes two sections: a "header" and a "body". Generally speaking, the body contains the primary content of the message itself, while the header Kucherawy Expires October 27, 2014 [Page 3] Internet-Draft Enhancing Email April 2014 conveys some metadata such as who sent it, to whom it was sent, where replies are to be directed, how it should be displayed, etc. One notable exception is the Subject: header field, which is essentially part of the content. The Simple Mail Transfer Protocol (SMTP) was originally fully specified in [RFC0821], though it too was based on some other previous work. The current specification is [RFC5321]. The protocol is essentially a simple ASCII dialog between a client system and the server system that exchanges a couple of identifiers -- who the message is "from" and who it is "to" -- and then the message itself, with status codes as the responses at each step. The partition between these two has often been blurred as a result of the original design and implementation of the service. It was simply not always made clear what the best way is to add extensions. A number of enhancements to both of these have appeared over the years, which are too numerous to list here. They range in popularity and deployment. Some of these are enhancements to format (such as the addition of multimedia support), others to the protocol (such as enhanced error handling), and a few have augmented both. The original and increased complexity of the service has led to a body of deployed code that has in turn had some impacts on the development of enhancements over time. This often leads to enhancements that are developed in ways that contradict the advice presented in Section 2. This can have unfortunate consequences, as described below. 4. The Protocol The Simple Mail Transfer Protocol (SMTP) is the language spoken by email clients and servers to exchange messages. The protocol is all in printable ASCII, which makes it easy for users to "speak" the protocol directly for the purposes of testing, debugging, or illustration. Essentially, the client introduces itself to the server, which replies with a similar greeting. The client declares that it has a message from a given party for delivery to one or more parties, followed by a declaration that it is ready to send the content. When the server is ready, the client relays its payload (the message) to the server. Finally, the server accepts the message, usually returning a code to the client that uniquely identifies this transaction so that later analysis of the specific transaction is possible. This sequence can repeat if the client has multiple messages to relay during the same SMTP session. When no more Kucherawy Expires October 27, 2014 [Page 4] Internet-Draft Enhancing Email April 2014 relaying is to be done, the two politely disconnect, and the dialog is complete. One could make the analogy of a person (perhaps a postal worker) speaking to another person (perhaps at home) and the former handing the latter a sealed envelope bearing a sender address and a recipient address. The contents of the envelope are not known to either of these parties at this stage; the exchange does not require it. An important point here is that once the exchange is complete, the first party no longer has the message. This is one of the intentional properties of the email service; the message always exists in exactly one place. The notable exception is the period where transmission of the message is complete but not acknowledged; for that brief period, the message exists in two places. An envelope, in this illustration, can name more than one recipient. An agent holding a message with such an envelope may find it must next relay the message to multiple independent servers to complete delivery to each recipient. In this case, that agent clones ("splits") the envelope, resuting in multiple envelopes each with a subset of the previous recipient set, but with identical content. 5. The Message The email message conveys the content of a message from one or more authors to one or more recipients. The message consists of a header and a body. The body is the primary content, and in modern terms it can contain unstructured plain text, structured multimedia, or nothing at all. The header consists of a set of header fields that include meta-data about the content, such as identifying the party (or parties) that generated it, which agents handled it in transit, the date and time at which it was generated, the (apparent) set of intended recipients, etc. In the case of structured content, the header also contains the initial set of details needed to extract the structure. If one imagines a printed memo, with fields like "From", "To", "Subject", "Date", and perhaps "Cc", it is easy to envision a simple email message; these fields are at the top, separated by some kind of divider (which might be just an extra blank line or two) followed by the body of the memo. It is in this image that the email format was also created. 6. Header vs. Envelope It is useful to carefully distinguish the separation of function of the message header versus the SMTP envelope when considering the Kucherawy Expires October 27, 2014 [Page 5] Internet-Draft Enhancing Email April 2014 design of any enhancement to the email service. There are tradeoffs in the choice of enhancement approach. One tends to gain easier adoption, but has less handling control. The other is much more difficult for adoption, but offers much greater handling control. The most distinctive aspect of the separation is that the addresses in the envelope, used during transfer, can be entirely different from the addresses contained in the message header. So the SMTP return address (MAIL FROM) can be different from the message author (From: header field), and the list of SMTP recipient addresses (RCPT TO) can be entirely different from the recipients listed in the message header (To, Cc, and Bcc header fields). Thus, what's in that example printed memo in the previous section is completely independent of what was on the envelope that contained it. The memo might say "From: Alice" and "To: Bob", while the envelope said "From: Charlie" and "To: Deborah". More generally, there is no guarantee that the content and the transport have any relationship at all. An example of non-core material that is rightly a property of the message and not the envelope includes digital signatures of the payload. One might think of the mark or seal of a notary, which is meant to certify the content and not the envelope containing it. SMTP also has the notion of "Trace Information" which is a record of the agents that handled the message prior to delivery and when they each processed the message. One might think of a premium package handling service that includes tracking as part of its product, showing through which stations the package was carried and a date/ time at each. Email trace information fulfills the same goal, and is normally recorded as Received header fields. Also recorded in the header, at the time of delivery only, is the "from" portion of the envelope, to permit a reply to be sent to the correct place. This is recorded in a field called Return-Path. Any message can be forwarded by a user or a piece of software (such as a mailing list service). In this case, it is appropriate to think of the message as taking on a new life beyond its original delivery; that is, it is delivered to the entity that will forward it, and takes on a new life, with a new envelope and possibly a new or revised header, or even augmented content. Caution must be taken when constructing a new header so that information relevant only to the original delivery does not get forwarded; this leakage of information can lead to mishandling of the content or even leakage of private information to the new recipient(s). [RRVS] provides an example of such risks. Kucherawy Expires October 27, 2014 [Page 6] Internet-Draft Enhancing Email April 2014 7. Deployment Observations and Results As the email service grew in popularity, it also became a popular target for abuse. In particular, it became a vector for delivery of unwanted commercial email ("spam") or even malicious active content ("malware", such as viruses or worms). These attempt to exploit user trust (and naivete) in order to deliver undesirable content. Among other things, false or misleading From and Subject fields on messages are commonplace. Mail User Agents (MUAs) retrieve messages from message stores, and not from the Message Transfer Agents (MTAs) or Message Delivery Agents (MDAs) that affect transport and delivery of messages. They do not have access to the parameters exchanged during the protocol sessions that resulted in the delivery. This led to various enhancements done as message header fields, rather than enhancements to SMTP, or to MUA access protocols such as the Internet Message Access Protocol (IMAP) or Post Office Protocol (POP). The rise in abusive emails, with the abuse almost entirely aimed at exploiting deficiencies in content handling and presentation (see [RFC7103]), produced a requirement for email-handling agents (primarily MTAs) to be enhanced with powerful mechanisms for analyzing and even modifying messages. Given the considerable range of different ad hoc enhancements that have been made to message formats, discussed above, this requires significant flexibility in the mechanisms for making decisions about, or even altering, header fields in messages as they are processed. By contrast, very little in the way of messaging abuse takes place via misuse of SMTP or its extensions. Furthermore, SMTP is the infrastructure mechanism for message handling, and infrastructures are always markedly more difficult to modify, especially when the infrastructure is under a series of independent administrative controls, but must somehow come to be coordinated in their enhancements. This again contrasts with the handling of the payload itself, where only the agent generating the content and the agent that will ultimately intepret it -- by presenting it to a user -- need to understand it. This has resulted in the current environment, in which it is often very easy to add, alter, remove, and analyze header fields on a message, and typically very difficult if not impossible to add or process an SMTP extension for which built-in support does not already exist. An MTA or MDA advertises the SMTP extensions it supports, through the EHLO command reply. A client that supports a particular extension Kucherawy Expires October 27, 2014 [Page 7] Internet-Draft Enhancing Email April 2014 can therefore easily determine its applicability with the server with which it is interacting. If that agent does not include such support, the current agent must decide to do one of two things: a. consider the delivery a failure, and begin processing it as an error; or b. relay the message anyway, losing the capability afforded by the extension. In contrast to this negotiation mechanism at the level of SMTP, there is no control exchange for support of header field enhancements. They are present or not, and the client agent has no way to determine whether its semantics are supported by the next handling agent (or the recipient). However an MTA or MDA that does not understand a particular header field will almost always simply ignore that header field and continue to relay it, usually unmodified, to software downstream that does recognize the field and how to use its contents. A good example of this is MIME, whose header fields are typically of use only to MUAs and are ignored by MTAs and MDAs. Moreover, MUAs typically do not include header fields they don't recognize in the material ultimately presented to the end user. Enhancements done using header fields can be enormously useful when one wishes to deploy a new capability that will not affect or be affected by non-participating agents and is not intended for direct human consumption. As a result, it is common to assume that adding a new capability to the email service is best accomplished by creating (and hopefully, registering) a new header field specific to that purpose, even if that capability would more properly be implemented as an SMTP extension. In a few very rare cases, new capabilities have even been developed that include both header field and SMTP extension forms. [RRVS] again serves as a useful example. 8. Consequences of Faulty Design Using the header for enhancements that do not fit the envelope vs. content model may be convenient given the current deployed environment, but they result in such issues as: o inadvertent leakage of data not relevant to later message recipients if the message gets forwarded; Kucherawy Expires October 27, 2014 [Page 8] Internet-Draft Enhancing Email April 2014 o no guarantee that any agent in the handling path understands the enhancement or the details associated with it, leading to unexpected results; o for messages going to multiple recipients, the possible inadvertent revelation of private information when the message is "fanned out". For more discussion, see Section 7.2 of [RFC5321], Section 3.6.3 of [RFC5322], and Section 7 of [RRVS]. MTA and MDA implementers need to ensure that SMTP extensions can be added and handled via the runtime environment as easily as they can be for header fields. This will ensure the more sound architectural decisions can be made by designers and operators of future enhancements. 9. Security Considerations An important observation is that the envelope and the header overlap in only a small number of key ways: o The Return-Path header field, added at time of delivery, which includes the sender address as extracted from the message envelope; and o The Received header field, which might contain the envelope recipient for messages addressed to a single mailbox. Typically, all other envelope details are discarded upon delivery. Because of this, data about transport that should be ephemeral but are stored in header fields can fall into the wrong hands when the message is forwarded. Following the recommendations above can help to reduce this concern. 10. IANA Considerations This document contains no actions for IANA. [RFC Editor: Please remove this section prior to publication.] 11. References 11.1. Normative References [RFC5598] Crocker, D., "Internet Mail Architecture", RFC 5598, July 2009. Kucherawy Expires October 27, 2014 [Page 9] Internet-Draft Enhancing Email April 2014 11.2. Informative References [RFC0733] Crocker, D., Vittal, J., Pogran, K., and D. Henderson, "Standard for the format of ARPA network text messages", RFC 733, November 1977. [RFC0821] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821, August 1982. [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008. [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008. [RFC7103] Kucherawy, M., Shapiro, G., and N. Freed, "Advice for Safe Handling of Malformed Messages", RFC 7103, January 2014. [RRVS] Mills, W. and M. Kucherawy, "The Require-Recipient-Valid- Since Header Field and SMTP Service Extension", draft-ietf-appsawg-rrvs-header-field (work in progress), April 2014. Appendix A. Acknowledgments Dave Crocker and John Levine provided useful review comments during the development of this work. Author's Address Murray S. Kucherawy 270 Upland Drive San Francisco, CA 94127 USA EMail: superuser@gmail.com Kucherawy Expires October 27, 2014 [Page 10]