APPSAWG                                                     M. Kucherawy
Internet-Draft                                                G. Shapiro
Intended status: Informational                             June 18, 2013
Expires: December 20, 2013

             Advice for Safe Handling of Malformed Messages
                  draft-ietf-appsawg-malformed-mail-06

Abstract

Although Internet mail formats have been precisely defined since the 1970s, authoring and handling software often shows only mild conformance to the specifications. The distributed and non-interactive nature of email has often prompted adjustments to receiving software to handle these variations, rather than attempts to gain better conformance by senders, since the receiving operator is primarily driven by complaining recipient users and has no authority over the sending side of the system. Processing with such flexibility comes at some cost, since mail software is faced with decisions about whether to permit non-conforming messages to continue toward their destinations unaltered, adjust them to conform (possibly at the cost of losing some of the original message), or reject them outright.

A core requirement for interoperability is that both sides of an exchange work from the same details and semantics. When receivers are flexible beyond the specifications, there can be -- and often has been -- a good chance that a message will not be fully interoperable. Worse, a well-established pattern of tolerance for variations can sometimes be used as an attack vector.

This document includes a collection of the best advice available regarding a variety of common malformed mail situations, to be used as implementation guidance.

It must be emphasized, however, that the intent of this document is not to standardize malformations or otherwise encourage their proliferation. The messages are manifestly malformed, and the code and culture that generate them need to be fixed. Therefore, these messages should be rejected outright if at all possible.
Nevertheless, many malformed messages from otherwise legitimate senders are in circulation and will be for some time, and, unfortunately, commercial reality shows that we cannot always simply reject or discard them. Accordingly, this document presents alternatives for dealing with them in ways that seem to do the least additional harm until the infrastructure is tightened up to match the standards.

Kucherawy & Shapiro        Expires December 20, 2013                [Page 1]
Internet-Draft                Safe Mail Handling                   June 2013

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 20, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Purpose Of This Work
     1.2.  Not The Purpose Of This Work
     1.3.  General Considerations
   2.  Document Conventions
     2.1.  Examples
   3.  Background
   4.  Invariant Content
   5.  Mail Submission Agents
   6.  Line Termination
   7.  Header Anomalies
     7.1.  Converting Obsolete and Invalid Syntaxes
       7.1.1.  Host-Address Syntax
       7.1.2.  Excessive Angle Brackets
       7.1.3.  Unbalanced Angle Brackets
       7.1.4.  Unbalanced Parentheses
       7.1.5.  Commas in Address Lists
       7.1.6.  Unbalanced Quotes
       7.1.7.  Naked Local-Parts
     7.2.  Non-Header Lines
     7.3.  Unusual Spacing
     7.4.  Header Malformations
     7.5.  Header Field Counts
     7.6.  Missing Header Fields
     7.7.  Missing or Incorrect Charset Information
     7.8.  Eight-Bit Data
   8.  MIME Anomalies
     8.1.  Missing MIME-Version Field
     8.2.  Faulty Encodings
   9.  Body Anomalies
     9.1.  Oversized Lines
   10. Security Considerations
   11. IANA Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Acknowledgements

1.  Introduction

1.1.  The Purpose Of This Work

The history of email standards, going back to [RFC733] and beyond, contains a fairly rigid evolution of specifications. But implementations within that culture have also long had an undercurrent known formally as the robustness principle and informally as Postel's Law: "Be conservative in what you do, be liberal in what you accept from others."

Jon Postel's directive is often misinterpreted to mean that any deviation from a specification is acceptable. Rather, it was intended only to account for legitimate variations in interpretation within specifications, as well as basic transit errors, like bit errors. Taken to its unintended extreme, excessive tolerance would imply that there are no limits to the liberties that a sender might take, while placing a burden on a receiver to guess "correctly" at the meaning of any such variation. These matters are further compounded by receiver software -- the end users' mail readers -- that is itself sometimes flawed, leaving senders to craft messages (sometimes bending the rules) to overcome those flaws.

In general, this served the email ecosystem well by allowing a few errors in implementations without obstructing participation in the game. The proverbial bar was set low. However, as we have evolved into the current era, some of these lenient stances have begun to expose opportunities that can be exploited by malefactors.
Various email-based applications rely on strict application of these standards for simple security checks, while the very basic building blocks of that infrastructure, in trying to be robust, fail utterly to assert those standards. This document presents some areas in which the more lenient stances can provide vectors for attack, and then presents the collected wisdom of numerous applications in and around the email ecosystem for dealing with them to mitigate their impact.

1.2.  Not The Purpose Of This Work

It is important to understand that this work is not an effort to endorse or standardize certain common malformations. The code and culture that introduce such messages into the mail stream need to be repaired, as the security penalty now being paid for this lax processing arguably outweighs the reduction in support costs to end users who are not expected to understand the standards. However, the reality is that this will not be fixed quickly.

Given this, it is beneficial to provide implementers with guidance about the safest or most effective way to handle malformed messages when they arrive, taking into consideration the tradeoffs of the choices available, especially with respect to how various actors in the email ecosystem respond to such messages in terms of handling, parsing, or rendering to end users.

1.3.  General Considerations

Many deviations from message format standards are considered by some receivers to be strong indications that the message is undesirable, i.e., is spam or contains malware. Such receivers quickly decide that the best handling choice is simply to reject or discard the message. This means that malformations caused by innocent misunderstandings or ignorance of proper syntax can also cause messages with no ill intent to go undelivered.
Senders that want to ensure message delivery are best advised to adhere strictly to the relevant standards (including, but not limited to, [MAIL], [MIME], and [DKIM]), as well as to observe other industry best practices such as may be published from time to time either by the IETF or independently.

Receivers that cannot afford the luxury of strictly enforcing the standards on inbound messages are usually best served by observing the following guidelines for handling malformed messages:

   1.  Whenever possible, mitigation of syntactic malformations should be guided by an assessment of the most likely semantic intent. For example, it is reasonable to conclude that multiple sets of angle brackets around an address are simply superfluous and can be dropped.

   2.  When the intent is unclear, or when it is clear but also impractical to change the content to reflect that intent, mitigation should be limited to cases where not taking any corrective action would clearly lead to a worse outcome.

   3.  Security issues, when present, need to be addressed and may force mitigation strategies that are otherwise suboptimal.

2.  Document Conventions

2.1.  Examples

Examples of message content include a number within braces at the end of each line. These are line numbers for use in subsequent discussion and are not actually part of the message content presented in the example. Blank lines are not numbered in the examples.

3.  Background

The reader would benefit from reading [EMAIL-ARCH] for some general background about the overall email architecture. Of particular interest is the Internet Message Format, detailed in [MAIL]. Throughout this document, the term "message" should be taken to mean a block of text conforming to the Internet Message Format.

4.  Invariant Content

An agent handling a message could use several distinct representations of the message.
One is an internal representation, such as separate blocks of storage for the header and body, some header or body alterations, or tables indexed by header name, set up to make particular kinds of processing easier. The other is the representation passed along to the next agent in the handling chain. This might be identical to the message input to the module, or it might have some changes, such as added or reordered header fields or body elisions to remove malicious content.

Message handling is usually most effective when each module in a sequence of handling modules receives the same content for analysis. A module that "fixes" or otherwise alters the content passed to later modules can prevent those modules from identifying malicious or other content that exposes the end user to harm. It is important that all processing modules can make consistent assertions about the content. Modules that operate sequentially sometimes add private header fields to relay information downstream for later filters to use (and possibly remove), or they may have out-of-band ways of doing so. Whenever possible, the latter mechanism should be used. The above is less of a concern when multiple analysis modules are operated in parallel, independent of one another.

Often, abuse reporting systems can act effectively only when a complaint or report contains the original message exactly as it was generated. Alterations made by handling modules might render a complaint unactionable, as the system receiving the report may be unable to identify the original message as one of its own.

Some message changes alter syntax without changing semantics. (Indeed, analyzing the semantics of malformations was the impetus for this work.) For example, Section 7.4 describes a situation where an agent removes additional header whitespace. This is a syntax change without a change in semantics, though some systems (e.g., DKIM) are sensitive to such changes. Message system developers need to be aware of the downstream impact of making either kind of change.

There will always be local handling exceptions, but these guidelines should be useful for developing integrated message processing environments.

In most cases, this document only discusses techniques used on internal representations. It is occasionally necessary to make changes between the input and output versions; such cases will be called out explicitly.

5.  Mail Submission Agents

Within the email context, the single most influential component that can reduce the presence of malformed items in the email system is the Mail Submission Agent (MSA; see [EMAIL-ARCH]). This is the component that is essentially the interface between end users that create content and the mail stream. MSAs need to become stricter about enforcement of all relevant email standards, especially [MAIL] and the [MIME] family of documents. Stricter conformance by relaying MTAs will also be helpful. Although preventing the dissemination of malformed messages is desirable, the rejection of such mail already in transit also has a support cost, namely the creation of a [DSN] that many end users might not understand.

6.  Line Termination

For interoperable Internet Mail messages, the only valid line separation sequence is ASCII 0x0D ("carriage return", or CR) followed by ASCII 0x0A ("line feed", or LF), commonly referred to as CRLF. Common UNIX user tools, however, typically use only LF for internal line termination. This means that a protocol engine that converts between UNIX and Internet Mail formats has to convert between these two end-of-line representations before transmitting a message or after receiving it.
Non-compliant implementations can cause messages to be transmitted with a mix of line terminations, such as LF everywhere except for a single CRLF at the end of the message. According to [SMTP] and [MAIL], this means the entire message actually exists on a single line.

Within modern Internet Mail, it is highly unlikely that an isolated CR or LF is valid in common ASCII text. Furthermore, [MIME] presents mechanisms for encoding content that actually does need to contain such an unusual character sequence. Thus, it will typically be safe and helpful to treat a naked CR or LF as equivalent to a CRLF when parsing a message.

7.  Header Anomalies

This section covers common syntactic and semantic anomalies found in a message header and presents preferred mitigations.

7.1.  Converting Obsolete and Invalid Syntaxes

A message using an obsolete header syntax might confound an agent that is attempting to be robust in its handling of syntax variations. A bad actor could exploit such a weakness in order to get abuse or malicious content through a filter. This section presents some examples of such variations. Messages including them ought to be rejected; where this is not possible, recommended internal interpretations are provided.

7.1.1.  Host-Address Syntax

The following obsolete syntax attempts to specify source routing:

   To: <@example.net:fran@example.com>

This means "send to fran@example.com via the mail service at example.net". It can safely be interpreted as:

   To: <fran@example.com>

7.1.2.  Excessive Angle Brackets

The following overuse of angle brackets:

   To: <<>>

can safely be interpreted as:

   To:

7.1.3.  Unbalanced Angle Brackets

The following use of unbalanced angle brackets:

   To:

can usually be treated as:

   To:
   To: second@example.org

7.1.4.  Unbalanced Parentheses

The following use of unbalanced parentheses:

   To: (Testing
   To: Testing)

should be interpreted as:

   To: (Testing)
   To: "Testing)"

7.1.5.  Commas in Address Lists

This use of an errant comma:

   To:

can reasonably be interpreted as ending an address, so the above should really be interpreted as:

   To: third@example.net, fourth@example.net

7.1.6.  Unbalanced Quotes

The following use of unbalanced quotation marks:

   To: "Joe

leaves software with no obvious "good" interpretation. If it is essential to extract an address from the above, one possible interpretation is:

   To: "Joe "@example.net

where "example.net" is the domain name or host name of the handling agent making the interpretation. Another possible interpretation is simply:

   To: "Joe"
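Section 7.1 prefers rejecting these messages over reinterpreting them, so an agent first has to notice the malformations. The following Python sketch shows one possible approach; it is not a complete [MAIL] parser. It assumes a single header field value that has already been unfolded and decoded to a str, and the names `find_delimiter_problems` and `strip_source_route` are hypothetical. The scanner tracks quoted strings, (possibly nested) comments, and angle brackets, and reports the nested brackets, unmatched delimiters, and errant commas described in Sections 7.1.2 through 7.1.6; the second function applies the source-route interpretation from Section 7.1.1.

```python
import re
from typing import List

# Section 7.1.1: an obsolete source route such as "<@relay.example:user@host>"
# (possibly with several comma-separated "@relay" hops) before the mailbox.
# Simplification: this regex does not skip "<@" inside quoted strings.
_SOURCE_ROUTE = re.compile(r"<\s*@[^:<>]*:")


def strip_source_route(value: str) -> str:
    """Keep only the final mailbox of a source-routed address."""
    return _SOURCE_ROUTE.sub("<", value)


def find_delimiter_problems(value: str) -> List[str]:
    """Report delimiter anomalies in one unfolded address header value.

    Loosely follows the lexical rules of RFC 5322: a backslash escapes the
    next character inside quoted strings and comments, comments may nest,
    and delimiters inside a quoted string are literal.
    """
    problems: List[str] = []
    in_quotes = False
    comment_depth = 0
    angle_depth = 0
    escaped = False

    for ch in value:
        if escaped:                      # character after a backslash
            escaped = False
        elif (in_quotes or comment_depth) and ch == "\\":
            escaped = True
        elif in_quotes:                  # only a closing quote matters here
            if ch == '"':
                in_quotes = False
        elif comment_depth:              # comments nest; quotes are literal
            if ch == "(":
                comment_depth += 1
            elif ch == ")":
                comment_depth -= 1
        elif ch == '"':
            in_quotes = True
        elif ch == "(":
            comment_depth = 1
        elif ch == ")":
            problems.append("unmatched ')'")          # Section 7.1.4
        elif ch == "<":
            if angle_depth:
                problems.append("nested '<'")         # Section 7.1.2
            angle_depth += 1
        elif ch == ">":
            if angle_depth:
                angle_depth -= 1
            else:
                problems.append("unmatched '>'")      # Section 7.1.3
        elif ch == "," and angle_depth:
            problems.append("comma inside '<...>'")   # Section 7.1.5

    if in_quotes:
        problems.append("unterminated quoted string")  # Section 7.1.6
    if comment_depth:
        problems.append("unclosed '('")                # Section 7.1.4
    if angle_depth:
        problems.append("unclosed '<'")                # Section 7.1.3
    return problems
```

An agent could reject the message whenever the returned list is non-empty and fall back to the interpretations above only when rejection is not an option. A multi-hop source route also places a comma inside angle brackets and is therefore flagged, which is consistent with treating that obsolete syntax as a candidate for rejection.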