Network Working Group                                      Jutta Degener
Internet Draft                                            Sendmail, Inc.
Expires: July 2004                                              Jan 2004

                       Sieve -- "body" extension

Status of this memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document defines a new primitive for the "sieve" language that
   tests for the occurrence of one or more strings in the body of an
   e-mail message.

1. Introduction

   The proposed "body" test checks for the occurrence of one or more
   strings in the body of an e-mail message.  Such a test was initially
   discussed for the [SIEVE] base document, but was subsequently
   removed because it was thought to be too costly to implement.

   Nevertheless, several server vendors have implemented some form of
   the "body" test.

   This document reintroduces the "body" test as an extension, and
   specifies its syntax and semantics.

2. Conventions used.

   Conventions for notations are as in [SIEVE] section 1.1, including
   the use of [KEYWORDS] and the "Syntax:" label for the definition of
   action and tagged arguments syntax.

   The capability string associated with the extension defined in this
   document is "body".

3. Test body

   Syntax:
      "body" [COMPARATOR] [MATCH-TYPE] [BODY-TRANSFORM]
             <key-list: string-list>

   The body test matches text in the body of an e-mail message, that
   is, anything following the first empty line after the header.
   (The empty line itself, if present, is not considered to be part of
   the body.)

   The COMPARATOR and MATCH-TYPE keyword parameters are defined in
   [SIEVE].  The BODY-TRANSFORM is a keyword parameter discussed in
   section 4, below.

   If a message consists of a header only, not followed by an empty
   line, all "body" tests fail, including that for an empty string.

   If a message consists of a header followed only by an empty line
   with no body lines following it, the message is considered to have
   an empty string as a body.

4. Body Transform

   Prior to matching text in a message body, "transformations" can be
   applied that filter and decode certain parts of the body.  These
   transformations are selected by a "BODY-TRANSFORM" keyword
   parameter.

   Syntax:
      ":raw"
      / ":content" <content-types: string-list>
      / ":binary" <content-types: string-list> [:offset <number>]
      / ":text"

   The default transformation is :text.

4.1 Body Transform ":raw"

   The ":raw" transform is intended to match against the undecoded
   body of a message.

   If the specified body-transform is ":raw", the [MIME] structure of
   the body is irrelevant.  The implementation MUST NOT remove any
   transfer encoding from the message, MUST NOT refuse to filter
   messages with syntactic errors (unless the environment it is part
   of rejects them outright), and MUST NOT interpret or skip MIME
   headers of enclosed body parts.

   Example:

        require ["body", "reject"];

        # This will match a message containing the words "MAKE MONEY
        # FAST" in body or MIME headers other than the outermost
        # RFC 822 header, but will not match a message containing the
        # words in a content-transfer-encoded body.
        if body :raw :contains "MAKE MONEY FAST" {
            reject "message rejected by body filter";
        }

4.2 Body Transform ":content"

   If the body transform is ":content", only MIME parts that have the
   specified content-types are selected for matching.

   If an individual content type contains a '/' (slash), it specifies
   a full <type>/<subtype> pair, and matches only that specific
   content type.  If it is the empty string, all MIME content types
   are matched.
   Otherwise, it specifies a <type> only, and any subtype of that type
   matches it.

   The search for MIME parts matching the :content specification is
   recursive and automatically descends into multipart and
   message/rfc822 MIME parts.  Once a MIME part has been identified as
   suitable for searching, only its direct contents are searched for
   the key strings.  For example, a document with "multipart" major
   content type only directly contains the text in its epilogue and
   prologue section; all the user-visible data inside it is directly
   contained in documents with MIME types other than multipart.

   (Nevertheless, matches against container types with an empty match
   string can be useful as tests for the existence of such document
   parts.)

   MIME parts encoded in "quoted-printable" or "base64" content
   transfer encodings MUST be decoded prior to the match.  MIME parts
   in other transfer encodings MAY be decoded, omitted from the test,
   or processed as raw data.

   MIME parts identified as using charsets other than UTF-8 as defined
   in [UTF-8] SHOULD be converted to UTF-8 prior to the match.  A
   conversion from US-ASCII to UTF-8 MUST be supported.  If an
   implementation does not support conversion of a given charset to
   UTF-8, it MAY compare against the US-ASCII subset of the
   transfer-decoded character data instead.  Characters from documents
   tagged with charsets that the local implementation cannot convert
   to UTF-8 and text from mistagged documents MAY be omitted or
   processed according to local conventions.

   Search expressions MUST NOT match across MIME part boundaries.
   MIME headers of the containing text MUST NOT be included in the
   data.

   Example:

        require ["body", "fileinto"];

        # Save any message with any text MIME part that contains the
        # words "missile" or "coordinates" in the "secrets" folder.
        if body :content "text" :contains ["missile", "coordinates"] {
            fileinto "secrets";
        }

        # Save any message with an audio/mp3 MIME part in
        # the "jukebox" folder.
        if body :content "audio/mp3" :contains "" {
            fileinto "jukebox";
        }

4.3 Body Transform ":text"

   The ":text" body transform matches against the results of an
   implementation's best effort at extracting UTF-8 encoded text from
   a message.

   In simple implementations, :text MAY be treated the same as
   :content "text".

   Sophisticated implementations MAY strip mark-up from the text prior
   to matching, and MAY convert media types other than text to text
   prior to matching.  (For example, they may be able to convert
   proprietary text editor formats to text or apply optical character
   recognition algorithms to image data.)

4.4 Body Transform ":binary"

   If the body transform is ":binary", the rules for selecting MIME
   body parts for matching are the same as with the ":content" body
   transform.

   MIME parts encoded in "quoted-printable" or "base64" content
   transfer encodings MUST be decoded prior to the match.  MIME parts
   in other transfer encodings MAY be decoded, omitted from the test,
   or processed as raw data.

   Unlike in :content, the charset of the :binary MIME content is
   disregarded.  Instead, the match against the keys provided in the
   "body" statement proceeds as if the file's content data had been
   translated into space-separated hex bytes of the form
   [0-9a-f][0-9a-f] prior to matching.

   Search expressions MUST NOT match across MIME part boundaries.
   MIME headers of the containing text MUST NOT be included in the
   data.

   If the optional ":offset <number>" is provided, the binary match is
   executed after skipping <number> octets of the binary data.  (Note
   that the offset counts bytes of the internal data, not characters
   of the hexadecimal representation.)

   Example:

        require ["body", "fileinto", "reject"];

        # Save any message with any application MIME part that
        # contains an ASCII C string representation of
        # "Hello, World!" into the "helloworld" folder.
        if body :binary ["application"] :contains
                "48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 00" {
            fileinto "helloworld";
            stop;
        }

        # Check if the bytes at offsets 1000...1003 match some
        # fictitious signature 44, 3c, 0, 1; if yes, reject the
        # message.
        if body :binary ["application"] :offset 1000
                :matches "44 3c 00 01 *" {
            reject "example virus detected";
            stop;
        }

5. Interaction with Other Sieve Extensions

   Any extension that extends the grammar for the COMPARATOR or
   MATCH-TYPE nonterminals will also affect the implementation of
   "body".

   The [REGEX] extension can place a considerable load on a system
   when applied to whole bodies of messages, especially when
   implemented naively or used maliciously.

6. Security Considerations

   The system MUST be sized and restricted in such a manner that even
   malicious use of body matching does not deny service to other users
   of the host system.

   Filters relying on string matches in the raw body of an e-mail
   message may be more general than intended.  Text matches are no
   replacement for a virus or spam filtering system.

7. Acknowledgments

   This document has been revised in part based on comments and
   discussions that took place on and off the SIEVE mailing list.
   Thanks to Cyrus Daboo, Ned Freed, Simon Josefsson, Chris Markle,
   Greg Shapiro, Tim Showalter, Nigel Swinson, and Dowson Tong for
   reviews and suggestions.

8. Author's Address

   Jutta Degener
   Sendmail, Inc.
   6425 Christie Ave, 4th Floor
   Emeryville, CA 94608

   Email: jutta@sendmail.com

9. Discussion

   This section will be removed when this document leaves the
   Internet-Draft stage.

   This draft is intended as an extension to the Sieve mail filtering
   language.  Sieve extensions are discussed on the MTA Filters
   mailing list at .  Subscription requests can be sent to (send an
   email message with the word "subscribe" in the body).

   More information on the mailing list along with a WWW archive of
   back messages is available at .

9.1 Changes from the previous version

   Added body transform ":binary".
   Defined the meaning of an empty content type string.

Appendices

Appendix A. References

   [KEYWORDS]  Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [MIME]      Freed, N. and N. Borenstein, "Multipurpose Internet Mail
               Extensions (MIME) Part One: Format of Internet Message
               Bodies", RFC 2045, November 1996.

   [SIEVE]     Showalter, T., "Sieve: A Mail Filtering Language",
               RFC 3028, January 2001.

   [UTF-8]     Yergeau, F., "UTF-8, a transformation format of Unicode
               and ISO 10646", RFC 2044, October 1996.

Appendix B. Full Copyright Statement

   Copyright (C) The Internet Society 2002,2003.  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
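
Appendix C. Illustrative Sketch of the ":binary" Translation

   This appendix is non-normative.  Section 4.4 describes matching
   "as if" the decoded content had been translated into
   space-separated hex bytes, with ":offset" counting octets rather
   than hex characters.  The Python sketch below shows one way an
   implementer might model that translation.  The function names
   to_hex_text and binary_contains are hypothetical and do not appear
   in this specification, and lowercasing the key is a simplifying
   assumption that stands in for a case-insensitive comparator.

```python
def to_hex_text(data: bytes, offset: int = 0) -> str:
    """Render octets as space-separated, lowercase, two-digit hex pairs.

    `offset` counts octets of the decoded data, not characters of the
    hexadecimal text, following the note on ":offset" in section 4.4.
    """
    return " ".join(f"{octet:02x}" for octet in data[offset:])


def binary_contains(data: bytes, key: str, offset: int = 0) -> bool:
    """Approximate a ":contains" test against the hex rendering.

    Assumption of this sketch: lowercasing the key is enough to model
    a case-insensitive comparator for hex digits.
    """
    return key.lower() in to_hex_text(data, offset)
```

   Applied to the octets of the C string "Hello, World!" (including
   its terminating zero octet), to_hex_text yields the key used in
   the first example of section 4.4.  Calling it with an offset of
   1000 on a part whose octets 1000 through 1003 are 44, 3c, 00, 01
   yields text beginning with "44 3c 00 01", which is what the second
   example's ":matches" pattern tests for.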