INTERNET-DRAFT                                              Eric A. Hall
Document: draft-hall-mime-app-mbox-00.txt                       May 2004
Expires: December, 2004
Category: Standards Track

The APPLICATION/MBOX Media-Type

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This document requests that the application/MBOX media-type be authorized for allocation by IANA, according to the terms specified in RFC 2048 [RFC2048].

1. Background and Overview

UNIX and UNIX-like operating systems have historically made extensive use of "MBOX" mailbox files for a variety of messaging purposes. In the common case, these files hold collections of electronic mail messages which users manipulate as "folders" of a private mail-store. These files are also frequently used by back-end email services, including delivery servers, filtering systems, and mailing-list programs. Over the last few years, the use of these files has also spread to other operating systems, and messaging tools on numerous platforms now provide direct access to MBOX files.
As these files have become more pervasive, demand has grown for reliable interchange of them between systems and across networks, and that interchange in turn requires a media-type definition for MBOX files in general. For example, some applications allow users to open MBOX files as discrete data-objects, but rely on platform- or product-specific mapping techniques to identify them. Similarly, many mailing-list archive programs offer historical messages as MBOX files, but publish them as text/plain or some other generic media-type. That labeling can cause harmful end-of-line conversions when the files are transferred across a network, and it does not allow the recipient to select local actions appropriate to the data (such as prompting the user to import the mailbox into a local mail-store).

A standard media-type for these files would make these actions behave more consistently and would further the cause of interoperability.

Note that this specification does not define the MBOX file format as an authoritative Internet data-type or structure. It only defines a standard media-type for these files, so that they can be transferred more consistently.

2. Prerequisites and Terminology

Readers of this document are expected to be familiar with the MIME media-type registration procedures defined in RFC 2048 [RFC2048].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. The APPLICATION/MBOX Media-Type Registration Request

This section provides the registration request, in the form specified by RFC 2048, which will be submitted to IANA after IESG approval.
MIME media type name: application

MIME subtype name: MBOX

Required parameters: none

Optional parameters: none

Encoding considerations: MBOX data typically consists of seven-bit ASCII characters in an eight-bit file stream. The data often contains subordinate content that was previously encoded to fit within the seven-bit character space. If the content must be encoded further to satisfy transfer restrictions, quoted-printable is generally encouraged, since it is likely to introduce the least overhead.

Security considerations: MBOX data is passive and does not generally represent a unique or new security threat. However, as with any kind of shared data, there is a risk that information will be exposed unintentionally, and that risk applies to MBOX data as well.

Interoperability considerations: The MBOX file format has a long and rich history on UNIX and UNIX-like platforms. It is also used by many messaging products on non-UNIX platforms, and is commonly used for intermediary purposes such as mailing-list archives, as an intermediary conversion format for private mail-stores, and for other messaging-related purposes. The canonical MBOX file format uses the ASCII Line Feed character as the end-of-line marker, and this convention is typically followed on non-UNIX platforms as well. Text-based transfer protocols sometimes convert the Line Feed end-of-line marker into some other sequence that is assumed to be more appropriate for the destination system, and this conversion is often harmful. In all cases, application/MBOX data MUST be transferred as an opaque eight-bit file stream, with no end-of-line conversion performed by the transfer protocol.

Published specification: see Appendix A.

Applications which use this media type: scores of messaging products use the MBOX file format.
Magic number(s): no standard

File extension(s): MBOX files sometimes have a ".mbox" extension, but this is neither required nor reliably present.

Macintosh File Type Code(s): no standard

Person & email address to contact for further information: Eric A. Hall (ehall@ntrg.com)

Intended usage: COMMON

4. Security Considerations

See the discussion in Section 3.

5. IANA Considerations

Upon IESG approval, IANA is expected to register the application/MBOX media-type using the registration request provided in Section 3 above.

6. Normative References

[RFC2048] Freed, N., Klensin, J., Postel, J., "Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures", BCP 13, RFC 2048, November 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2822] Resnick, P., Ed., "Internet Message Format", RFC 2822, April 2001.

Appendix A. The Common MBOX Format

The MBOX file format is not documented by any authoritative source; it exists only as the commonly understood output of historical messaging tools. Partly because of this lack of authoritative documentation, various utilities have adapted and altered the format over the years, and it does not exist in a syntactically precise form.

MBOX files almost always use the Line Feed character (0x0A) as the end-of-line marker. MBOX files usually contain seven-bit character data, but eight-bit data is not uncommon.

MBOX files typically contain a sequence of messages, each of which begins with a "From_" line and is separated from the preceding message by an empty line. This means that the first message in an MBOX file begins with a "From_" line at the very start of the file, while every other message begins with a "From_" line that is immediately preceded by that empty line, that is, by two consecutive Line Feed characters (one ending the last line of the previous message and one forming the empty line).
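The message structure just described can be sketched as a simple splitter. This is a minimal illustration rather than a reference parser: the function name split_mbox is hypothetical, the input is assumed to be the raw octets of the file with Line Feed line endings, and any line beginning with "From " that either starts the file or follows an empty line is assumed to be a "From_" line. An unescaped body line beginning with "From " that follows an empty line will defeat this heuristic, which is the problem the escaping conventions discussed below try to avoid.

```python
import io
from typing import List


def split_mbox(data: bytes) -> List[bytes]:
    """Split raw MBOX octets into messages, each starting with its "From_" line.

    Heuristic (an assumption, not a normative rule): a line is treated as a
    "From_" line when it begins with b"From " and is either the first line of
    the file or immediately follows an empty line. The empty separator line
    is not included in the preceding message.
    """
    messages: List[bytes] = []
    current: List[bytes] = []
    previous_blank = True  # the start of the file counts as a boundary
    for line in io.BytesIO(data):  # BytesIO yields lines split on LF (0x0A) only
        if previous_blank and line.startswith(b"From "):
            if current and current[-1] == b"\n":
                current.pop()  # drop the empty line that separated the messages
            if current:
                messages.append(b"".join(current))
            current = []
        current.append(line)
        previous_blank = line == b"\n"
    if current:
        messages.append(b"".join(current))
    return messages
```

Applied to a file holding two messages separated by an empty line, split_mbox returns two byte strings, each beginning with "From ". Reading the file in binary mode (open(path, "rb")) keeps the Line Feed octets exactly as stored, and stray Carriage Return octets stay inside the message rather than being treated as line ends.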
The structure of the "From_" line varies somewhat, but it almost always contains the exact character sequence "From", followed by whitespace, an email address of some kind, more whitespace, and finally a timestamp of some kind. The email address may use any of the forms that have been used throughout history, and the timestamp format can also vary according to system preferences. In most cases the timestamp is followed by the end-of-line marker, but some messaging systems are known to append additional information after the timestamp.

The format of the "From_" lines used in a particular MBOX file can often be determined by examining the first line of the file, which will be a "From_" line and is therefore easy to locate. However, implementers are cautioned that multiple MBOX files may have been concatenated, or a single file may have been accessed by multiple clients, so that several "From_" line formats can appear within one file.

Because of these variations, implementers are strongly encouraged to apply the robustness principle fully to any MBOX files that are transferred between systems. In particular, the email address and timestamp sequences are strongly encouraged to conform to the ABNF syntax rules for the address and date-time sequences described in RFC 2822 [RFC2822], although recipients MUST be prepared to receive less-structured sequences.

Many implementations also escape body lines that begin with "From " by prepending a Greater Than symbol (">", 0x3E), so that excessively liberal parsers do not misinterpret those lines as new "From_" lines.
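Two of the behaviors described above can be sketched together, assuming Line Feed line endings. parse_from_line applies the robustness principle by requiring only "From", an address-like token, and some trailing text, and it keeps the timestamp as raw octets rather than assuming any particular date format. escape_body and unescape_body implement the variant in which lines already beginning with one or more ">" characters followed by "From " are escaped again (the scheme commonly called "mboxrd"), which makes the escaping reversible. All three function names are hypothetical; this is a sketch, not a description of any particular product.

```python
import io
import re
from typing import Optional, Tuple

# Lenient "From_" line: "From", whitespace, an address-like token, whitespace,
# then a timestamp plus any trailing data, kept verbatim (no format assumed).
FROM_LINE = re.compile(rb"^From\s+(\S+)\s+(.+?)\s*$")
# mboxrd-style patterns: zero or more ">" before "From " when escaping,
# one or more ">" before "From " when unescaping.
ESCAPABLE = re.compile(rb"^>*From ")
ESCAPED = re.compile(rb"^>+From ")


def parse_from_line(line: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (address, timestamp_and_rest), or None if the line does not fit."""
    match = FROM_LINE.match(line)
    return (match.group(1), match.group(2)) if match else None


def escape_body(body: bytes) -> bytes:
    """Prepend one ">" to every body line matching ^>*From (mboxrd-style)."""
    return b"".join(b">" + line if ESCAPABLE.match(line) else line
                    for line in io.BytesIO(body))


def unescape_body(body: bytes) -> bytes:
    """Remove one ">" from every body line matching ^>+From ."""
    return b"".join(line[1:] if ESCAPED.match(line) else line
                    for line in io.BytesIO(body))
```

A writer that escapes only lines beginning exactly with "From " (often called "mboxo") cannot be reversed unambiguously, because a reader cannot tell whether ">From " was escaped or was already present in the original body. Note also that parse_from_line accepts a body line such as "From the start of the body", which illustrates why an overly liberal parser depends on escaping.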
Other implementations are known not to escape such lines unless they also appear to contain an email address and a timestamp, while still others perform secondary escapes on text that is already escaped or quoted (for example, turning ">From " into ">>From "). These differences do not generally affect the transport of MBOX files and are therefore beyond the scope of this document, but implementations should be aware of them.

Acknowledgments

Funding for the RFC editor function is currently provided by the Internet Society.

Authors' Addresses

Eric A. Hall
ehall@ehsco.com

Full Copyright Statement

Copyright (C) The Internet Society (2004). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.