  INTERNET-DRAFT                                             Eric A. Hall 
  Document: draft-hall-mime-app-mbox-04.txt                 February 2005 
  Expires: August, 2005                                                   
  Category: Standards-Track                                               
      
      
                       The APPLICATION/MBOX Media-Type 
      
     Status of this Memo 
      
     By submitting this Internet-Draft, I certify that any applicable 
     patent or other IPR claims of which I am aware have been 
     disclosed, and any of which I become aware will be disclosed, in 
     accordance with RFC 3668. 
      
     Internet-Drafts are working documents of the Internet Engineering 
     Task Force (IETF), its areas, and its working groups. Note that 
     other groups may also distribute working documents as Internet-
     Drafts. 
      
     Internet-Drafts are draft documents valid for a maximum of six 
     months and may be updated, replaced, or obsoleted by other 
     documents at any time. It is inappropriate to use Internet-Drafts 
     as reference material or to cite them other than as "work in 
     progress." 
      
     The list of current Internet-Drafts can be accessed at 
     http://www.ietf.org/ietf/1id-abstracts.txt. 
      
     The list of Internet-Draft Shadow Directories can be accessed at 
     http://www.ietf.org/shadow.html. 
      
      
     Copyright Notice 
      
     Copyright (C) The Internet Society (2004). All Rights Reserved. 
      
      
     Abstract 
      
     This memo requests that the application/mbox media-type be 
     authorized for allocation by the IESG, according to the terms 
     specified in RFC 2048 [RFC2048]. This memo also defines a default 
     format for the mbox database, which must be supported by all 
     conformant implementations. 
      
   
   
  Internet Draft     draft-hall-mime-app-mbox-04.txt     February 2005 
   
   
      
  1.      Background and Overview 
      
     UNIX-like operating systems have historically made widespread use 
     of "mbox" database files for a variety of local email purposes. In 
     the common case, mbox files store linear sequences of one or more 
     electronic mail messages, with local email clients treating the 
     database as a logical folder of email messages. mbox databases are 
     also used by a variety of other messaging tools, such as mailing 
     list management programs, archiving and filtering utilities, 
     messaging servers, and other related applications. In recent 
     years, mbox databases have also become common on a large number of 
     non-UNIX computing platforms, for similar kinds of purposes. 
      
     The increased pervasiveness of these files has led to an increased 
     demand for a standardized, network-wide interchange of these files 
     as discrete database objects. In turn, this dictates a need for a 
     media-type definition for mbox files in general, which is the 
     subject and purpose of this memo. 
      
  2.      About the mbox Database 
      
     The mbox database format is not documented in an authoritative 
     specification, but instead exists as a well-known output format 
     that is anecdotally documented, or which is only authoritatively 
     documented for a specific platform or tool. 
      
     mbox databases typically contain a linear sequence of electronic 
     mail messages. Each message begins with a separator line that 
     identifies the message sender, and also identifies the date and 
     time at which the message was received by the final recipient 
     (either the last-hop system in the transfer path, or the system 
     which serves as the recipient's mailstore). Each message is 
     typically terminated by an empty line. The end of the database is 
     usually recognized by either the absence of any additional data, 
     or by the presence of an explicit end-of-file marker. 
      
     The structure of the separator line varies across implementations, 
     but usually contains the exact character sequence of "From", 
     followed by a single Space character (0x20), an email address of 
     some kind, another Space character, a timestamp sequence of some 
     kind, and an end-of-line marker. However, due to the lack of any 
     authoritative specification, each of these attributes is known to 
     vary widely across implementations. For example, the email address 
     can reflect any addressing syntax which has ever been used on any 
     messaging system in all of history (specifically including address 
     forms which are not compatible with Internet messages, as defined 
     by RFC 2822 [RFC2822]). Similarly, the timestamp sequences can 
     also vary according to system output, while the end-of-line 
     sequences will often reflect platform-specific requirements. 
     Different data formats can even appear within a single database as 
     a result of multiple mbox files being concatenated together, or 
     because a single file was accessed by multiple messaging clients 
     which have each used their own syntax for the separator line. 
      
     Message data within mbox databases often reflects site-specific 
     peculiarities. For example, it is entirely possible for the 
     message body or headers in an mbox database to contain untagged 
     eight-bit character data that implicitly reflects a site-specific 
     default language or locale, or for timestamps and email addresses 
     to reflect local defaults, with none of this data being widely 
     portable beyond the local scope. Similarly, message data can also 
     contain unencoded eight-bit binary data, or can use encoding 
     formats which represent a specific platform (e.g., BinHex or 
     uuencode sequences). 
      
     Many implementations are also known to escape message body lines 
     that begin with the character sequence of "From ", so as to 
     prevent confusion with overly-liberal parsers that do not search 
     for full separator lines. In the common case, a leading Greater-
     Than symbol (0x3E) is used for this purpose (with "From " becoming 
     ">From "). However, other implementations are known not to escape 
     such lines unless they are immediately preceded by a blank line or 
     if they also appear to contain an email address and a timestamp. 
     Other implementations are also known to perform secondary escapes 
     against these lines if they are already escaped or quoted, while 
     others ignore these mechanisms altogether. 
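
     The common-case escape described above can be sketched as follows. 
     This is a minimal illustration only, not a description of any 
     particular implementation; the function names are hypothetical, 
     and the sketch assumes Line-Feed end-of-line sequences. It also 
     shows why the simple scheme is not reversible in general: a body 
     line that originally read ">From " cannot be told apart from an 
     escaped "From " line. 

```python
def escape_from_lines(body: str) -> str:
    """Prefix every body line that starts with 'From ' with '>'.

    Hypothetical helper illustrating the common-case escape only.
    """
    escaped = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        escaped.append(line)
    return "\n".join(escaped)


def unescape_from_lines(body: str) -> str:
    """Strip one leading '>' from every line that starts with '>From '.

    This is lossy: a line that was '>From ' before escaping is
    indistinguishable from an escaped 'From ' line.
    """
    restored = []
    for line in body.split("\n"):
        if line.startswith(">From "):
            line = line[1:]
        restored.append(line)
    return "\n".join(restored)


if __name__ == "__main__":
    text = "Hi\nFrom here on, things change.\nBye\n"
    print(escape_from_lines(text))
```

     Implementations that perform secondary escapes avoid this 
     ambiguity at the cost of further variation between databases. 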
      
     A comprehensive description of mbox database files on UNIX-like 
     systems can be found at http://qmail.org./man/man5/mbox.html, 
     which should be treated as mostly authoritative for those 
     variations which are otherwise only documented in anecdotal form. 
     However, readers are advised that many other platforms and tools 
     make use of mbox databases, and that there are many more potential 
     variations that can be encountered in the wild. 
      
     In order to mitigate errors that may arise from such vagaries, 
     this specification defines a "format" parameter to the 
     APPLICATION/MBOX media-type declaration, which can be used to 
     identify the specific kind of mbox database that is being 
     transferred. Furthermore, this specification defines a "default" 
     database format which MUST be supported by implementations that 
     claim to be compliant with this specification, and which is to be 
     used as the implicit format for undeclared APPLICATION/MBOX data 
     objects. Additional format types are to be defined in subsequent 
     specifications. Messaging systems which receive an mbox database 
     with an unknown format parameter value SHOULD treat the data as an 
     opaque binary object, as if the data had been declared as 
     APPLICATION/OCTET-STREAM. 
      
     Refer to Appendix A for a description of the default mbox format. 
      
     Note that RFC 2046 [RFC2046] defines the multipart/digest media-
     type for transferring platform-independent message files. Since 
     that specification defines a set of neutral and strict formatting 
     rules, the multipart/digest media-type already facilitates highly-
     predictable transfer and conversion operations, and as such 
     implementers are strongly encouraged to support and use that 
     media-type where possible. 
      
  3.      Prerequisites and Terminology 
      
     Readers of this document are expected to be familiar with the 
     specification for MIME [RFC2045] and MIME-type registrations 
     [RFC2048]. 
      
     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 
     NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" 
     in this document are to be interpreted as described in RFC 2119 
     [RFC2119]. 
      
  4.      The APPLICATION/MBOX Media-Type Registration 
      
     This section provides the media-type registration application (as 
     per [RFC2048]), which will be submitted to IANA after IESG 
     approval of this specification. 
      
     MIME media type name: application 
      
     MIME subtype name: mbox 
      
     Required parameters: none 
      
     Optional parameters: The "format" parameter identifies the format 
     of the mbox database and the messages contained therein. The 
     default value for the "format" parameter is "default", and refers 
     to the formatting rules defined in Appendix A of this memo. mbox 
     databases that do not have a "format" parameter SHOULD be 
     interpreted as having the implicit "format" value of "default". 
     mbox databases that have an unknown value for the "format" 
     parameter SHOULD be treated as opaque data objects, as if the 
     media-type had been specified as APPLICATION/OCTET-STREAM. 
     Additional values for the format parameter are to be defined in 
     subsequent specifications, and registered with IANA. 
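
     As an illustration of the parameter rules above, a receiver might 
     resolve the "format" parameter along the following lines. This is 
     a sketch under stated assumptions: the function name and the 
     KNOWN_FORMATS set are hypothetical, and the case-insensitive 
     comparison of the parameter value is an assumption, since this 
     memo does not state whether values are case-sensitive. 

```python
from email import message_from_string

# Hypothetical set of formats this receiver understands; this memo
# defines only "default".
KNOWN_FORMATS = {"default"}


def resolve_mbox_format(part):
    """Return the mbox format name to use, or None to treat the part
    as opaque data (as if it were application/octet-stream)."""
    if part.get_content_type() != "application/mbox":
        return None
    # A missing parameter implies the "default" format.
    value = part.get_param("format", failobj="default")
    if not isinstance(value, str):
        # RFC 2231 encoded values arrive as tuples; treat as unknown.
        return None
    value = value.lower()  # assumption: compare case-insensitively
    return value if value in KNOWN_FORMATS else None


if __name__ == "__main__":
    part = message_from_string("Content-Type: application/mbox\n\n")
    print(resolve_mbox_format(part))
```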
      
     Encoding considerations: If an email client receives an mbox 
     database as a message attachment, and then stores that attachment 
     within a local mbox database, the contents of the two database 
     files may become irreversibly intermingled, such that neither 
     database is independently recognizable any longer. In order to 
     avoid these collisions, messaging systems which support this 
     specification MUST encode an mbox database (or at a minimum, the 
     separator lines) with a non-transparent transfer encoding (such as 
     BASE64 or Quoted-Printable) whenever an APPLICATION/MBOX object is 
     transferred via messaging protocols. Other transfer services are 
     generally encouraged to adopt similar encoding strategies to allow 
     for any subsequent retransmission which might occur, but are not 
     explicitly required to do so. Implementers should also be prepared 
     to encode mbox data locally if non-compliant data is received. 
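
     The encoding requirement above can be illustrated with Python's 
     standard "email" package, which applies BASE64 to binary 
     attachments. This is only a sketch of one way to satisfy the 
     requirement; the function name and the default filename are 
     hypothetical. 

```python
from email.message import EmailMessage


def attach_mbox(msg: EmailMessage, mbox_bytes: bytes,
                filename: str = "archive.mbox") -> None:
    """Attach an mbox database as application/mbox using BASE64, so
    that no separator line appears verbatim in the carrier message."""
    msg.add_attachment(
        mbox_bytes,
        maintype="application",
        subtype="mbox",
        cte="base64",
        filename=filename,
        params={"format": "default"},
    )


if __name__ == "__main__":
    carrier = EmailMessage()
    carrier["Subject"] = "mbox archive"
    carrier.set_content("See attached.")
    attach_mbox(
        carrier,
        b"From alice@example.com Thu Feb  3 10:15:00 2005\n"
        b"Subject: hi\n\nHello.\n\n",
    )
    print(carrier.as_string())
```

     Because the BASE64 alphabet contains no Space character, no line 
     of the encoded attachment can begin with "From ". 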
      
     Security considerations: mbox data is passive, and does not 
     generally represent a unique or new security threat. However, 
     there is risk in sharing any kind of data, in that unintentional 
     information may be exposed, and this risk certainly applies to 
     mbox data as well. 
      
     Interoperability considerations: Due to the lack of a single 
     authoritative specification for mbox databases, there are a large 
     number of variations between database formats (refer to the 
     introduction text for common examples), and it is expected that 
     non-conformant data will be erroneously tagged or exchanged. 
     Although the "default" format specified in this memo does not 
     allow for these kinds of vagaries, prior negotiation or agreement 
     between humans may sometimes be needed. 
      
     Published specification: see Appendix A. 
      
     Applications which use this media type: hundreds of messaging 
     products make use of the mbox database format, in one form or 
     another. 
      
     Magic number(s): mbox database files can be recognized by having a 
     leading character sequence of "From", followed by a single Space 
     character (0x20), followed by additional printable character data 
     (refer to the description in Appendix A for details). However, 
     implementers are cautioned that not all such files will be 
     compliant with all of the formatting rules, so implementers should 
     treat these files with an appropriate amount of circumspection. 
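
     A heuristic check for the magic number described above might look 
     like the following sketch. The function name is hypothetical, and 
     "printable" is assumed here to mean the US-ASCII range 0x20 
     through 0x7E. A positive result is only a hint, in keeping with 
     the caution above. 

```python
def looks_like_mbox(data: bytes) -> bool:
    """Heuristic: does the data start with 'From ' followed by
    printable characters on the first line?"""
    if not data.startswith(b"From "):
        return False
    # Examine only the first line, tolerating a trailing CR.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    rest = first_line[len(b"From "):]
    # Assumption: "printable" means 0x20..0x7E inclusive.
    return len(rest) > 0 and all(0x20 <= byte <= 0x7E for byte in rest)


if __name__ == "__main__":
    sample = b"From alice@example.com Thu Feb  3 10:15:00 2005\n"
    print(looks_like_mbox(sample))
```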
      
     File extension(s): mbox database files sometimes have an ".mbox" 
     extension, but this is neither required nor expected. As with magic 
     numbers, implementers should avoid reflexive assumptions about the 
     contents of such files. 
      
     Macintosh File Type Code(s): None are known to be common. 
      
     Person & email address to contact for further information: Eric A. 
     Hall (ehall@ntrg.com) 
      
     Intended usage: COMMON 
      
  5.      Security Considerations 
      
     See the discussion in section 4. 
      
  6.      IANA Considerations 
      
     Upon IESG approval, IANA would be expected to register the 
     APPLICATION/MBOX media-type in the MIME registry, using the 
     application provided in section 4 above. 
      
     Furthermore, IANA would be expected to establish and maintain a 
     registry of values for the "format" parameter as described in this 
     memo. The first registration would be the "default" value, using 
     the description provided in Appendix A. Subsequent values for the 
     "format" parameter MUST be accompanied by some form of 
     recognizable, complete, and legitimate specification, such as an 
     IESG-approved specification, or some kind of authoritative vendor 
     documentation. 
      
  7.      Normative References 
      
          [RFC2045]     Freed, N., Borenstein, N., "Multipurpose 
                         Internet Mail Extensions (MIME) Part One: 
                         Format of Internet Message Bodies", RFC 
                         2045, November 1996. 
      
          [RFC2046]     Freed, N., Borenstein, N., "Multipurpose 
                         Internet Mail Extensions (MIME) Part Two: 
                         Media Types", RFC 2046, November 1996. 
      
          [RFC2048]     Freed, N., Klensin, J., Postel, J., 
                         "Multipurpose Internet Mail Extensions (MIME) 
                         Part Four: Registration Procedures", BCP 13, 
                         RFC 2048, November 1996. 
      
   
          [RFC2119]     Bradner, S., "Key words for use in RFCs to 
                         Indicate Requirement Levels", BCP 14, RFC 
                         2119, March 1997. 
      
          [RFC2822]     Resnick, P., "Internet Message Format", RFC 
                         2822, April 2001. 
      
      
  Appendix A.    The "default" mbox Database Format 
      
     In order to improve interoperability among messaging systems, this 
     memo defines a "default" mbox database format, which MUST be 
     supported by all implementations claiming to be compliant with 
     this specification. 
      
     The "default" mbox database format uses a linear sequence of 
     Internet messages, with each message being immediately prefaced by 
     a separator line, and being terminated by an empty line. More 
     specifically: 
      
        o Each message within the database MUST follow the syntax and 
          formatting rules defined in RFC 2822 [RFC2822] and its 
          related specifications, with the exception that the canonical 
          mbox database MUST use a single Line-Feed character (0x0A) as 
          the end-of-line sequence, and MUST NOT use a Carriage-
          Return/Line-Feed pair (NB: this requirement only applies to 
          the canonical mbox database as transferred, and does not 
          override any other specifications). This usage represents the 
          most common historical representation of the mbox database 
          format, and allows for the least amount of conversion. 
      
        o Messages within the default mbox database MUST consist of 
          seven-bit characters within an eight-bit stream. Eight-bit 
          data within the stream MUST be converted to a seven-bit form 
          (using an appropriate, standardized encoding) and 
          appropriately tagged (with the correct header fields) before 
          the database is transferred. 
      
        o Message headers and data in the default mbox database MUST be 
          fully-qualified, as per the relevant specification[s]. For 
          example, email addresses in the various header fields MUST 
          have legitimate domain names (as per RFC 2822), while 
          extended characters and encodings MUST be specified in the 
          appropriate location (as per the appropriate MIME 
          specifications), and so forth. 
      
   
        o Each message in the mbox database MUST be immediately 
          preceded by a single separator line, which MUST conform to 
          the following syntax: 
      
             The exact character sequence of "From"; 
      
             a single Space character (0x20); 
      
             the email address of the message sender (as obtained from 
             the message envelope or other authoritative source), 
             conformant with the "addr-spec" syntax from RFC 2822; 
      
             a single Space character; 
      
             a timestamp indicating the UTC date and time when the 
             message was originally received, conformant with the 
             syntax of the traditional UNIX 'ctime' output sans 
             timezone (note that the use of UTC precludes the need for 
             a timezone indicator); 
      
             an end-of-line marker. 
      
        o Each message in the database MUST be terminated by an empty 
          line, containing a single end-of-line marker. 
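
     The separator line syntax above can be checked with a short 
     parser such as the following sketch. The names are hypothetical. 
     The sender check is a crude stand-in for the full "addr-spec" 
     grammar of RFC 2822 (it rejects quoted local parts containing 
     spaces), and the timestamp is assumed to follow the usual ctime 
     layout, for example "Thu Feb  3 10:15:00 2005". 

```python
import re
from datetime import datetime, timezone

# "From", a Space, a sender with no spaces, a Space, then the timestamp.
SEPARATOR_RE = re.compile(r"^From (\S+) (.+)$")
# ctime layout without a timezone; %a and %b assume the C/English locale.
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_separator(line: str):
    """Return (sender, received_utc) for a well-formed separator line,
    or None if the line does not match the default format."""
    match = SEPARATOR_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    sender, stamp = match.groups()
    if "@" not in sender:  # crude approximation of addr-spec
        return None
    try:
        received = datetime.strptime(stamp, CTIME_FORMAT)
    except ValueError:
        return None
    # The default format specifies UTC, so attach that zone explicitly.
    return sender, received.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_separator("From alice@example.com Thu Feb  3 10:15:00 2005\n"))
```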
      
     Note that the first message in an mbox database will only be 
     prefaced by a separator line, while every other message will begin 
     with two end-of-line sequences (one at the end of the message 
     itself, and another to mark the end of the message within the mbox 
     database file stream) and a separator line (marking the new 
     message). The end of the database is implicitly reached when no 
     more message data or separator lines are found. 
      
     Also note that this specification does not prescribe any escape 
     syntax for message body lines that begin with the character 
     sequence of "From ". Recipient systems are expected to parse full 
     separator lines as they are documented above. 
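
     A reader for the default format might therefore split a database 
     by matching full separator lines rather than any line beginning 
     with "From ". The following is a minimal sketch with hypothetical 
     names. As a design choice of the sketch (not a rule of this memo), 
     a separator is only recognized at the start of the data or 
     directly after an empty line. A body line that happens to be a 
     complete, valid separator line after an empty line would still be 
     misread. 

```python
import re
from datetime import datetime

SEPARATOR_RE = re.compile(r"^From (\S+) (.+)$")
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumes the C/English locale


def is_separator(line: str) -> bool:
    """True if the line is a full default-format separator line."""
    match = SEPARATOR_RE.match(line)
    if match is None or "@" not in match.group(1):
        return False
    try:
        datetime.strptime(match.group(2), CTIME_FORMAT)
    except ValueError:
        return False
    return True


def split_default_mbox(data: str):
    """Return a list of (separator_line, message_text) pairs."""
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # artifact of the final end-of-line marker
    messages = []
    separator, body = None, []

    def finish(collected):
        # Drop the empty line that terminates each message.
        if collected and collected[-1] == "":
            collected = collected[:-1]
        return "\n".join(collected) + "\n" if collected else ""

    for line in lines:
        at_boundary = separator is None or (len(body) > 0 and body[-1] == "")
        if at_boundary and is_separator(line):
            if separator is not None:
                messages.append((separator, finish(body)))
            separator, body = line, []
        elif separator is not None:
            body.append(line)
        # Data before the first separator line is ignored.
    if separator is not None:
        messages.append((separator, finish(body)))
    return messages


if __name__ == "__main__":
    sample = ("From alice@example.com Thu Feb  3 10:15:00 2005\n"
              "Subject: one\n\nHello.\n\n")
    print(split_default_mbox(sample))
```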
      
      
  Acknowledgments 
      
     Funding for the RFC editor function is currently provided by the 
     Internet Society. 
      
      
   
  Authors' Addresses 
      
     Eric A. Hall 
     ehall@ntrg.com 
      
      
  Full Copyright Statement 
      
     Copyright (C) The Internet Society 2004. This document is subject 
     to the rights, licenses and restrictions contained in BCP 78, and  
     except as set forth therein, the authors retain all their rights. 
      
     This document and the information contained herein are provided on 
     an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 
     REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND 
     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, 
     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT 
     THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR 
     ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A 
     PARTICULAR PURPOSE. 
      
   