Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 2045, 6838 (if approved)                              M. Kerwin
Intended status: Informational
Expires: November 7, 2015                                    May 6, 2015


         The Archive Top-Level Media Type for File Archives
                draft-seantek-kerwin-arcmedia-type-01

Abstract

   This document defines a new top-level media type to be known as
   "archive", which defines a fundamental type of media with unique
   presentational, hardware, and processing aspects.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 7, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Leonard & Kerwin        Expires November 7, 2015                [Page 1]

Internet-Draft                  arcmedia                        May 2015

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Definition of an archive
   3.  Encoding and Transport
   4.  Registration Template
   5.  Common Required and Optional Parameters
   6.  Split Archives
   7.  Fragment Identifier Syntax
   8.  Piped-Composite Type Suffix Syntax
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Expected Subtypes
   Appendix B.  Change Log
   Authors' Addresses

1.  Introduction

   The purpose of this memo is to update [RFC2045] and [RFC6838] to
   include a new top-level media type to be known as "archive".
   [RFC6838] describes mechanisms for specifying and describing the
   format of Internet Message Bodies via media type/subtype pairs.
   "archive" defines a fundamental type of media with unique
   presentational, hardware, and processing aspects. Various subtypes
   of this top-level type are immediately anticipated, and will be
   covered under separate documents.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Definition of an archive

   The archive top-level media type identifies a container of one or
   more data objects and metadata about them. Archives are used to
   collect multiple data objects together into a single object for
   easier portability and storage. Archive formats can provide many
   optional services, including:

   1.  compression

   2.  encryption

   3.  authentication

   4.  backup and restoration

   5.  filesystem imaging

   6.  software packaging and distribution

   7.  volume-splitting (archive split into multiple objects)

   8.  block storage

   Formats and techniques that support one or more of these services
   already exist under separate registrations. For example, the
   Content-Encoding header can be used to signal compressed Internet
   message content. The distinguishing feature of the archive top-level
   type is that these services are integrated into the format itself,
   along with the inclusion of object-specific metadata. Formats
   contemplated under this top-level type are designed to concatenate
   multiple objects into a single data stream, along with names and
   other metadata.

   When an Internet-facing application handles content labeled with this
   type, it SHOULD treat the archive as a discrete data item. For
   example, an Internet mail user agent might display an archive-labeled
   type with an archive icon, possibly with a preview of the objects
   contained therein, as opposed to automatically extracting its
   contents.

3.  Encoding and Transport

   Unrecognized subtypes of archive SHOULD be treated as "archive/file".
   Like "application/octet-stream", the purpose of the "archive/file"
   type is to provide default handling; it does not represent a
   particular archive format. Implementations SHOULD defer handling of
   unrecognized subtypes of archive to a robust general-purpose archive
   processing application, if such an application is available. If
   default archive handling is not supported, the archive MAY be treated
   as if it were "application/octet-stream".

   Unless noted in the subtype registration, subtypes of archive MUST be
   assumed to contain binary data, implying the use of base64 content
   encoding for email and binary transfer for FTP and HTTP.

4.  Registration Template

   The formal syntax for the subtypes of the archive top-level type
   SHOULD look like this:

   Type name: archive

   Subtype name: xxxxxxxx

   Required parameters: none

   Optional parameters: TBD

   Encoding considerations: base64 encoding is recommended when
      transmitting archive/* documents through MIME electronic mail.

   Security considerations: see Section 9 below

   Published specification: TBD

   Applications that use this media type: TBD

   Fragment identifier considerations: The considerations of this
      document, plus any extra syntaxes not inconsistent with this
      document.

   Additional information:

      Deprecated alias names for this type: (Include non-archive alias
         names, such as those in application.)

      Magic number(s): TBD

      File extension(s): TBD

      Macintosh file type code(s): TBD

   See Appendix A for references to some of the expected subtypes.

   Person and email address to contact for further information: TBD

   Intended usage: TBD (COMMON will be the most common)

   Restrictions on usage: TBD

   Author: TBD

   Change controller: TBD

   Provisional registration? (standards tree only): (Yes/No)

   (Any other information that the author deems interesting may be added
   below this line.)

   The optional parameters consist of starting conditions and variable
   values used as part of the subtypes.

5.  Common Required and Optional Parameters

   Archive formats usually include parametric metadata within the
   format. Consequently, subtypes of archive SHOULD NOT specify the
   same information as parameters to the type.

   Some archive formats are very old, or are designed to be
   backwards-compatible with older formats, and as such might not have
   been designed with transport across the Internet in mind.
   For example, modern versions of the ZIP file format [ZIP] include
   support for the Universal Character Set [ISO10646]; however, the
   default encoding of filenames within a ZIP archive has always been
   Code Page 437 [CP437].

   Due to the historical nature of archives, and to support
   interoperability with older implementations, sometimes it is
   preferable to communicate the archive as-is, rather than updating it
   to a more modern or universal format.

   Implementations that are archive-type aware MUST support the
   following parameters for maximum compatibility. At the same time,
   new archive types SHOULD NOT rely on these parameters for
   disambiguation; new archive types SHOULD be designed in such a way
   that "universal" interoperability is achieved using information
   contained within the archive format itself.

   [[TODO: write this list]]

   o  Code Page - like charset but only applies to certain strings in
      the archive, when the archive format is ambiguous. Do not attempt
      to apply this parameter as one would apply charset to text/*.

   o  Endianness?

   o  Time/Y2K representation issues?

   o  Anything else?

6.  Split Archives

   Several archive formats (notably RAR and ZIP) support split archives.
   A "split archive" can be stored in multiple files, or more generally,
   across multiple storage media.

   For example, the ZIP format supports two types of splits: "split
   archive" and "spanned archive". A "split archive" is a standard ZIP
   archive split over multiple files using file extensions .z01, .z02,
   etc.; the final file in the sequence uses the .zip file extension.
   The "spanned archive" was designed for use on floppy disks with
   restrictive space limitations; all archive files have the same
   filename, and volume labels (presumably on floppy disks) are used to
   store sequence information.
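
   The ZIP split-archive naming convention described above can be
   sketched in a few lines of Python. This is only an illustration,
   not part of any registration: the function name
   split_zip_segment_names and the "backup" base name are hypothetical,
   and two-digit zero padding is assumed from the ".z01, .z02"
   examples in the text.

```python
from typing import List


def split_zip_segment_names(base: str, count: int) -> List[str]:
    """Return illustrative file names for a ZIP split archive.

    `base` is the file name without extension and `count` is the total
    number of segments (both are hypothetical inputs for this sketch).
    """
    if count < 1:
        raise ValueError("a split archive has at least one segment")
    # Every segment except the last is numbered from 1: .z01, .z02, ...
    names = [f"{base}.z{i:02d}" for i in range(1, count)]
    # The final segment in the sequence uses the ordinary .zip extension.
    names.append(f"{base}.zip")
    return names


print(split_zip_segment_names("backup", 3))
# prints ['backup.z01', 'backup.z02', 'backup.zip']
```

   A receiver that sees only one of these names cannot tell from the
   media type alone that the other segments exist, which is the
   labeling problem raised in the TODO note in this section.
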
   Neither sub-format is merely a naive division of the octet stream:
   each ZIP file is parseable in its own right, and contains its own
   offset values.

   The TAR format (or family of formats, including cpio and ustar) was
   originally designed for streaming to and from tape devices, so
   splitting is accomplished differently.

   [[TODO: Consider how to label this content. archive/zip^01?
   archive/zip; split=01? Something else? How shall 01 be associated
   with 02, 03, etc., when the Content-Disposition: ; filename=""
   parameter is "presentation-information" and may be separated from
   the Content-Type header information?]]

7.  Fragment Identifier Syntax

   As archives usually store objects in hierarchical structures similar
   to filesystems, archives can serve as virtual filesystems.
   Respondents have noted that the objects stored in an archive can be
   addressed by a fragment syntax that resembles a filesystem path. At
   the same time, archives can store objects in different ways (along
   with different types of metadata), suggesting that a common baseline
   with flexible extension points is more appropriate than a fixed
   universal syntax.

   [[TODO: This will be explored in future drafts. Note the
   similarities with this and the file: URI...]]

   [[TODO: consider how to provide a fragment for content in the
   archive. NB: most archives do NOT provide Content-Type/media type
   information! So /foo.html being an HTML file is just an
   _assumption_, and possibly a very wrong one at that. There is no
   IETF registry for file extensions.]]

8.  Piped-Composite Type Suffix Syntax

   [[TODO: discuss tar piped through bzip2, gzip, etc. as a distinct
   file format, rather than an application of the Content-Encoding:
   header. Suggest common suffix like archive/tar|bzip2, where | is
   some useful character but not + since + is for structured
   syntaxes.]]

9.  Security Considerations

   Archives can store files, file metadata, and even entire filesystems;
   thus, security issues loom large because archives can contain just
   about anything. These concerns are magnified by the arbitrary
   transport of such data across the Internet.

   [[TODO: complete.]]

10.  References

10.1.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, January 2013.

10.2.  Informative References

   [CP437]    Microsoft Developer Network, "Code Page 437 MS-DOS Latin
              US", April 2015.

   [ISO10646] International Organization for Standardization,
              "Information Technology - Universal Multiple-Octet Coded
              Character Set (UCS)", ISO/IEC 10646:2003, December 2003.

   [ZIP]      Lindner, P., "application/zip registration at IANA",
              June 1993.

Appendix A.  Expected Subtypes

   The following archive formats will be explored for registration as
   subtypes along with this effort:

   Archiving Only
      TAR

   Multipurpose (archiving, compression, encryption)
      ZIP, ACE, RAR, 7-Zip, StuffIt, FreeArc

   Software Packaging
      MSI, RPM, JAR, XPI, CAB, CRX, APK

   Disk Imaging
      ISO, NRG, BIN/CUE, VMDK, WIM, PartImage, IMG/IMA/IMZ, DMG

Appendix B.  Change Log

   Changes since -00

   o  retool to use XML2RFC - lots of layout changes

   o  remove large sections of text, as suggested by Ned Freed and Dave
      Crocker

   o  replace "primary" with "top-level", and "content-type" with "media
      type" throughout

   o  add reference to RFC 6838 (BCP 13) - Media Type Specifications and
      Registration Procedures

   o  lots of editorial changes

Authors' Addresses

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   Email: dev+ietf@seantek.com
   URI:   http://www.penango.com/


   Matthew Kerwin

   Email: matthew@kerwin.net.au
   URI:   http://matthew.kerwin.net.au/