Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Standards Track                               M. Kerwin
Expires: April 30, 2015                                 October 27, 2014


            The Archive Primary Media Type for File Archives
                 draft-seantek-kerwin-arcmedia-type-00

Abstract

   This document defines a new primary content-type to be known as
   "archive", which defines a fundamental type of content with unique
   presentational, hardware, and processing aspects.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 30, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Leonard & Kerwin         Expires April 30, 2015                 [Page 1]

Internet-Draft  The archive Media Type for File Archives    October 2014

Table of Contents

   1.  Introduction
     1.1.  Overview
     1.2.  Notational Conventions
   2.  Definition of an archive
   3.  Consultation Mechanisms
   4.  Encoding and Transport
   5.  Common Required and Optional Parameters
   6.  Split Archives
   7.  Fragment Identifier Syntax
   8.  Piped-Composite Type Suffix Syntax
   9.  Security Considerations
   10. Normative References
   Appendix A.  Expected Subtypes

1.  Introduction

   The purpose of this memo is to propose an update to [RFC2045] to
   include a new primary content-type to be known as "archive".
   [RFC2045] describes mechanisms for specifying and describing the
   format of Internet Message Bodies via content-type/subtype pairs.
   "archive" defines a fundamental type of content with unique
   presentational, hardware, and processing aspects.  Various subtypes
   of this primary type are immediately anticipated, and will be covered
   under separate documents.

1.1.  Overview

   This document will outline what an archive is, show examples of
   archives, and discuss the benefits of grouping archives together.
   This document is a discussion document for an agreed definition,
   intended eventually to form a standard accepted extension to
   [RFC2045].

1.2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Definition of an archive

   The archive primary media type identifies data that represents one
   or more files [FILE] along with metadata.
   Archives are used to collect multiple data files together into a
   single file for easier portability and storage.  Archive formats can
   provide many optional services, including:

   1.  compression
   2.  encryption
   3.  authentication
   4.  backup
   5.  filesystem imaging
   6.  software packaging and distribution
   7.  volume-splitting (archive split into multiple contents)
   8.  block storage

   Formats and techniques that perform one or more of these services
   already exist under separate registrations.  For example, the
   Content-Encoding header can be used to compress Internet message
   content.  The distinguishing feature of the archive primary type is
   that these services are integrated into the format itself, along with
   the inclusion of file-specific metadata.  Virtually all formats
   contemplated under this primary type are designed to concatenate
   multiple files into a single data stream, along with filenames and
   other metadata.

   When an Internet-facing application handles content labeled with this
   type, it SHOULD provide handling consistent with the archive as a
   discrete data item.  For example, an Internet mail user agent would
   display an archive-labeled type with an archive icon, possibly with a
   preview of the files contained therein (as opposed to automatically
   traversing its contents, as it would for multipart-labeled content).

   Common operations include creating an archive, identifying files in
   an archive, adding to an archive, backing up to an archive,
   extracting an archive, restoring from an archive, deleting from an
   archive, mounting and unmounting an archive, [[TODO: executing an
   archive?]], and installing and uninstalling an archive.

   *  Creating: taking files from a filesystem and representing those
      files in an archive.

   *  Identifying files: parsing an archive's format, extracting
      information about files represented in the archive.

   *  Adding: parsing an archive's format, adding files or non-file data
      to the archive.  In virtually all cases, at least some part of the
      archive's content will be modified (though perhaps only at the
      end).  Unlike, for instance, text media types, concatenating two
      separate archive contents *never* yields a valid composite
      archive.

   *  Backing up: taking some or all of a filesystem and representing
      the filesystem in an archive, with the express intention of
      recording the files as they exist in a source filesystem at the
      time of backing up.  For example, the compression, encryption, and
      access control list (permissions) properties of the files would be
      preserved.

   *  Extracting: parsing an archive's format, copying file data (or
      file metadata) out of the archive into one or more files on a
      destination filesystem.  This operation implies that at least some
      file metadata will be preserved, while other file metadata may be
      adjusted or added to adapt to the local environment.

   *  Restoring: parsing an archive's format, copying file data out of
      the archive into the destination filesystem, with the express
      intention of recreating the files as they existed in a source
      filesystem at the time of backing up.  For example, the
      compression, encryption, and access control list (permissions)
      properties of the files would be preserved.

   *  Deleting: parsing an archive's format, removing file data (or
      metadata) from the archive, requiring changes to the archive's
      contents.  Some archive formats permit orphan data in the archive
      content; other formats require re-serializing some or all of the
      archive.

   *  Mounting and unmounting: Mapping an archive's semantics directly
      to a filesystem, so that the files represented in the archive can
      be accessed using the filesystem's namespace with typical
      filesystem APIs.
      Rather than being backed by a physical block storage device, that
      part of the filesystem is backed by the archive.

   *  Executing [[NB: this may be controversial; it is worth
      discussing]]: Identifying executable semantics of an archive, and
      causing code to execute.

   *  Installing and uninstalling [[NB: this may be controversial; it is
      worth discussing]]: Treating the archive as a software package,
      extracting certain contents in the archive and executing other
      contents in the archive, according to some software packaging
      protocol.

3.  Consultation Mechanisms

   Before proposing a subtype for the archive/* primary type, it is
   suggested that the subtype author examine the definition (above) of
   what an archive/* is and the listing (below) of what an archive/* is
   not.  Additional consultation with the authors of the existing
   archive/* subtypes is also suggested.

4.  Encoding and Transport

   Unrecognized subtypes of archive SHOULD at a minimum be treated as
   "archive/file".  Like "application/octet-stream", the purpose of
   "archive/file" is to provide default handling; it does not represent
   a particular archive format.  Implementations SHOULD pass subtypes of
   archive that they do not specifically recognize to a robust
   general-purpose archive viewing application, if such an application
   is available.  If default archive (archive/file) handling is not
   supported, it is appropriate to treat the archive like
   "application/octet-stream".

   Unless noted in the subtype registration, subtypes of archive SHALL
   be assumed to contain binary data, implying a content encoding of
   base64 for email and binary transfer for FTP and HTTP.
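
   The fallback order described above can be sketched as follows.  This
   is a minimal, non-normative illustration in Python.  The function
   names, the KNOWN_ARCHIVE_SUBTYPES set, and the has_generic_viewer
   flag are hypothetical and are not defined by this document.

```python
# Hypothetical set of archive subtypes that this application can open
# natively.  The subtype names are placeholders for illustration only.
KNOWN_ARCHIVE_SUBTYPES = {"zip", "tar"}


def resolve_handler(media_type, has_generic_viewer=True):
    """Return the media type whose handling rules apply to media_type."""
    # Drop any parameters, then split into primary type and subtype.
    essence = media_type.split(";", 1)[0].strip().lower()
    primary, _, subtype = essence.partition("/")
    if primary != "archive" or not subtype:
        # Not an archive type; outside the scope of this sketch.
        return essence
    if subtype in KNOWN_ARCHIVE_SUBTYPES:
        # A specifically recognized subtype is handled as itself.
        return essence
    if has_generic_viewer:
        # Unrecognized subtypes fall back to default archive handling.
        return "archive/file"
    # Without default archive handling, treat the content as opaque.
    return "application/octet-stream"


def transfer_encoding(transport):
    """Archive content is assumed binary: base64 in email, else binary."""
    return "base64" if transport == "email" else "binary"
```

   For example, under these assumptions an unrecognized label such as
   "archive/x-unknown" resolves to "archive/file" when a general-purpose
   viewer is available and to "application/octet-stream" otherwise.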

   The formal syntax for the subtypes of the archive primary type SHOULD
   look like this:

   Type name: archive

   Subtype name: xxxxxxxx

   Required parameters: none

   Optional parameters: TBD

   Encoding considerations: base64 encoding is recommended when
      transmitting archive/* documents through MIME electronic mail.

   Security considerations: see Section 9 below

   Interoperability considerations: TBD

   Published specification: TBD

   Applications that use this media type: TBD

   Fragment identifier considerations: The considerations of this
      document, plus any extra syntaxes not inconsistent with this
      document.

   Additional information:

      Deprecated alias names for this type: (Include non-archive alias
         names, such as those in application.)
      Magic number(s): TBD
      File extension(s): TBD
      Macintosh file type code(s): TBD

      See Appendix A for references to some of the expected subtypes.

   Person and email address to contact for further information: TBD

   Intended usage: TBD (COMMON will be the most common)

   Restrictions on usage: TBD

   Author: TBD

   Change controller: TBD

   Provisional registration? (standards tree only): (Yes/No)

   (Any other information that the author deems interesting may be added
   below this line.)

   The optional parameters consist of starting conditions and variable
   values used as part of the subtypes.

5.  Common Required and Optional Parameters

   Unlike the text primary media type (for instance), virtually all
   archive formats have been designed with almost all of the information
   required for interpretation contained within the format.  Therefore,
   parameters are NOT RECOMMENDED; registrants are not expected to
   register additional parameters.

   Regrettably, not all archive formats are as "universal" or "complete"
   as one might assume at first glance.
   This is because some archive formats are very old or are based on
   older formats where backward compatibility was a design goal; thus
   they were not designed with transport across the Internet in mind.
   The ZIP format is an example: although modern ZIP supports Unicode
   [CITE], the default encoding of ZIP filenames has always been Code
   Page 437.  Since "archive" contents are literally archives of
   computing history, sometimes communicating the archive as-is, rather
   than updating the archive to a more universal format, is necessary.

   Implementations that are archive-type aware MUST support the
   following parameters for maximum compatibility.  At the same time,
   new archives SHOULD NOT rely on these parameters for disambiguation;
   new archives SHOULD be created in such a way that "universal"
   interoperability is achieved with the archive's self-contained
   information.

   [[TODO: code page--it's like charset but only applies to certain
   strings in the archive, when the archive format is ambiguous; do NOT
   attempt to apply this parameter as one would apply charset to text/*.
   Endian-ness?  Time/Y2K representation issues?  Anything else?]]

6.  Split Archives

   Several archive formats (notably RAR and ZIP) support split archives.
   A "split archive" is an archive that is stored across multiple files
   or, more generally, across multiple storage media.  The ZIP format,
   for example, actually has two types of splits: "split archive" and
   "spanned archive".  A "split archive" is a standard ZIP archive split
   over multiple files with the file extensions .z01, .z02, etc.; the
   .zip file is the last file.  A "spanned archive" is the original
   format designed for use with swapping floppy disks.  All archive
   files have the same filename; the format uses volume labels
   (presumably on floppy disks) to store disk numbers.  Neither
   sub-format is merely a naive division of the octet stream: each ZIP
   file is parseable in its own right, and contains its own offset
   values.
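
   As a non-normative illustration of the ZIP "split archive" naming
   described above, the following Python sketch lists the file names of
   a split set in volume order.  The function name and its arguments
   are hypothetical, and the numbering of sets with more than 99
   numbered parts is an assumption rather than something this document
   specifies.

```python
def split_zip_part_names(stem, part_count):
    """List the file names of a split ZIP archive in volume order."""
    if part_count < 1:
        raise ValueError("a split archive has at least one part")
    # Every part except the last carries a numbered extension:
    # .z01, .z02, and so on (assumed to widen naturally past .z99).
    names = [f"{stem}.z{n:02d}" for n in range(1, part_count)]
    # The .zip file is the last file of the set.
    names.append(f"{stem}.zip")
    return names
```

   Under these assumptions, a three-part set with the stem "backup"
   consists of backup.z01, backup.z02, and backup.zip, in that order.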

   The TAR format (or family of formats, including cpio and ustar) was
   originally designed for streaming to and from tape devices, so
   splitting is accomplished differently.

   [[TODO: Consider how to label this content.  archive/zip^01?
   archive/zip; split=01?  Something else?  How shall 01 be associated
   with 02, 03, etc., when the Content-Disposition: ; filename=""
   parameter is "presentation-information" and may be separated from the
   Content-Type header information?]]

7.  Fragment Identifier Syntax

   Because all archives represent files, archives can serve as virtual
   filesystems.  Respondents have noted that an archive's files can be
   addressed by a fragment syntax that resembles a filesystem path.  At
   the same time, archives may record files in different ways (along
   with different types of metadata), suggesting that a common baseline
   with flexible extension points is more appropriate than a fixed
   universal syntax.

   [[TODO: This will be explored in future drafts.  Note the
   similarities between this and the file: URI...]]

   [[TODO: consider how to provide a fragment for content in the
   archive.  NB: most archives do NOT provide Content-Type/media type
   information!  So /foo.html being an HTML file is just an
   *assumption*, and possibly a very wrong one at that.  There is no
   IETF registry for file extensions.]]

8.  Piped-Composite Type Suffix Syntax

   [[TODO: discuss tar piped through bzip2, gzip, etc. as a distinct
   file format, rather than an application of the Content-Encoding:
   header.  Suggest common suffix like archive/tar|bzip2, where | is
   some useful character but not + since + is for structured syntaxes.]]

9.  Security Considerations

   Archives represent files, file metadata, and filesystems; thus,
   security issues loom large because archives can contain just about
   anything.  These concerns are magnified by the arbitrary transport of
   such data across the Internet.

   [[TODO: complete.]]

10.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, January 2013.

Appendix A.  Expected Subtypes

   The following archive formats will be explored for registration as
   subtypes along with this effort:

   Archiving Only
      TAR

   Multipurpose (archiving, compression, encryption)
      ZIP, ACE, RAR, 7-Zip, StuffIt, FreeArc

   Software Packaging
      MSI, RPM, JAR, XPI, CAB, CRX, APK

   Disk Imaging
      ISO, NRG, BIN/CUE, VMDK, WIM, PartImage, IMG/IMA/IMZ, DMG

Authors' Addresses

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/


   Matthew Kerwin

   EMail: matthew@kerwin.net.au
   URI:   http://matthew.kerwin.net.au/