Network Working Group P. Thierry Internet-Draft Thierry Technologies Intended status: Experimental August 7, 2013 Expires: February 8, 2014 BULK ARchive Format draft-thierry-bulk-barf-00 Abstract This specification describes a BULK format to pack together independent pieces of data and metadata about them. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on February 8, 2014. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Thierry Expires February 8, 2014 [Page 1] Internet-Draft BULK-BARF August 2013 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Format overview . . . . . . . . . . . . . . . . . . . . . . 3 1.3. Conventions and Terminology . . . . . . . . . . . . . . . . 4 2. Guaranteed Backward Compatibility . . . . . . . . . . . . . . . 4 3. BULK archive namespace . . . . . . . . . . . . . . . . . . . . 5 3.1. Packing . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Stacking . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.3. Payload with metadata . . . . . . . . . . . . . . . . . . . 6 3.4. BULK stream embedding . . . . . . . . . . . . . . . . . . . 6 3.5. Compressed data . . . . . . . . . . . . . . . . . . . . . . 6 3.6. Encrypted data . . . . . . . . . . . . . . . . . . . . . . 6 4. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 5. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 5.1. Normative References . . . . . . . . . . . . . . . . . . . 7 5.2. Informative references . . . . . . . . . . . . . . . . . . 7 Thierry Expires February 8, 2014 [Page 2] Internet-Draft BULK-BARF August 2013 1. Introduction 1.1. Rationale There are plenty of archives formats currently in use, from widely- used and repurposed formats like ZIP (used for generic file archives as well as Java deployment, ebooks and office documents) to legacy formats like ARC or Z through moderately used formats enjoying a stable niche, like tar, RAR or StuffIt. A few archive formats actually make reuse of existing ones. Many archive formats developped nowadays actually reuse ZIP without modification and just dictate the tree structure inside the ZIP file. The Unix world has long had a tradition of separation of concern, thus using different formats for archiving (ar or tar) and compression (gzip, bzip2, lzma or now xz), with compressed archives named after the combination (foo.ar.gz, bar.tar.bz2, etc.). Debian packages are actually ar files containing little uncompressed metadata and a couple of compressed tar files. But the problem remains that all these binary formats all define completely ad hoc syntaxes, sometimes incredibly optimized but narrowly tailored to their specific requirements. Many leave little room for future extension, or in a contrived way (many formats are actually extended by abusing an unused metadata field and cramming a new ad hoc format in it). Some of these formats have a few fixed- or limited-length fields that became or will become obsolete in time. The ar format, for example, suffers from the Year 2038 problem and cannot store long file names. Various implementations have used different incompatible extensions to store long file names. So we propose yet another archive format, that uses an efficient but extensible syntax, so that the format cannot fail to be extended or modified for new use cases or constraints. 1.2. Format overview A BARF file is basically a set of metadata fields followed by data entries. Each entry consists of a set of metadata fields followed by its content. The interesting property of using BULK is that any portion of that structure is dynamic (no fixed metadata fields, and an entry without metadata is serialized as its content, as with BULK, the entry and its content cannot be confused with each other) and anything can be enclosed in a BULK structure to add features. Metadata fields are just a BULK expression, which means that any ad Thierry Expires February 8, 2014 [Page 3] Internet-Draft BULK-BARF August 2013 hoc or standard BULK vocabulary can be used in an efficient way as metadata. Mutually incompatible metadata vocabularies could even be stored alongside each other for legacy support, if need be. The archive file can be compressed or encrypted by an outside tool (producing a foo.barf.gz or bar.pgp file, for example), but so can any individual BULK expression. The entire archive, internally to the file, can be a BULK compression or encryption form, as well as any metadata set, metadata field or entry. Almost any extension and optimization can be retrofitted in this structure in a backward- compatible way, like checksums, digital signature or access offsets for random access. This extends the use case of BARF archives outside of archives for multiple files. An extensible image format could be based on a BARF structure, allowing seamless transition from a simple format to a full-featured one, whereas existing formats usually add complex extensions that fail to be widely adopted (to add support for layers, transparency, different compression or metadata). Although BARF would probably be ill-suited for playable audio and video, it would still provide a perfect fit for the storage of raw audio and video for editing programs. 1.3. Conventions and Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Literal numerical values are provided in decimal or hexadecimal as appropriate. Hexadecimal literals are prefixed with "0x" to distinguish them from decimal literals. BULK bytes sequences and expressions are described with the same conventions than used in the BULK 1.0 specification [BULK1] 2. Guaranteed Backward Compatibility This specification defines the notion of Guaranteed Backward Compatibility (GBC). It applies to forms that carry a main payload with additional metadata. A form that obeys the rules of GBC has the type GBCForm. A GBCForm has the shape "( Ref {arguments} {next}:Expr )". If the payload of a GBCForm is readable without knowledge of that form, then {next} MUST be that payload. Otherwise, {next} MUST be nil. For example, a GBC-compliant checksum form could have the shape "( Thierry Expires February 8, 2014 [Page 4] Internet-Draft BULK-BARF August 2013 crc32c {crc}:Word32 {payload} )", where {crc} is the checksum of the byte sequence {payload}. On the other hand, a GBC-compliant encryption form, where obviously the payload is unreadable without proper knowledge of the form, could have the shape "( encrypt {payload} nil )". 3. BULK archive namespace The archive namespace (mnemonic: "barf") is an official namespace identified by the UUID (BULK, "Stack 'em. Pack 'em. And rack 'em."). It provides a standard way to pack one or more data elements together with metadata. 3.1. Packing name "0x1" (mnemonic: "pack" ) shape "( pack {metadata}:Expr {entries} )" This packs archive entries together as a form. {metadata} holds metadata about the whole pack. In the context of {metadata}, "rdf:this-resource" designates the whole pack. 3.2. Stacking name "0x2" (mnemonic: "stack" ) shape "( stack {metadata}:Expr {entries-metadata} ) {entries}" This stacks archive entries together as a sequence, for the cases where it is not appropriate for entries to belong to a single expression. {metadata} holds metadata about the whole stack. In the context of {metadata}, "rdf:this-resource" designates the whole stack. {entries-metadata} MUST be a sequence of expressions of length equal or inferior to the number of expressions in {entries}. Each expression in {entries-metadata} holds metadata about a single entry of the stack. In the context of such a metadata expression, "rdf:this-resource" designates the described stack entry. By default, the expression number N in {entries-metadata} describes the expression number N in {entries}. When the stack form is in the abstract yield, this has the property that if the last entry is an Array, the actual payload constitutes the end of the BULK stream. This can make it possible for BULK- unaware programs to read and/or write that payload easily. Stacking also makes the addition of a metadata-carrying entry or a Thierry Expires February 8, 2014 [Page 5] Internet-Draft BULK-BARF August 2013 metadata-less entry an append-only operation. 3.3. Payload with metadata name "0x3" (mnemonic: "describe" ) shape "( describe {metadata} {payload}:Expr )" This form associates arbitrary metadata with an arbitrary payload. It is intended to constitute most entries in BARF archives. In the context of {metadata}, "rdf:this-resource" designates the payload. Type: "GBCForm" 3.4. BULK stream embedding name "0x4" (mnemonic: "bulk-stream" ) shape "( bulk-streamm {payload} )" This form makes it possible to include a complete BULK stream without modification, as {payload}. Type: "GBCForm" 3.5. Compressed data name "0x5" (mnemonic: "compressed" ) shape "( compressed {method}:Expr {payload}:Array nil )" This form encapsulates a compressed payload. This specification doesn't define names to express a compression method. Type: "GBCForm" 3.6. Encrypted data name "0x6" (mnemonic: "encrypted" ) shape "( encrypted {method}:Expr {payload}:Array nil )" This form encapsulates an encrypted payload. This specification doesn't define names to express an encryption method. Type: "GBCForm" Thierry Expires February 8, 2014 [Page 6] Internet-Draft BULK-BARF August 2013 4. Security Considerations 5. References 5.1. Normative References [BULK1] Thierry, P., "Binary Uniform Language Kit 1.0", draft-thierry-bulk-02 (work in progress), August 2013. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 5.2. Informative references [ISO8601] "ISO 8601:2004 Data elements and interchange formats -- Information interchange -- Representation of dates and times", 2004. Author's Address Pierre Thierry Thierry Technologies EMail: pierre@nothos.net Thierry Expires February 8, 2014 [Page 7]