Network Working Group                                         P. Thierry
Internet-Draft                                      Thierry Technologies
Intended status: Experimental                             August 7, 2013
Expires: February 8, 2014


                          BULK ARchive Format
                       draft-thierry-bulk-barf-00

Abstract

   This specification describes a BULK format to pack together
   independent pieces of data and metadata about them.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 8, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.






Thierry                 Expires February 8, 2014                [Page 1]

Internet-Draft                  BULK-BARF                    August 2013


Table of Contents

   1.  Introduction
     1.1.  Rationale
     1.2.  Format overview
     1.3.  Conventions and Terminology
   2.  Guaranteed Backward Compatibility
   3.  BULK archive namespace
     3.1.  Packing
     3.2.  Stacking
     3.3.  Payload with metadata
     3.4.  BULK stream embedding
     3.5.  Compressed data
     3.6.  Encrypted data
   4.  Security Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative references

1.  Introduction

1.1.  Rationale

   There are plenty of archive formats currently in use, ranging from
   widely used and repurposed formats like ZIP (used for generic file
   archives as well as Java deployment, ebooks and office documents),
   through moderately used formats enjoying a stable niche, like tar,
   RAR or StuffIt, to legacy formats like ARC or Z.

   A few archive formats reuse existing ones.  Many archive formats
   developed nowadays reuse ZIP without modification and only dictate
   the tree structure inside the ZIP file.  The Unix world has a long
   tradition of separation of concerns, using different formats for
   archiving (ar or tar) and compression (gzip, bzip2, lzma or now xz),
   with compressed archives named after the combination (foo.ar.gz,
   bar.tar.bz2, etc.).  Debian packages are ar files containing a
   little uncompressed metadata and a couple of compressed tar files.

   But the problem remains that these binary formats all define
   completely ad hoc syntaxes, sometimes incredibly optimized but
   narrowly tailored to their specific requirements.  Many leave little
   room for future extension, or allow it only in a contrived way (many
   formats are extended by abusing an unused metadata field and
   cramming a new ad hoc format into it).

   Some of these formats have a few fixed- or limited-length fields that
   became or will become obsolete in time.  The ar format, for example,
   suffers from the Year 2038 problem and cannot store long file names.
   Various implementations have used different incompatible extensions
   to store long file names.

   So we propose yet another archive format, one that uses an efficient
   but extensible syntax, so that the format can always be extended or
   modified for new use cases or constraints.

1.2.  Format overview

   A BARF file is basically a set of metadata fields followed by data
   entries.  Each entry consists of a set of metadata fields followed by
   its content.  The interesting property of using BULK is that every
   part of that structure is dynamic: there are no fixed metadata
   fields, and an entry without metadata is serialized as its bare
   content, because in BULK an entry and its content cannot be confused
   with each other.  Anything can also be enclosed in a BULK structure
   to add features.

   Metadata fields are just a BULK expression, which means that any ad
   hoc or standard BULK vocabulary can be used in an efficient way as
   metadata.  Mutually incompatible metadata vocabularies could even be
   stored alongside each other for legacy support, if need be.

   The archive file can be compressed or encrypted by an outside tool
   (producing a foo.barf.gz or bar.pgp file, for example), but so can
   any individual BULK expression.  Inside the file, the entire archive
   can be a BULK compression or encryption form, as can any metadata
   set, metadata field or entry.  Almost any extension or optimization
   can be retrofitted into this structure in a backward-compatible way,
   such as checksums, digital signatures or offsets for random access.

   This extends the use cases of BARF beyond archives of multiple
   files.  An extensible image format could be based on a BARF
   structure, allowing a seamless transition from a simple format to a
   full-featured one, whereas existing formats usually add complex
   extensions (for layers, transparency, different compression or
   metadata) that fail to be widely adopted.  Although BARF would
   probably be ill-suited for playable audio and video, it would still
   be a good fit for the storage of raw audio and video for editing
   programs.

1.3.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Literal numerical values are provided in decimal or hexadecimal as
   appropriate.  Hexadecimal literals are prefixed with "0x" to
   distinguish them from decimal literals.

   BULK byte sequences and expressions are described with the same
   conventions as those used in the BULK 1.0 specification [BULK1].

2.  Guaranteed Backward Compatibility

   This specification defines the notion of Guaranteed Backward
   Compatibility (GBC).  It applies to forms that carry a main payload
   with additional metadata.  A form that obeys the rules of GBC has the
   type GBCForm.

   A GBCForm has the shape "( Ref {arguments} {next}:Expr )".  If the
   payload of a GBCForm is readable without knowledge of that form, then
   {next} MUST be that payload.  Otherwise, {next} MUST be nil.

   For example, a GBC-compliant checksum form could have the shape "(
   crc32c {crc}:Word32 {payload} )", where {crc} is the checksum of the
   byte sequence {payload}.  On the other hand, a GBC-compliant
   encryption form, where obviously the payload is unreadable without
   proper knowledge of the form, could have the shape "( encrypt
   {payload} nil )".
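
   The fallback behaviour that GBC allows can be sketched as follows.
   This is only an illustration under stated assumptions: parsed BULK
   data is modelled in memory (a form is a Python list whose first
   element is its reference name, nil is None, an Array is a bytes
   object), every form encountered is assumed to be a GBCForm, and the
   names gbc_next, readable_payload and "example:sign" are
   hypothetical.  No BULK byte encoding is shown.

```python
# Stand-in model (an assumption of this sketch, not the BULK wire format):
# a form is a list whose first element is its reference name, nil is
# None, and an Array is a bytes object.


def gbc_next(form):
    """Return {next}, the last expression of a GBCForm."""
    if not isinstance(form, list) or len(form) < 2:
        raise ValueError("a GBCForm holds a reference and a {next} expression")
    return form[-1]


def readable_payload(expr):
    """Unwrap nested GBCForms without interpreting any of them.

    Returns the innermost payload, or None (nil) when an enclosing form
    declares that its payload is unreadable without knowing the form.
    """
    while isinstance(expr, list):
        expr = gbc_next(expr)
    return expr


# 0 stands in for a real checksum value; none is computed here.
checksummed = ["crc32c", 0, b"some bytes"]
encrypted = ["encrypt", b"\x8f\x02\x17", None]
signed = ["example:sign", b"signature", checksummed]  # hypothetical form

print(readable_payload(checksummed))
print(readable_payload(encrypted))
print(readable_payload(signed))
```

   A reader that does know a given form would interpret its arguments
   instead of skipping to {next}; the sketch only covers the case where
   the form is unknown.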

3.  BULK archive namespace

   The archive namespace (mnemonic: "barf") is an official namespace
   identified by the UUID
   <urn:uuid:8beba7c6-c65d-5256-a2da-3763513953f3> (BULK, "Stack 'em.
   Pack 'em.  And rack 'em.").  It provides a standard way to pack one
   or more data elements together with metadata.
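
   As a small sanity check, the namespace identifier can be parsed with
   the Python standard library.  Its version field is consistent with a
   name-based (SHA-1) UUID, which would fit the name string quoted
   above; the namespace UUID it would be derived from is not given in
   this document, so the derivation itself is not reproduced here.  The
   constant name BARF_NAMESPACE is an assumption of this sketch.

```python
import uuid

# UUID of the "barf" namespace, copied from the text above.
BARF_NAMESPACE = uuid.UUID("8beba7c6-c65d-5256-a2da-3763513953f3")

# The version and variant are encoded in the UUID itself.
print(BARF_NAMESPACE.version)
print(BARF_NAMESPACE.variant == uuid.RFC_4122)
print(BARF_NAMESPACE.urn)
```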

3.1.  Packing

   name  "0x1" (mnemonic: "pack" )

   shape  "( pack {metadata}:Expr {entries} )"

   This packs archive entries together as a form. {metadata} holds
   metadata about the whole pack.  In the context of {metadata},
   "rdf:this-resource" designates the whole pack.

3.2.  Stacking

   name  "0x2" (mnemonic: "stack" )

   shape  "( stack {metadata}:Expr {entries-metadata} ) {entries}"

   This stacks archive entries together as a sequence, for the cases
   where it is not appropriate for entries to belong to a single
   expression. {metadata} holds metadata about the whole stack.  In the
   context of {metadata}, "rdf:this-resource" designates the whole
   stack.  {entries-metadata} MUST be a sequence of expressions whose
   length is less than or equal to the number of expressions in
   {entries}.  Each expression in {entries-metadata} holds metadata
   about a single entry of the stack.  In the context of such a
   metadata expression, "rdf:this-resource" designates the described
   stack entry.  By default, expression number N in {entries-metadata}
   describes expression number N in {entries}.
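
   The default pairing rule can be sketched as follows, assuming the
   stack has already been parsed into two Python sequences.  The
   function name pair_stack_entries is hypothetical, and None is used
   here only to mark an entry for which no metadata expression is
   present.

```python
def pair_stack_entries(entries_metadata, entries):
    """Pair metadata expression N with entry N, following the default rule.

    Entries beyond the end of entries_metadata are paired with None,
    meaning "no metadata expression" in this sketch.
    """
    if len(entries_metadata) > len(entries):
        raise ValueError(
            "{entries-metadata} must not be longer than {entries}"
        )
    missing = len(entries) - len(entries_metadata)
    padded = list(entries_metadata) + [None] * missing
    return list(zip(padded, entries))


pairs = pair_stack_entries(["metadata for entry 0"], [b"first", b"second"])
for metadata, entry in pairs:
    print(metadata, entry)
```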

   When the stack form is in the abstract yield, this has the property
   that if the last entry is an Array, the actual payload constitutes
   the end of the BULK stream.  This can make it possible for BULK-
   unaware programs to read and/or write that payload easily.
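
   For instance, a program that knows nothing about BULK could recover
   such a payload by reading the tail of the file.  The sketch below
   assumes that the payload length is known by some other means (this
   document does not say how); the function name read_trailing_payload
   is hypothetical.

```python
import os


def read_trailing_payload(path, payload_length):
    """Read the last payload_length bytes of a file.

    This relies only on the payload being the end of the stream; the
    length has to be supplied by the caller.
    """
    with open(path, "rb") as stream:
        stream.seek(0, os.SEEK_END)
        size = stream.tell()
        if payload_length < 0 or payload_length > size:
            raise ValueError("payload length does not fit in the file")
        stream.seek(size - payload_length)
        return stream.read(payload_length)
```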

   Stacking also makes the addition of a metadata-carrying entry or a
   metadata-less entry an append-only operation.

3.3.  Payload with metadata

   name  "0x3" (mnemonic: "describe" )

   shape  "( describe {metadata} {payload}:Expr )"

   This form associates arbitrary metadata with an arbitrary payload.
   It is intended to constitute most entries in BARF archives.  In the
   context of {metadata}, "rdf:this-resource" designates the payload.

   Type: "GBCForm"
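
   How "pack" and "describe" might combine into a small archive can be
   sketched with an in-memory model.  Everything here is illustrative:
   forms are modelled as Python lists whose first element is the
   reference name, Arrays as bytes objects, and the metadata vocabulary
   ("example:title", "example:name") is hypothetical, since this
   specification does not define one.

```python
# Stand-in model (an assumption of this sketch, not the BULK wire format):
# a form is a list whose first element is its reference name, and an
# Array is a bytes object.


def describe(metadata, payload):
    """Build a ( describe {metadata} {payload} ) form."""
    return ["describe", metadata, payload]


def pack(metadata, *entries):
    """Build a ( pack {metadata} {entries} ) form."""
    return ["pack", metadata, *entries]


def entry_payloads(pack_form):
    """List the payload of each entry in a pack form.

    An entry wrapped in "describe" yields its last expression; an entry
    without metadata is its own content.
    """
    if not pack_form or pack_form[0] != "pack":
        raise ValueError("not a pack form")
    payloads = []
    for entry in pack_form[2:]:
        if isinstance(entry, list) and entry and entry[0] == "describe":
            payloads.append(entry[-1])
        else:
            payloads.append(entry)
    return payloads


archive = pack(
    ["example:title", "demo archive"],  # hypothetical vocabulary
    describe(["example:name", "hello.txt"], b"Hello, world!\n"),
    b"entry without metadata",
)
print(entry_payloads(archive))
```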

3.4.  BULK stream embedding

   name  "0x4" (mnemonic: "bulk-stream" )

   shape  "( bulk-stream {payload} )"

   This form makes it possible to include a complete BULK stream without
   modification, as {payload}.

   Type: "GBCForm"

3.5.  Compressed data

   name  "0x5" (mnemonic: "compressed" )

   shape  "( compressed {method}:Expr {payload}:Array nil )"

   This form encapsulates a compressed payload.  This specification
   doesn't define names to express a compression method.

   Type: "GBCForm"
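
   A sketch of producing and consuming such a form follows, using zlib
   from the Python standard library.  Since this specification defines
   no names for compression methods, the marker "example:zlib" is purely
   hypothetical, as are the function names and the in-memory model (a
   form is a Python list, nil is None, an Array is a bytes object).

```python
import zlib

# Hypothetical method marker; no such name is defined by this document.
ZLIB_METHOD = "example:zlib"


def make_compressed(data):
    """Build a ( compressed {method} {payload} nil ) form.

    The trailing None (nil) signals that the payload is not readable
    without knowledge of the form.
    """
    return ["compressed", ZLIB_METHOD, zlib.compress(data), None]


def open_compressed(form):
    """Return the decompressed bytes, or None if the method is unknown."""
    if len(form) != 4 or form[0] != "compressed" or form[3] is not None:
        raise ValueError("not a compressed form in the expected shape")
    method, payload = form[1], form[2]
    if method != ZLIB_METHOD:
        return None
    return zlib.decompress(payload)


form = make_compressed(b"BULK " * 20)
print(len(form[2]), len(open_compressed(form)))
```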

3.6.  Encrypted data

   name  "0x6" (mnemonic: "encrypted" )

   shape  "( encrypted {method}:Expr {payload}:Array nil )"

   This form encapsulates an encrypted payload.  This specification
   doesn't define names to express an encryption method.

   Type: "GBCForm"





4.  Security Considerations

5.  References

5.1.  Normative References

   [BULK1]    Thierry, P., "Binary Uniform Language Kit 1.0",
              draft-thierry-bulk-02 (work in progress), August 2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

5.2.  Informative references

   [ISO8601]  "ISO 8601:2004 Data elements and interchange formats --
              Information interchange -- Representation of dates and
              times", 2004.

Author's Address

   Pierre Thierry
   Thierry Technologies

   EMail: pierre@nothos.net


























