                                                          Norbert Bollow
Internet-Draft                                          TenthNet Project
Expires: May 22, 2006                                  November 19, 2005

                 SXDF - Simple Extensible Data Format
                      <draft-bollow-sxdf-01.txt>

Status of this Memo


   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 22, 2006.


Copyright Notice

   Copyright (C) The Internet Society (2005).


Abstract

   The Simple Extensible Data Format (SXDF) defined in this document
   aims to combine the nice properties of XML (of providing a universal,
   text-based data format which allows adding additional data fields
   without breaking existing application programs) with a simple syntax
   which can be parsed efficiently by computer programs.  This data
   format is intended for over-the-wire use in webservice protocols,
   where there is generally no interest in being able to directly modify
   the representation of the data with a standard text editor.





1.  Introduction

1.1. Overview

   Over the past few years, the Extensible Markup Language (XML)
   [W3C.REC-xml] has become a widely used method for data markup.
   The Simple Extensible Data Format (SXDF) defined in this document
   aims to combine the nice properties of XML with a simple syntax
   which can be parsed efficiently by computer programs.

   SXDF shares the following good properties of XML:

   o  It is a universal data format which can be used for expressing
      arbitrarily complex data.

   o  It is a text-based format, which makes it more convenient to
      debug protocol interactions which use the data format.

   o  Data can be validated in an automated manner to ensure that it
      adheres to a specified data structure.

   o  There is great flexibility in how the data format used by a given
      protocol can be extended without breaking existing
      implementations of the protocol.

   SXDF differs from XML in that its main design goals are simplicity
   and efficient parsing by computer programs.

   SXDF is not a "markup language".  It is not intended for data which
   will be edited with a text editor.  

   A sequence of bytes (eight-bit octets) which satisfies the
   requirements of this specification is called a "SXDF resource". Here
   is an example:

   483:# Here is some data in SXDF format
   1%
    8:Booklist=3@
     5%
      5:Title=16:Hardware Hacking
      6:Author=19:Kevin Mitnick (Ed.)
      4:Year=4:2004
      4:ISBN=13:1-932-26683-6
      9:Publisher=8:Syngress
     5%
      5:Title=12:We the Media
      6:Author=11:Dan Gillmor
      4:Year=4:2004
      4:ISBN=13:0-596-00733-7
      9:Publisher=8:O'Reilly
     5%
      5:Title=22:Matrix Decision Making
      6:Author=21:Alex Lowy & Phil Hood
      4:Year=4:2004
      4:ISBN=13:0-787-97292-4
      9:Publisher=11:Jossey-Bass
   ;

   Here is the same data expressed in XML format:

   <!-- Here is the same data expressed in XML markup -->
   <Booklist>
     <Book>
       <Title>Hardware Hacking</Title>
       <Author>Kevin Mitnick (Ed.)</Author>
       <Year>2004</Year>
       <ISBN>1-932-26683-6</ISBN>
       <Publisher>Syngress</Publisher>
     </Book>
     <Book>
       <Title>We the Media</Title>
       <Author>Dan Gillmor</Author>
       <Year>2004</Year>
       <ISBN>0-596-00733-7</ISBN>
       <Publisher>O'Reilly</Publisher>
     </Book>
     <Book>
       <Title>Matrix Decision Making</Title>
       <Author>Alex Lowy &amp; Phil Hood</Author>
       <Year>2004</Year>
       <ISBN>0-787-97292-4</ISBN>
       <Publisher>Jossey-Bass</Publisher>
     </Book>
   </Booklist>

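   Because every string and container in SXDF is prefixed with its
   length or element count, a parser does not need to scan for closing
   delimiters or decode escape sequences.  Parsers for the SXDF format
   can therefore generally be simpler and faster than parsers for the
   XML format.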

   In addition, SXDF provides a general method for including digital
   signatures.


1.2. Pronunciation

   The acronym "SXDF" is pronounced like "sixdaf".


1.3. Notational conventions

1.3.1. Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [KEYWORDS].

1.3.2. Syntactic notation

   The syntax specification of SXDF in section 2 and the specification
   of SXDF Data Structure Descriptions in section 5 use the Augmented
   Backus-Naur Form (ABNF) notation specified in [RFC2234].



2.  Syntax specification
   
   By "SXDF resource" we mean any sequence of bytes which matches the
   production labeled "resource", below.  By "byte" we mean an octet
   of bits.

      resource   = nonnegint ":" *comment dictionary ";"

      nonnegint  = 1*digit

      digit      = %x30-39

   The non-negative integer at the beginning of the resource MUST be
   equal to the number of bytes between the ":" which follows it and
   the ";" which ends the resource.  In this way, each SXDF resource
   is framed like a "netstring" as described in [Netstrings], except
   that the terminating character is ";" rather than ",".
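
   As a non-normative illustration, the length check described above
   can be sketched in Python as follows.  The function name
   split_resource is hypothetical, and the sketch assumes that the
   input holds exactly one complete SXDF resource with no trailing
   bytes.

```python
def split_resource(data: bytes) -> bytes:
    """Return the bytes between the first ":" and the final ";".

    Assumption: `data` holds exactly one complete SXDF resource.
    """
    head, sep, rest = data.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("resource must start with a decimal length and ':'")
    if not rest.endswith(b";"):
        raise ValueError("resource must end with ';'")
    payload = rest[:-1]
    # The length prefix counts every byte between the ":" and the ";".
    if len(payload) != int(head):
        raise ValueError("length prefix does not match payload size")
    return payload
```

   The returned payload still contains any leading comments and the
   top-level dictionary.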

   The SXDF resource MAY contain comments; if it does, the comments
   MUST follow immediately after the initial colon.

      comment    = "#" *( %x00-09 / %x0B-FF ) %x0A
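
   A parser can discard comments before it reads the dictionary.  The
   following Python sketch is only an illustration; the function name
   skip_comments is hypothetical, and the input is assumed to be the
   payload that follows the initial colon of the resource.

```python
def skip_comments(payload: bytes) -> bytes:
    """Drop every leading "#...\\n" comment line from a resource payload."""
    pos = 0
    # Each comment starts with "#" and runs up to and including a newline.
    while payload[pos:pos + 1] == b"#":
        newline = payload.find(b"\n", pos)
        if newline < 0:
            raise ValueError("unterminated comment")
        pos = newline + 1
    return payload[pos:]
```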

   The fundamental SXDF data container is the "dictionary".  It
   contains "key = value" pairs which are called the elements of the
   dictionary.

      dictionary = nonnegint "%" line-end *( string "=" value line-end )

   The initial non-negative integer of a dictionary MUST be equal to
   the number of ( string "=" value line-end ) lines which the
   dictionary contains.

   In the dictionary, the strings which precede the "=" in each
   ( string "=" value line-end ) line are known as "keys".  Any given
   key MUST NOT occur more than once in the same dictionary.

      line-end   = %x0A *" "

   In the line-end production, there MAY be any number of space
   characters following the newline character.  It is RECOMMENDED to
   use a number of space characters which is equal to the number of
   containing dictionary and sequence elements, as this improves
   readability for humans.

      string     = nonnegint ":" *%x00-FF %x0A

   The initial non-negative integer of a string MUST be equal to
   the number of bytes between the ":" and the final newline character.
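
   Because the length is known before the content is read, a string
   can be extracted without inspecting its bytes.  The Python sketch
   below is an illustration only; read_string is a hypothetical name.
   It returns the content together with the position of the byte that
   follows it, and leaves it to the caller to check that byte (in the
   examples of this document, "=" after a key and a newline after a
   value).

```python
from typing import Tuple


def read_string(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed string starting at `pos`.

    Returns (content, position just after the content).
    """
    colon = buf.index(b":", pos)  # raises ValueError if no ":" is found
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("string must start with a decimal length")
    start = colon + 1
    end = start + int(digits)
    if end > len(buf):
        raise ValueError("string is longer than the remaining input")
    return buf[start:end], end
```

   For example, applied to the line "5:Title=16:Hardware Hacking",
   the first call yields the key and the second call, started one byte
   after the "=", yields the value.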

   If the string contains textual data, it SHOULD be either in UTF-8
   encoding or a sequence of 16-bit Unicode characters that starts with
   the Unicode Byte Order Mark, 0xFE 0xFF or 0xFF 0xFE, depending on
   endianness.  (The case of UTF-8 encoding can be easily and reliably
   distinguished from this, because in UTF-8, all bytes have values in
   the range 0x00-0xFD.)
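
   The distinction between the two text encodings can be sketched in
   Python as follows.  This is an illustration, not a required
   algorithm; decode_text is a hypothetical name, and the sketch
   assumes that the caller already knows the string holds textual
   rather than binary data.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode a textual SXDF string as UTF-16 (with BOM) or UTF-8."""
    # 0xFE and 0xFF never occur in UTF-8, so a leading Byte Order Mark
    # reliably identifies 16-bit Unicode data.
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return raw.decode("utf-16")  # this codec consumes the BOM
    return raw.decode("utf-8")
```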

      value      = ( string  / dictionary / sequence / 
                     isequence / fsequence )

      sequence   = nonnegint "@" %x0A *( value line-end )

      isequence  = nonnegint "i" %x0A *( int line-end )

      fsequence  = nonnegint "f" %x0A *( float line-end )

   In the sequence, isequence or fsequence production, the initial
   non-negative integer MUST be equal to the number of following
   ( value line-end ) lines.  The isequence and fsequence productions
   are like a sequence with the difference that the values are
   restricted to integer or floating-point numeric constants.
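
   The following Python sketch shows one way to serialize nested
   dictionaries, sequences and strings.  It is an illustration under
   stated assumptions, not a reference implementation:  it follows
   the layout of the example in section 1.1 (each key is followed
   directly by "=", each line ends with a single newline, and the
   indentation equals the nesting depth), it omits comments, and it
   does not produce isequence or fsequence values.  The names
   encode_resource and _lines are hypothetical.

```python
def _lines(prefix: bytes, value, depth: int) -> list:
    """Return the output lines (without newlines) for one value."""
    pad = b" " * depth
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return [pad + prefix + b"%d:" % len(value) + value]
    if isinstance(value, dict):
        out = [pad + prefix + b"%d%%" % len(value)]
        for key, item in value.items():
            k = key.encode("utf-8")
            out += _lines(b"%d:" % len(k) + k + b"=", item, depth + 1)
        return out
    if isinstance(value, list):
        out = [pad + prefix + b"%d@" % len(value)]
        for item in value:
            out += _lines(b"", item, depth + 1)
        return out
    raise TypeError("unsupported value type: %r" % type(value))


def encode_resource(root: dict) -> bytes:
    """Serialize `root` as a length-framed SXDF resource."""
    body = b"\n".join(_lines(b"", root, 0)) + b"\n"
    return b"%d:" % len(body) + body + b";"
```

   Using a Python dict for each dictionary guarantees that no key
   occurs more than once.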

      int        = "0" / ( ["-"] %x31-39 *digit )
 
      float      = "0" / ( ["-"] ( "0" / %x31-39 *digit ) "." 1*digit
                           [ "e" int ] )
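
   The int and float productions translate directly into regular
   expressions.  The Python sketch below is an illustration; the
   names INT_RE, FLOAT_RE, is_int and is_float are hypothetical.
   Because literal strings in ABNF are case-insensitive, the exponent
   marker is matched as either "e" or "E".

```python
import re

# int = "0" / ( ["-"] %x31-39 *digit )
INT_RE = re.compile(rb"0|-?[1-9][0-9]*")

# float = "0" / ( ["-"] ( "0" / %x31-39 *digit ) "." 1*digit [ "e" int ] )
FLOAT_RE = re.compile(
    rb"0|-?(?:0|[1-9][0-9]*)\.[0-9]+(?:[eE](?:0|-?[1-9][0-9]*))?"
)


def is_int(token: bytes) -> bool:
    """True if the whole token matches the int production."""
    return INT_RE.fullmatch(token) is not None


def is_float(token: bytes) -> bool:
    """True if the whole token matches the float production."""
    return FLOAT_RE.fullmatch(token) is not None
```

   Note that, under these productions, leading zeros and "-0" are not
   valid integers, and a non-zero number without a fractional part is
   not a valid float.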



3.  Use of SXDF for storing persistent data

   Besides its use as an over-the-wire format for webservice protocols,
   the SXDF data format MAY also be used as an on-disk format for
   storing persistent data.  However, programs which use SXDF for this
   purpose SHOULD also support the EXDF data format [EXDF] which is
   designed for allowing data resources to be edited conveniently in a
   text editor.



4.  Reserved Keywords

   In the following two sections, special meanings will be defined for
   the keywords "_DSD", "_DATA" and "_SIGNATURES".  These words MUST NOT
   be used as dictionary keys except as indicated below.  The use of
   any other dictionary key which consists entirely of a "_" character
   followed by three or more uppercase letters SHOULD be avoided.  There
   are two reasons for this:  One is that there could be a need for
   future revisions and extensions of this specification to introduce
   additional reserved words.  The second is that upon conversion of
   SXDF resources to the editable EXDF data format, the reserved
   keywords MAY be replaced by equivalent words in the user's preferred
   language.
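
   An implementation might check keys against these rules as
   sketched below in Python.  This is an illustration only; the
   function name classify_key and its three return values are
   hypothetical, and "uppercase letters" is assumed to mean the ASCII
   letters A to Z.

```python
import re

RESERVED_KEYS = {"_DSD", "_DATA", "_SIGNATURES"}

# "_" followed by three or more uppercase letters (ASCII assumed).
RESERVED_SHAPE = re.compile(r"_[A-Z]{3,}")


def classify_key(key: str) -> str:
    """Return "reserved", "discouraged" or "ok" for a dictionary key."""
    if key in RESERVED_KEYS:
        return "reserved"      # MUST NOT be used except as specified
    if RESERVED_SHAPE.fullmatch(key):
        return "discouraged"   # SHOULD be avoided
    return "ok"
```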
   


5.  SXDF Data Structure Descriptions (DSD)

   As mentioned in the introduction, SXDF data can be verified to adhere
   to a specific data structure, similar to how this is possible with
   XML.

   A SXDF resource MAY contain a "_DSD" element, which, if present, is
   an assertion that the SXDF resource conforms to a particular SXDF
   Data Structure Description (DSD).  The value of the _DSD element
   SHOULD be either a string which references the DSD by means of a
   URL, or a dictionary which contains the DSD explicitly.

   A DSD is a list of two string elements, the first of which is a URI
   and the second is the actual Data Structure Description in some
   precisely-specified, human-readable description language.  The
   description language is identified by the URI in the first string.

   Similar to the design criteria for programming languages, for such
   description languages human readability is more important than
   conciseness:  It is a serious problem when DSDs are difficult to
   read for the (very human) developers of computer programs which read
   or write the corresponding SXDF-based data formats.  When DSDs are
   difficult to read, this work becomes needlessly unpleasant,
   time-consuming and error-prone.



6.  Digital Signatures

   The SXDF format allows any number of digital signatures to be added
   to an entire SXDF resource or to any dictionary values in the
   resource.

   Signing an entire SXDF resource is done by creating a new resource
   with two elements named "_DATA" and "_SIGNATURES".  The _DATA element
   contains the contents of the original SXDF resource and the
   _SIGNATURES element contains a list of digital signatures (see
   below).

   Signing individual element values is done by replacing
   the value with a dictionary that has precisely two elements with
   the keys "_DATA" and "_SIGNATURES".

   The set of DSDs to which the resource conforms is unchanged by this
   operation, because from the perspective of data semantics and from
   the perspective of DSDs the _SIGNATURES element is ignored and the
   value of the _DATA element is treated as if it were in the place of
   this two-element dictionary.  DSDs will never reference any elements
   named "_DATA" or "_SIGNATURES".
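
   The rule that signature wrappers are transparent to data semantics
   can be sketched in Python as follows.  The sketch is an
   illustration which assumes that the resource has already been
   parsed into Python dict, list and string objects; the function
   name strip_signatures is hypothetical, and signature verification
   is outside its scope.

```python
def strip_signatures(value):
    """Replace every {_DATA, _SIGNATURES} wrapper by its _DATA value."""
    if isinstance(value, dict):
        # A wrapper has precisely the two keys "_DATA" and "_SIGNATURES".
        if set(value) == {"_DATA", "_SIGNATURES"}:
            return strip_signatures(value["_DATA"])
        return {key: strip_signatures(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_signatures(item) for item in value]
    return value
```

   The result is the structure against which a DSD would be checked.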

   The _SIGNATURES element contains a list of one or more strings, with
   each of the strings containing a digital signature in OpenPGP format,
   as specified in [RFC2440], of the value of the _DATA element,
   excluding the initial "5:_DATA=" but including the final "\n".

   These signatures are binary data for which the SXDF format has no
   need of special encoding or "ASCII armoring".

   Unless the _DATA element contains a string value, the signature
   MUST be of type 0x00 ("signature of a binary document", see section
   5.2.1 of [RFC2440]), and before signing, the contents of the _DATA
   element MUST be canonicalized by removing any and all space
   characters which follow a newline character but which are not part of
   a string value.  If the _DATA element contains a string value, other
   signature types are also possible.

   Here is an example of the canonicalization process:  Suppose that
   within a SXDF resource, a dictionary contains the following
   _DATA dictionary:

   5:_DATA=3%
    6:Action=28:qrpc://example.com/add_books
    10:ResourceID=28:34DH734HF64HF734@example.org
    8:Booklist=3@
     5%
      5:Title=16:Hardware Hacking
      6:Author=19:Kevin Mitnick (Ed.)
      4:Year=4:2004
      4:ISBN=13:1-932-26683-6
      9:Publisher=8:Syngress
     5%
      5:Title=12:We the Media
      6:Author=11:Dan Gillmor
      4:Year=4:2004
      4:ISBN=13:0-596-00733-7
      9:Publisher=8:O'Reilly
     5%
      5:Title=22:Matrix Decision Making
      6:Author=21:Alex Lowy & Phil Hood
      4:Year=4:2004
      4:ISBN=13:0-787-97292-4
      9:Publisher=11:Jossey-Bass

   Then this is the canonicalized version which will be signed:

   3%
   6:Action=28:qrpc://example.com/add_books
   10:ResourceID=28:34DH734HF64HF734@example.org
   8:Booklist=3@
   5%
   5:Title=16:Hardware Hacking
   6:Author=19:Kevin Mitnick (Ed.)
   4:Year=4:2004
   4:ISBN=13:1-932-26683-6
   9:Publisher=8:Syngress
   5%
   5:Title=12:We the Media
   6:Author=11:Dan Gillmor
   4:Year=4:2004
   4:ISBN=13:0-596-00733-7
   9:Publisher=8:O'Reilly
   5%
   5:Title=22:Matrix Decision Making
   6:Author=21:Alex Lowy & Phil Hood
   4:Year=4:2004
   4:ISBN=13:0-787-97292-4
   9:Publisher=11:Jossey-Bass
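
   The canonicalization step can be sketched in Python as follows.
   This is an illustration, not a normative algorithm; the function
   name canonicalize is hypothetical.  The sketch assumes that its
   input is the value of a _DATA element (without the "5:_DATA="
   prefix) and that the input contains no comments.  It uses the
   length prefixes to copy string contents unchanged, so that spaces
   inside a string value are never removed.

```python
def canonicalize(value: bytes) -> bytes:
    """Remove spaces that follow a newline outside of string contents."""
    out = bytearray()
    i, n = 0, len(value)
    while i < n:
        j = i
        while j < n and value[j:j + 1].isdigit():
            j += 1
        if j > i and value[j:j + 1] == b":":
            # Length-prefixed string: copy prefix and content verbatim.
            end = j + 1 + int(value[i:j])
            if end > n:
                raise ValueError("string is longer than the remaining input")
            out += value[i:end]
            i = end
        elif j > i:
            # Element count or numeric constant: copy the digits.
            out += value[i:j]
            i = j
        elif value[i:i + 1] == b"\n":
            out += b"\n"
            i += 1
            while i < n and value[i:i + 1] == b" ":
                i += 1  # drop indentation after the newline
        else:
            out.append(value[i])
            i += 1
    return bytes(out)
```

   The bytes returned by such a function are what would be passed to
   the OpenPGP signing or verification operation.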


7.  Security Considerations

   Webservices typically act on untrusted data; SXDF implementations
   therefore need to be carefully designed and reviewed to prevent
   security breaches caused by improper handling of malformed SXDF
   resources.

   The universal data format described in this specification
   incorporates a mechanism through which digital signatures can be
   provided for subsets of the data.  The security which may be
   added through this mechanism depends on the strength of the
   corresponding mechanisms for generating and verifying the
   signatures and for establishing trust for the public keys which
   correspond to the digital signatures.


8.  IANA Considerations

   This document has no actions for IANA.



References

   Normative References

   [KEYWORDS]         Bradner, S., "Key words for use in RFCs to
                      Indicate Requirement Levels", BCP 14, RFC 2119,
                      March 1997.

   [RFC2234]          Crocker, D., Ed. and P. Overell, "Augmented BNF
                      for Syntax Specifications: ABNF", RFC 2234,
                      November 1997.

   [RFC2440]          Callas, J., Donnerhacke, L., Finney, H. and
                      R. Thayer, "OpenPGP Message Format", RFC 2440,
                      November 1998.

   Informative References

   [Netstrings]       Bernstein, D. J., "Netstrings"
                      <http://cr.yp.to/proto/netstrings.txt>

   [EXDF]             Bollow, N., "EXDF - Editable Extensible Data
                      Format", work in progress.

   [QQP]              Bollow, N., "QQP - Quick Queues Protocol",
                      work in progress.
                      <http://QQP.org/>

   [QRPC]             Bollow, N., "QRPC - Queueable Remote Procedure
                      Calls", work in progress.
                      <http://QRPC.org/>

   [W3C.REC-xml]      Bray, T., Paoli, J., Sperberg-McQueen, C. and
                      E. Maler, "Extensible Markup Language (XML) 1.0
                      (2nd ed)", W3C REC-xml, October 2000,
                      <http://www.w3.org/TR/REC-xml>.

Author's Address

   Norbert Bollow
   Weidlistrasse 18
   CH-8624 Gruet

   Phone: +41 1 972 2059
   EMail: nb@bollow.ch


Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.



Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.