Internet-Draft                                            Norbert Bollow
Expires: May 22, 2006                                   TenthNet Project
                                                       November 19, 2005
SXDF - Simple Extensible Data Format
<draft-bollow-sxdf-01.txt>
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 22, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
The Simple Extensible Data Format (SXDF) defined in this document
aims to combine the nice properties of XML (of providing a universal,
text-based data format which allows adding additional data fields
without breaking existing application programs) with a simple syntax
which can be parsed efficiently by computer programs. This data
format is intended for over-the-wire use in webservice protocols,
where there is generally no interest in being able to directly modify
the representation of the data with a standard text editor.
1. Introduction
1.1. Overview
Over the past few years, the Extensible Markup Language (XML)
[W3C.REC-xml] has become a widely used method for data markup.
The Simple Extensible Data Format (SXDF) defined in this document
aims to combine the nice properties of XML with a simple syntax
which can be parsed efficiently by computer programs.
SXDF shares the following good properties of XML:
o It is a universal data format which can be used for expressing
arbitrarily complex data.
o It is a text-based format, which makes it more convenient to
debug protocol interactions which use the data format.
o Data can be validated in an automated manner to ensure that it
adheres to a specified data structure.
o There is great flexibility in how the data format used by a given
protocol can be extended without breaking existing
implementations of the protocol.
SXDF differs from XML in that with SXDF the main design goals are
simplicity, and allowing efficient parsing by computer programs.
SXDF is not a "markup language". It is not intended for data which
will be edited with a text editor.
A sequence of bytes (eight-bit octets) which satisfies the
requirements of this specification is called a "SXDF resource". Here
is an example:
483:# Here is some data in SXDF format
1%
8:Booklist=3@
5%
5:Title=16:Hardware Hacking
6:Author=19:Kevin Mitnick (Ed.)
4:Year=4:2004
4:ISBN=13:1-932-26683-6
9:Publisher=8:Syngress
5%
5:Title=12:We the Media
6:Author=11:Dan Gillmor
4:Year=4:2004
4:ISBN=13:0-596-00733-7
9:Publisher=8:O'Reilly
5%
5:Title=22:Matrix Decision Making
6:Author=21:Alex Lowy & Phil Hood
4:Year=4:2004
4:ISBN=13:0-787-97292-4
9:Publisher=11:Jossey-Bass
;
Here is the same data expressed in XML format:
<!-- Here is the same data expressed in XML markup -->
<Booklist>
  <Book>
    <Title>Hardware Hacking</Title>
    <Author>Kevin Mitnick (Ed.)</Author>
    <Year>2004</Year>
    <ISBN>1-932-26683-6</ISBN>
    <Publisher>Syngress</Publisher>
  </Book>
  <Book>
    <Title>We the Media</Title>
    <Author>Dan Gillmor</Author>
    <Year>2004</Year>
    <ISBN>0-596-00733-7</ISBN>
    <Publisher>O'Reilly</Publisher>
  </Book>
  <Book>
    <Title>Matrix Decision Making</Title>
    <Author>Alex Lowy &amp; Phil Hood</Author>
    <Year>2004</Year>
    <ISBN>0-787-97292-4</ISBN>
    <Publisher>Jossey-Bass</Publisher>
  </Book>
</Booklist>
Parsers for the SXDF format are generally less complicated and faster
than parsers for the XML format.
In addition, SXDF provides a general method for including digital
signatures.
1.2. Pronunciation
The acronym "SXDF" is pronounced like "sixdaf".
1.3. Notational conventions
1.3.1. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[KEYWORDS].
1.3.2. Syntactic notation
The syntax specification of SXDF in section 2 and the specification
of SXDF Data Structure Descriptions in section 5 use the
Augmented Backus-Naur Form (ABNF) notation specified in [RFC2234].
2. Syntax specification
By "SXDF resource" we mean any sequence of bytes which matches
the production labeled "resource", below. By "byte" we mean an
octet of bits.
resource = nonnegint ":" *comment dictionary ";"
nonnegint = 1*digit
digit = %x30-39
The non-negative integer at the beginning of the resource MUST be
equal to the number of bytes between the ":" which follows it and
the ";" which ends the string. In this way, each SXDF resource
is a "netstring" as described in [Netstrings].
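As an illustration of the length rule above, the following Python
sketch checks the outer framing of a resource and returns the bytes
between the colon and the terminating ";". The function name
split_resource is hypothetical and not part of this specification,
and the sketch does not parse the comments or the dictionary inside
the payload.

```python
def split_resource(data: bytes) -> bytes:
    """Return the payload between the leading "<length>:" and the final ";".

    Hypothetical helper: it only checks the outer framing of a resource.
    """
    colon = data.find(b":")
    prefix = data[:colon] if colon > 0 else b""
    # bytes.isdigit() is True only for a non-empty run of ASCII digits.
    if not prefix.isdigit():
        raise ValueError("missing length prefix")
    length = int(prefix)
    payload = data[colon + 1 : colon + 1 + length]
    # The declared length must cover the payload exactly, and nothing but
    # the terminating ";" may follow it.
    if len(payload) != length or data[colon + 1 + length :] != b";":
        raise ValueError("length prefix does not match payload")
    return payload
```

A caller would typically pass the returned payload on to a parser for
the comment and dictionary productions.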
The SXDF resource MAY contain comments; if it does, the comments
MUST follow immediately after the initial colon.
comment = "#" *( %x00-09 / %x0B-FF ) %x0A
The fundamental SXDF data container is the "dictionary". It
contains "key = value" pairs which are called the elements of the
dictionary.
dictionary = nonnegint "%" line-end *( string "=" value line-end )
The initial non-negative integer of a dictionary MUST be equal to
the number of ( string "=" value line-end ) lines which the
dictionary contains.
In the dictionary, the strings which precede the "=" in each
( string "=" value line-end ) line are known as "keys". Any given
key MUST NOT occur more than once in the same dictionary.
line-end = %x0A *" "
In the line-end production, there MAY be any number of space
characters following the newline character. It is RECOMMENDED to
use a number of space characters which is equal to the number of
containing dictionary and sequence elements, as this improves
readability for humans.
string = nonnegint ":" *%x00-FF %x0A
The initial non-negative integer of a string MUST be equal to
the number of bytes between the ":" and the final newline character.
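Because every string carries its length, a reader never has to scan
for a delimiter. The following Python sketch reads a length-prefixed
string and, building on that, a dictionary whose values are all
strings. It follows the examples in section 1.1, in which a key is
followed directly by "=" and a string value is followed by a line-end;
nested values are not handled. The names read_count, read_string,
skip_line_end and parse_flat_dictionary are hypothetical.

```python
from typing import Dict, Tuple


def read_count(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a run of ASCII digits at pos; return (number, next position)."""
    end = pos
    while buf[end:end + 1].isdigit():
        end += 1
    if end == pos:
        raise ValueError("expected a non-negative integer")
    return int(buf[pos:end]), end


def read_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read "<length>:<bytes>"; the terminator is left for the caller."""
    length, pos = read_count(buf, pos)
    if buf[pos:pos + 1] != b":":
        raise ValueError("expected ':' after the length")
    start, end = pos + 1, pos + 1 + length
    if end > len(buf):
        raise ValueError("string runs past the end of the input")
    return buf[start:end], end


def skip_line_end(buf: bytes, pos: int) -> int:
    """Consume a newline followed by any number of spaces."""
    if buf[pos:pos + 1] != b"\n":
        raise ValueError("expected a newline")
    pos += 1
    while buf[pos:pos + 1] == b" ":
        pos += 1
    return pos


def parse_flat_dictionary(buf: bytes, pos: int = 0) -> Tuple[Dict[bytes, bytes], int]:
    """Parse a dictionary with string values only (an assumption of this sketch)."""
    count, pos = read_count(buf, pos)
    if buf[pos:pos + 1] != b"%":
        raise ValueError("expected '%' after the element count")
    pos = skip_line_end(buf, pos + 1)
    result: Dict[bytes, bytes] = {}
    for _ in range(count):
        key, pos = read_string(buf, pos)
        if buf[pos:pos + 1] != b"=":
            raise ValueError("expected '=' after the key")
        value, pos = read_string(buf, pos + 1)
        if key in result:
            # A key MUST NOT occur more than once in the same dictionary.
            raise ValueError("duplicate key")
        result[key] = value
        pos = skip_line_end(buf, pos)
    return result, pos
```

Keys and values are kept as bytes here, since a string may hold
arbitrary binary data.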
If the string contains textual data, it SHOULD be either in UTF-8
encoding or a sequence of 16-bit Unicode characters that starts with
the Unicode Byte Order Mark, 0xFE 0xFF or 0xFF 0xFE, depending on
endianness. (The case of UTF-8 encoding can be easily and reliably
distinguished from this, because in UTF-8, all bytes have values in
the range 0x00-0xFD.)
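The distinction between the two text encodings can be sketched in
Python as follows. This is only an illustration: the function name
decode_text is hypothetical, and the sketch assumes the caller already
knows that the string holds textual rather than binary data.

```python
def decode_text(data: bytes) -> str:
    """Decode a textual SXDF string value (hypothetical helper)."""
    # A leading Byte Order Mark selects 16-bit characters and their
    # endianness; the mark itself is not part of the text.
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    # Neither 0xFE nor 0xFF can occur in UTF-8, so anything else is
    # treated as UTF-8.
    return data.decode("utf-8")
```

Invalid input raises UnicodeDecodeError, which a caller may want to
treat as a sign that the string is not textual data.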
value = ( string / dictionary / sequence /
isequence / fsequence )
sequence = nonnegint "@" %x0A *( value line-end )
isequence = nonnegint "i" %x0A *( int line-end )
fsequence = nonnegint "f" %x0A *( float line-end )
In the sequence, isequence or fsequence production, the initial
non-negative integer MUST be equal to the number of following
( value line-end ) lines. The isequence and fsequence productions
are like a sequence with the difference that the values are
restricted to integer or floating-point numeric constants.
int = "0" / ( ["-"] %x31-39 *digit )
float = "0" / ( ["-"] ( "0" / %x31-39 *digit ) "." 1*digit
[ "e" int ] )
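The int and float productions translate directly into regular
expressions. The following Python sketch is one possible rendering;
the names INT_RE, FLOAT_RE, is_int and is_float are hypothetical.
Note that quoted literals in ABNF [RFC2234] are case-insensitive, so
the sketch accepts both "e" and "E" as the exponent marker.

```python
import re

# int = "0" / ( ["-"] %x31-39 *digit )
INT_RE = re.compile(rb"0|-?[1-9][0-9]*")

# float = "0" / ( ["-"] ( "0" / %x31-39 *digit ) "." 1*digit [ "e" int ] )
FLOAT_RE = re.compile(
    rb"0|-?(?:0|[1-9][0-9]*)\.[0-9]+(?:[eE](?:0|-?[1-9][0-9]*))?"
)


def is_int(token: bytes) -> bool:
    """True if the whole token matches the int production."""
    return INT_RE.fullmatch(token) is not None


def is_float(token: bytes) -> bool:
    """True if the whole token matches the float production."""
    return FLOAT_RE.fullmatch(token) is not None
```

Under these productions "-0" and "007" are not valid integers, and a
float other than "0" always contains a decimal point.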
3. Use of SXDF for storing persistent data
Besides its use as an over-the-wire format for webservice protocols,
the SXDF data format MAY also be used as an on-disk format for
storing persistent data. However, programs which use SXDF for this
purpose SHOULD also support the EXDF data format [EXDF] which is
designed for allowing data resources to be edited conveniently in a
text editor.
4. Reserved Keywords
In the following two sections, special meanings will be defined for
the keywords "_DSD", "_DATA" and "_SIGNATURES". These words MUST NOT
be used as dictionary keys except as indicated below. The use of
any other dictionary key which consists entirely of a "_" character
followed by three or more uppercase letters SHOULD be avoided. There
are two reasons for this: One is that there could be a need for
future revisions and extensions of this specification to introduce
additional reserved words. The second is that upon conversion of
SXDF resources to the editable EXDF data format, the reserved
keywords MAY be replaced by equivalent words in the user's preferred
language.
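A writer that wants to honor these rules could classify keys before
emitting them. The following Python sketch is an illustration only:
the function name classify_key and its return values are hypothetical,
and "uppercase letters" is assumed to mean the ASCII letters A-Z.

```python
import re

RESERVED_KEYS = {"_DSD", "_DATA", "_SIGNATURES"}

# "_" followed by three or more uppercase letters and nothing else.
# Assumption: only ASCII A-Z count as uppercase letters.
RESERVED_STYLE = re.compile(r"_[A-Z]{3,}")


def classify_key(key: str) -> str:
    """Return "reserved", "discouraged" or "ordinary" for a dictionary key."""
    if key in RESERVED_KEYS:
        return "reserved"      # MUST NOT be used except as specified
    if RESERVED_STYLE.fullmatch(key):
        return "discouraged"   # SHOULD be avoided
    return "ordinary"
```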
5. SXDF Data Structure Descriptions (DSD)
As mentioned in the introduction, SXDF data can be verified to adhere
to a specific data structure, similar to how this is possible with
XML.
A SXDF resource MAY contain a "_DSD" element, which, if present, is
an assertion that the SXDF resource conforms to a particular SXDF
Data Structure Description (DSD). The value of the _DSD element
SHOULD be either a string which references the DSD by means of a
URL, or a dictionary which contains the DSD explicitly.
A DSD is a list of two string elements, the first of which is a URI
and the second is the actual Data Structure Description in some
precisely-specified, human-readable description language. The
description language is identified by the URI in the first string.
Similar to the design criteria for programming languages, for such
description languages human readability is more important than
conciseness: It is a serious problem when DSDs are difficult to
read for the (very human) developers of computer programs which read
or write the corresponding SXDF-based data formats. When DSDs are
difficult to read, this work becomes needlessly unpleasant,
time-consuming and error-prone.
6. Digital Signatures
The SXDF format allows any number of digital signatures to be added
to an entire SXDF resource or to any dictionary values in the resource.
Signing an entire SXDF resource is done by creating a new resource
with two elements named "_DATA" and "_SIGNATURES". The _DATA element
contains the contents of the original SXDF resource and the
_SIGNATURES element contains a list of digital signatures (see below).
Signing individual element values is done by replacing
the value with a dictionary that has precisely two elements with
the keys "_DATA" and "_SIGNATURES".
The set of DSDs to which the resource conforms is unchanged by this
operation, because from the perspective of data semantics and from
the perspective of DSDs the _SIGNATURES element is ignored and the
value of the _DATA element is treated as if it were in the place of
this two-element dictionary. DSDs will never reference any elements
named "_DATA" or "_SIGNATURES".
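The effect of this rule can be pictured on an in-memory
representation. The following Python sketch assumes that a parsed
resource is held as nested Python dict and list objects with string
keys (an assumption of this example, not something this specification
prescribes) and shows how a consumer might obtain the view of the data
to which DSDs apply. The function name strip_signatures is
hypothetical.

```python
def strip_signatures(value):
    """Replace every {_DATA, _SIGNATURES} wrapper by the value of its _DATA."""
    if isinstance(value, dict):
        # A signed value is a dictionary with precisely these two keys.
        if set(value) == {"_DATA", "_SIGNATURES"}:
            return strip_signatures(value["_DATA"])
        return {key: strip_signatures(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_signatures(item) for item in value]
    return value
```

The sketch discards the signatures without verifying them; whether and
when to verify is left to the application.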
The _SIGNATURES element contains a list of one or more strings, with
each of the strings containing a digital signature in OpenPGP format,
as specified in [RFC2440], of the value of the _DATA element,
excluding the initial "5:_DATA=" but including the final "\n".
These signatures are binary data for which the SXDF format has no
need of special encoding or "ASCII armoring".
Except if the _DATA element contains a string value, the signature
MUST be of type 0x00 ("signature of a binary document", see section
5.2.1 of [RFC2440]), and before signing, the contents of the _DATA
element MUST be canonicalized by removing any and all space
characters which follow a newline character but which are not part of
a string value. If the _DATA element contains a string value, other
signature types are also possible.
Here is an example of the canonicalization process: Suppose that
within a SXDF resource, a dictionary contains the following
_DATA dictionary:
5:_DATA=3%
6:Action=28:qrpc://example.com/add_books
10:ResourceID=28:34DH734HF64HF734@example.org
8:Booklist=3@
5%
5:Title=16:Hardware Hacking
6:Author=19:Kevin Mitnick (Ed.)
4:Year=4:2004
4:ISBN=13:1-932-26683-6
9:Publisher=8:Syngress
5%
5:Title=12:We the Media
6:Author=11:Dan Gillmor
4:Year=4:2004
4:ISBN=13:0-596-00733-7
9:Publisher=8:O'Reilly
5%
5:Title=22:Matrix Decision Making
6:Author=21:Alex Lowy & Phil Hood
4:Year=4:2004
4:ISBN=13:0-787-97292-4
9:Publisher=11:Jossey-Bass
Then this is the canonicalized version which will be signed:
3%
6:Action=28:qrpc://example.com/add_books
10:ResourceID=28:34DH734HF64HF734@example.org
8:Booklist=3@
5%
5:Title=16:Hardware Hacking
6:Author=19:Kevin Mitnick (Ed.)
4:Year=4:2004
4:ISBN=13:1-932-26683-6
9:Publisher=8:Syngress
5%
5:Title=12:We the Media
6:Author=11:Dan Gillmor
4:Year=4:2004
4:ISBN=13:0-596-00733-7
9:Publisher=8:O'Reilly
5%
5:Title=22:Matrix Decision Making
6:Author=21:Alex Lowy & Phil Hood
4:Year=4:2004
4:ISBN=13:0-787-97292-4
9:Publisher=11:Jossey-Bass
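The canonicalization step can be sketched in Python as follows. The
sketch walks over the value of the _DATA element, copies every
length-prefixed string verbatim, and drops the spaces that follow a
newline outside of strings. The function name canonicalize is
hypothetical, and the sketch assumes well-formed input that follows
the layout of the examples in this document.

```python
def canonicalize(data: bytes) -> bytes:
    """Remove spaces after newlines, except inside string values."""
    out = bytearray()
    pos, size = 0, len(data)
    while pos < size:
        byte = data[pos:pos + 1]
        if byte.isdigit():
            end = pos
            while data[end:end + 1].isdigit():
                end += 1
            if data[end:end + 1] == b":":
                # Length-prefixed string: copy prefix and content unchanged,
                # including any newlines and spaces inside the content.
                stop = end + 1 + int(data[pos:end])
                if stop > size:
                    raise ValueError("string runs past the end of the input")
                out += data[pos:stop]
                pos = stop
            else:
                # Element count or part of a numeric constant.
                out += data[pos:end]
                pos = end
        elif byte == b"\n":
            # Keep the newline, skip the indentation that follows it.
            out += byte
            pos += 1
            while data[pos:pos + 1] == b" ":
                pos += 1
        else:
            out += byte
            pos += 1
    return bytes(out)
```

The result is the byte sequence over which the signature would be
computed; computing the OpenPGP signature itself is outside the scope
of this sketch.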
7. Security Considerations
Webservices typically act on untrusted data; SXDF implementations
therefore need to be carefully designed and reviewed to prevent
security breaches caused by improper handling of malformed SXDF
resources.
The universal data format described in this specification
incorporates a mechanism through which digital signatures can be
provided for subsets of the data. The security which may be
added through this mechanism depends on the strength of the
corresponding mechanisms for generating and verifying the
signatures and for establishing trust for the public keys which
correspond to the digital signatures.
8. IANA Considerations
This document has no actions for IANA.
References
Normative References
[KEYWORDS] Bradner, S., "Key words for use in RFCs to
Indicate Requirement Levels", BCP 14, RFC 2119.
[RFC2234] Crocker, D., Ed., "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234.
[RFC2440] Callas, J., Donnerhacke, L., Finney, H., and
R. Thayer, "OpenPGP Message Format", RFC 2440.
Informative References
[Netstrings] Bernstein, D. J., "Netstrings"
<http://cr.yp.to/proto/netstrings.txt>
[EXDF] Bollow, N., "EXDF - Editable Extensible Data
Format", work in progress.
[QQP] Bollow, N., "QQP - Quick Queues Protocol",
work in progress.
<http://QQP.org/>
[QRPC] Bollow, N., "QRPC - Queueable Remote Procedure
Calls", work in progress.
<http://QRPC.org/>
[W3C.REC-xml] Bray, T., Paoli, J., Sperberg-McQueen, C. and
E. Maler, "Extensible Markup Language (XML) 1.0
(2nd ed)", W3C REC-xml, October 2000,
<http://www.w3.org/TR/REC-xml>.
Author's Address
Norbert Bollow
Weidlistrasse 18
CH-8624 Gruet
Phone: +41 1 972 2059
EMail: nb@bollow.ch
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.