                                                          Norbert Bollow
Internet-Draft                                          TenthNet Project
Expires: May 22, 2006                                  November 19, 2005


                  SXDF - Simple Extensible Data Format

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 22, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   The Simple Extensible Data Format (SXDF) defined in this document
   aims to combine the nice properties of XML (a universal, text-based
   data format to which data fields can be added without breaking
   existing application programs) with a simple syntax which can be
   parsed efficiently by computer programs.  This data format is
   intended for over-the-wire use in webservice protocols, where there
   is generally no interest in being able to directly modify the
   representation of the data with a standard text editor.

1.  Introduction

1.1.  Overview

   Over the past few years, the Extensible Markup Language (XML)
   [W3C.REC-xml] has become a widely used method for data markup.
   The Simple Extensible Data Format (SXDF) defined in this document
   aims to combine the nice properties of XML with a simple syntax
   which can be parsed efficiently by computer programs.

   SXDF shares the following good properties of XML:

   o  It is a universal data format which can be used for expressing
      arbitrarily complex data.

   o  It is a text-based format, which makes it more convenient to
      debug protocol interactions which use the data format.

   o  Data can be validated in an automated manner to ensure that it
      adheres to a specified data structure.

   o  There is great flexibility in how the data format used by a given
      protocol can be extended without breaking existing
      implementations of the protocol.

   SXDF differs from XML in that the main design goals of SXDF are
   simplicity and efficient parsing by computer programs.  SXDF is not
   a "markup language".  It is not intended for data which will be
   edited with a text editor.

   A sequence of bytes (eight-bit octets) which satisfies the
   requirements of this specification is called a "SXDF resource".
   Here is an example:

      483:# Here is some data in SXDF format
      1%
       8:Booklist=3@
        5%
         5:Title=16:Hardware Hacking
         6:Author=19:Kevin Mitnick (Ed.)
         4:Year=4:2004
         4:ISBN=13:1-932-26683-6
         9:Publisher=8:Syngress
        5%
         5:Title=12:We the Media
         6:Author=11:Dan Gillmor
         4:Year=4:2004
         4:ISBN=13:0-596-00733-7
         9:Publisher=8:O'Reilly
        5%
         5:Title=22:Matrix Decision Making
         6:Author=21:Alex Lowy & Phil Hood
         4:Year=4:2004
         4:ISBN=13:0-787-97292-4
         9:Publisher=11:Jossey-Bass
      ;

   Here is the same data expressed in XML format:

      Hardware Hacking Kevin Mitnick (Ed.) 2004 1-932-26683-6 Syngress
      We the Media Dan Gillmor 2004 0-596-00733-7 O'Reilly
      Matrix Decision Making Alex Lowy & Phil Hood 2004 0-787-97292-4
      Jossey-Bass

   Parsers for the SXDF format are generally less complicated and
   faster than parsers for the XML format.

   In addition, SXDF provides a general method for including digital
   signatures.

1.2.  Pronunciation

   The acronym "SXDF" is pronounced like "sixdaf".

1.3.  Notational conventions

1.3.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [KEYWORDS].

1.3.2.  Syntactic notation

   The syntax specification of SXDF in section 2 and the specification
   of SXDF Data Structure Descriptions in section 5 use the Augmented
   Backus-Naur Form (ABNF) notation specified in [RFC2234].

2.  Syntax specification

   A "SXDF resource" is any sequence of bytes which matches the
   production labeled "resource" below.  A "byte" is an octet of bits.

      resource   = nonnegint ":" *comment dictionary ";"
      nonnegint  = 1*digit
      digit      = %x30-39

   The non-negative integer at the beginning of the resource MUST be
   equal to the number of bytes between the ":" which follows it and
   the ";" which ends the string.  In this way, each SXDF resource is a
   "netstring" as described in [Netstrings].

   The SXDF resource MAY contain comments; if it does, the comments
   MUST follow immediately after the initial colon.

      comment    = "#" *( %x00-09 / %x0B-FF ) %x0A

   The fundamental SXDF data container is the "dictionary".  It
   contains "key = value" pairs which are called the elements of the
   dictionary.

      dictionary = nonnegint "%" line-end
                   *( string "=" value line-end )

   The initial non-negative integer of a dictionary MUST be equal to
   the number of ( string "=" value line-end ) lines which the
   dictionary contains.

   In the dictionary, the strings which precede the "=" in each
   ( string "=" value line-end ) line are known as "keys".  Any given
   key MUST NOT occur more than once in the same dictionary.

      line-end   = %x0A *" "

   In the line-end production, there MAY be any number of space
   characters following the newline character.  It is RECOMMENDED to
   use a number of space characters which is equal to the number of
   containing dictionary and sequence elements, as this improves
   readability for humans.
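   The length rule for the resource and the placement rule for comments
   described above can be illustrated with a short Python sketch.  This
   is only one possible reading of the "resource" and "comment"
   productions, not a reference implementation; the function name
   split_resource is hypothetical, and the dictionary payload is
   returned unparsed.

```python
def split_resource(data: bytes) -> tuple[list[bytes], bytes]:
    """Check the netstring framing of an SXDF resource.

    Returns (comments, dictionary_bytes); raises ValueError if the
    framing is malformed. The name and return shape are assumptions
    made for this illustration only.
    """
    colon = data.find(b":")
    # nonnegint = 1*digit, so at least one ASCII digit must precede ":".
    if colon <= 0 or not data[:colon].isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(data[:colon])
    body_start = colon + 1
    # Exactly `length` bytes must lie between the ":" and the final ";".
    if len(data) != body_start + length + 1 or data[-1:] != b";":
        raise ValueError("length prefix does not match payload")
    body = data[body_start:-1]
    comments = []
    # Comments, if present, follow immediately after the initial colon;
    # each one runs from "#" to the next newline.
    while body.startswith(b"#"):
        newline = body.find(b"\n")
        if newline < 0:
            raise ValueError("unterminated comment")
        comments.append(body[1:newline])
        body = body[newline + 1:]
    return comments, body
```

   For instance, split_resource(b"8:# hi\n0%\n;") returns one comment,
   b" hi", and the bytes of an empty dictionary, b"0%\n".  Rejecting a
   resource whose length prefix disagrees with its payload before any
   further parsing keeps the handling of malformed input in one place.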

      string     = nonnegint ":" *%x00-FF %x0A

   The initial non-negative integer of a string MUST be equal to the
   number of bytes between the ":" and the final newline character.

   If the string contains textual data, it SHOULD be either in UTF-8
   encoding or a sequence of 16-bit Unicode characters that starts with
   the Unicode Byte Order Mark, 0xFE 0xFF or 0xFF 0xFE, depending on
   endianness.  (The case of UTF-8 encoding can be easily and reliably
   distinguished from this, because in UTF-8, all bytes have values in
   the range 0x00-0xFD.)

      value      = ( string / dictionary / sequence /
                     isequence / fsequence )
      sequence   = nonnegint "@" %x0A *( value line-end )
      isequence  = nonnegint "i" %x0A *( int line-end )
      fsequence  = nonnegint "f" %x0A *( float line-end )

   In the sequence, isequence or fsequence production, the initial
   non-negative integer MUST be equal to the number of following
   ( value line-end ) lines.

   The isequence and fsequence productions are like a sequence, with
   the difference that the values are restricted to integer or
   floating-point numeric constants.

      int        = "0" / ( ["-"] %x31-39 *digit )
      float      = "0" / ( ["-"] ( "0" / %x31-39 *digit )
                   "." 1*digit [ "e" int ] )

3.  Use of SXDF for storing persistent data

   Besides its use as an over-the-wire format for webservice protocols,
   the SXDF data format MAY also be used as an on-disk format for
   storing persistent data.  However, programs which use SXDF for this
   purpose SHOULD also support the EXDF data format [EXDF], which is
   designed to allow data resources to be edited conveniently in a text
   editor.

4.  Reserved Keywords

   In the following two sections, special meanings will be defined for
   the keywords "_DSD", "_DATA" and "_SIGNATURES".  These words MUST
   NOT be used as dictionary keys except as indicated below.

   The use of any other dictionary key which consists entirely of a "_"
   character followed by three or more uppercase letters SHOULD be
   avoided.
   There are two reasons for this: One is that future revisions and
   extensions of this specification may need to introduce additional
   reserved words.  The second is that upon conversion of SXDF
   resources to the editable EXDF data format, the reserved keywords
   MAY be replaced by equivalent words in the user's preferred
   language.

5.  SXDF Data Structure Descriptions (DSD)

   As mentioned in the introduction, SXDF data can be verified to
   adhere to a specific data structure, similar to how this is possible
   with XML.

   A SXDF resource MAY contain a "_DSD" element, which, if present, is
   an assertion that the SXDF resource conforms to a particular SXDF
   Data Structure Description (DSD).  The value of the _DSD element
   SHOULD be either a string which references the DSD by means of a
   URL, or a dictionary which contains the DSD explicitly.

   A DSD is a list of two string elements, the first of which is a URI
   and the second of which is the actual Data Structure Description in
   some precisely-specified, human-readable description language.  The
   description language is identified by the URI in the first string.

   As with programming languages, human readability is more important
   than conciseness for such description languages: It is a serious
   problem when DSDs are difficult to read for the (very human)
   developers of computer programs which read or write the
   corresponding SXDF-based data formats.  When DSDs are difficult to
   read, this work becomes needlessly unpleasant, time-consuming and
   error-prone.

6.  Digital Signatures

   The SXDF format allows any number of digital signatures to be added
   to an entire SXDF resource or to any dictionary values in the
   resource.

   Signing an entire SXDF resource is done by creating a new resource
   with two elements named "_DATA" and "_SIGNATURES".  The _DATA
   element contains the contents of the original SXDF resource and the
   _SIGNATURES element contains a list of digital signatures (see
   below).
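   The shape of the signed wrapper described above can be sketched on
   an in-memory model.  The sketch below assumes, purely for
   illustration, that an SXDF dictionary is held as a Python dict with
   string keys and that each signature is an opaque bytes value
   produced elsewhere; the names wrap_signed and is_signed_wrapper are
   hypothetical, and neither signature generation nor serialization to
   SXDF syntax is shown.

```python
from typing import Any

DATA_KEY = "_DATA"
SIGNATURES_KEY = "_SIGNATURES"


def wrap_signed(resource: dict[str, Any], signatures: list[bytes]) -> dict[str, Any]:
    """Build the two-element top-level dictionary for a signed resource.

    `resource` is the parsed content of the original resource and
    `signatures` holds signature bytes computed by some other component
    (an assumption of this sketch).
    """
    return {DATA_KEY: resource, SIGNATURES_KEY: list(signatures)}


def is_signed_wrapper(value: Any) -> bool:
    """Report whether value is a dictionary with exactly the two keys
    _DATA and _SIGNATURES."""
    return isinstance(value, dict) and set(value) == {DATA_KEY, SIGNATURES_KEY}
```

   Keeping the original content untouched under _DATA means that a
   reader which does not verify signatures can still reach the data by
   looking one level down.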

   Signing individual element values is done by replacing the value
   with a dictionary that has precisely two elements with the keys
   "_DATA" and "_SIGNATURES".  The set of DSDs to which the resource
   conforms is unchanged by this operation, because from the
   perspective of data semantics and from the perspective of DSDs the
   _SIGNATURES element is ignored and the value of the _DATA element is
   treated as if it were in the place of this two-element dictionary.
   DSDs will never reference any elements named "_DATA" or
   "_SIGNATURES".

   The _SIGNATURES element contains a list of one or more strings, with
   each of the strings containing a digital signature in OpenPGP
   format, as specified in [RFC2440], of the value of the _DATA
   element, excluding the initial "5:_DATA=" but including the final
   "\n".  These signatures are binary data for which the SXDF format
   has no need of special encoding or "ASCII armoring".

   Unless the _DATA element contains a string value, the signature MUST
   be of type 0x00 ("signature of a binary document", see section 5.2.1
   of [RFC2440]), and before signing, the contents of the _DATA element
   MUST be canonicalized by removing any and all space characters which
   follow a newline character but which are not part of a string value.

   If the _DATA element contains a string value, other signature types
   are also possible.

   Here is an example of the canonicalization process: Suppose that
   within a SXDF resource, a dictionary contains the following _DATA
   dictionary:

      5:_DATA=3%
       6:Action=28:qrpc://example.com/add_books
       10:ResourceID=28:34DH734HF64HF734@example.org
       8:Booklist=3@
        5%
         5:Title=16:Hardware Hacking
         6:Author=19:Kevin Mitnick (Ed.)
         4:Year=4:2004
         4:ISBN=13:1-932-26683-6
         9:Publisher=8:Syngress
        5%
         5:Title=12:We the Media
         6:Author=11:Dan Gillmor
         4:Year=4:2004
         4:ISBN=13:0-596-00733-7
         9:Publisher=8:O'Reilly
        5%
         5:Title=22:Matrix Decision Making
         6:Author=21:Alex Lowy & Phil Hood
         4:Year=4:2004
         4:ISBN=13:0-787-97292-4
         9:Publisher=11:Jossey-Bass

   Then this is the canonicalized version which will be signed:

      3%
      6:Action=28:qrpc://example.com/add_books
      10:ResourceID=28:34DH734HF64HF734@example.org
      8:Booklist=3@
      5%
      5:Title=16:Hardware Hacking
      6:Author=19:Kevin Mitnick (Ed.)
      4:Year=4:2004
      4:ISBN=13:1-932-26683-6
      9:Publisher=8:Syngress
      5%
      5:Title=12:We the Media
      6:Author=11:Dan Gillmor
      4:Year=4:2004
      4:ISBN=13:0-596-00733-7
      9:Publisher=8:O'Reilly
      5%
      5:Title=22:Matrix Decision Making
      6:Author=21:Alex Lowy & Phil Hood
      4:Year=4:2004
      4:ISBN=13:0-787-97292-4
      9:Publisher=11:Jossey-Bass

7.  Security Considerations

   Webservices typically act on untrusted data; SXDF implementations
   therefore need to be carefully designed and reviewed to prevent
   security breaches caused by improper handling of malformed SXDF
   resources.

   The universal data format described in this specification
   incorporates a mechanism through which digital signatures can be
   provided for subsets of the data.  The security which may be added
   through this mechanism depends on the strength of the corresponding
   mechanisms for generating and verifying the signatures and for
   establishing trust for the public keys which correspond to the
   digital signatures.

8.  IANA Considerations

   This document has no actions for IANA.

References

Normative References

   [KEYWORDS]     Bradner, S., "Key words for use in RFCs to Indicate
                  Requirement Levels", BCP 14, RFC 2119.

   [RFC2234]      Crocker, D., Ed., "Augmented BNF for Syntax
                  Specifications: ABNF", RFC 2234.

   [RFC2440]      Callas, J., Donnerhacke, L., Finney, H., and R.
                  Thayer, "OpenPGP Message Format", RFC 2440.

Informative References

   [Netstrings]   Bernstein, D. J., "Netstrings".

   [EXDF]         Bollow, N., "EXDF - Editable Extensible Data Format",
                  work in progress.

   [QQP]          Bollow, N., "QQP - Quick Queues Protocol", work in
                  progress.

   [QRPC]         Bollow, N., "QRPC - Queueable Remote Procedure
                  Calls", work in progress.

   [W3C.REC-xml]  Bray, T., Paoli, J., Sperberg-McQueen, C. and E.
                  Maler, "Extensible Markup Language (XML) 1.0 (2nd
                  ed)", W3C REC-xml, October 2000.

Author's Address

   Norbert Bollow
   Weidlistrasse 18
   CH-8624 Gruet

   Phone: +41 1 972 2059
   EMail: nb@bollow.ch

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.