INTERNET-DRAFT                                               Greg Hudson
Expires: October 22, 1999                                ghudson@mit.edu
                                                                     MIT


               Simple Protocol Application Data Encoding
                       draft-hudson-spade-00.txt

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Please send comments to ghudson@mit.edu.

2. Abstract

   This document describes a simple scheme for encoding network protocol
   data, and a simple notation for describing protocol data elements.
   All encodings are self-terminating (you know when you've reached the
   end) and assume that the decoder knows what type of protocol element
   it is expecting.

3. Encoding

   This encoding scheme uses the ASCII translation of characters into
   bytes except when otherwise noted.  Protocol elements are encoded as
   follows:

   An integer: a sequence of decimal digits followed by a colon.  For
   instance, the number 27 encodes as "27:".  Negative integers are
   preceded by a minus sign, so -27 encodes as "-27:".  Leading zeroes
   are not allowed (so that each integer has a unique encoding).

   A byte string: an integer giving the length of the string, followed
   by the bytes of the string itself.  For instance, the string "foo"
   encodes as "3:foo".
   A string of wide characters must first be encoded as a byte string
   (using either 16-bit character values or UTF-8, for instance); how
   that is done is up to the protocol.

   A symbol: a sequence of letters, numbers, and dashes, beginning with
   a letter, followed by a colon.  Case is significant.  For instance,
   the symbol "foo" encodes as "foo:".

   A list of <type>: an integer giving the number of elements in the
   list, followed by the elements of the list.  For instance, the list
   of strings "a", "b", and "c" encodes as "3:1:a1:b1:c".

   A structure: a collection of dissimilar elements can simply be
   concatenated together.  For instance, a structure containing the
   number 3 and the byte string "a" encodes as "3:1:a".

   A union: a symbol giving the type of element, an integer giving the
   length of the encoding of the element's data, and the data itself.
   For instance, an element of type "foo" with the same data as in the
   structure example above would be encoded as "foo:5:3:1:a".  If there
   is no data to be encoded, a data length of 0 should be given, e.g.
   "bar:0:".

4. Notation

   This notation gives a scheme for describing protocol element types
   and giving them names for the purpose of semantic descriptions.

   A variable declaration associates a name to be used in semantic
   descriptions with a type.  Variable names are valid symbols beginning
   with a lowercase letter.  Variable declarations end with a line
   break, and are written as follows:

      An integer:                 "Integer <name>"
      A byte string:              "String <name>"
      A symbol:                   "Symbol <name>"
      A list of <type>:           "List[<type>] <name>"
      A structure named <type>:   "<type> <name>"
      A union named <type>:       "<type> <name>"

   Structure and union names are valid symbols beginning with a capital
   letter.  A structure definition is written as:

      structure <name> {
        <variable declaration>
        <variable declaration>
        .
        .
        .
      }

   Unions are defined as:

      union <name> {
        <symbol>: <variable declaration>
        .
        .
        .
      }

   As a special case, if there is no data for a particular union tag,
   "Null" can be written in place of a variable declaration.
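
   The encoding rules of Section 3 can be sketched in Python as follows.
   This is a minimal illustration, not part of the specification: the
   function names (enc_int, enc_bytes, enc_symbol, enc_list, enc_struct,
   enc_union) are hypothetical, and list elements and structure fields
   are assumed to be passed in already encoded.

```python
import re

# A symbol is letters, digits, and dashes, beginning with a letter.
_SYMBOL_RE = re.compile(rb"[A-Za-z][A-Za-z0-9-]*")


def enc_int(n: int) -> bytes:
    # str() never emits leading zeroes and renders negatives with "-".
    return str(n).encode("ascii") + b":"


def enc_bytes(data: bytes) -> bytes:
    # Length as an integer, then the raw bytes.
    return enc_int(len(data)) + data


def enc_symbol(name: str) -> bytes:
    raw = name.encode("ascii")
    if not _SYMBOL_RE.fullmatch(raw):
        raise ValueError("not a valid symbol: %r" % name)
    return raw + b":"


def enc_list(encoded_items) -> bytes:
    # Element count, then the already-encoded elements.
    items = list(encoded_items)
    return enc_int(len(items)) + b"".join(items)


def enc_struct(*encoded_fields: bytes) -> bytes:
    # A structure is plain concatenation of its encoded fields.
    return b"".join(encoded_fields)


def enc_union(tag: str, encoded_data: bytes = b"") -> bytes:
    # Tag symbol, length of the encoded data, then the data itself.
    return enc_symbol(tag) + enc_int(len(encoded_data)) + encoded_data
```

   With these helpers, enc_union("foo", enc_struct(enc_int(3),
   enc_bytes(b"a"))) produces "foo:5:3:1:a" and enc_union("bar")
   produces "bar:0:", matching the examples in Section 3.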

   Here is an example of two structure definitions which might be used
   to describe a mail message:

      structure Header {
        String name
        String value
      }

      structure Message {
        List[Header] headers
        String body
      }

   Here is an example of a union definition which might be used together
   with the above structure definitions to describe a command set:

      union Command {
        send: Message m
        help: Null
        quit: Null
      }

   A quit command would be encoded as "quit:0:".  If I have a message
   with two headers, one with name "From" and value "Greg" and another
   with name "To" and value "Bob", and the message body is "Test", then
   I would encode a command to send this message as:

      send:29:2:4:From4:Greg2:To3:Bob4:Test

5. Rationale

   The primary goal of this encoding scheme is simplicity.  For want of
   a simple encoding scheme, protocols have been turning to ASN.1's
   basic encoding rules, which are highly complicated and which have
   presented a barrier to implementation in practice.

   Two secondary goals of this encoding scheme are human readability and
   space efficiency.  These goals are of course at odds; numbers could
   be encoded more compactly by using more than ten values per byte, for
   instance, at the expense of making it more difficult to examine ASCII
   translations of protocol data.

   The tagged union encoding provides easy extensibility in most
   protocols.  A protocol can find the end of the encoding of a tagged
   union element even if it doesn't know the data types for the
   individual tags.

   This encoding does not include a length field for structures or an
   overall length field for lists.  Thus, it is impossible to skip to
   the end of a structure or list without decoding it.  This decision
   was a tradeoff; it simplifies encoding and uses space more
   efficiently in return for making certain decoding situations more
   complicated.

6. Security Considerations

   For maximum generality, this encoding scheme places no limits on the
   length of any data type.
   This could lead to denial of service attacks against implementations
   of protocols using this encoding ("here follows a string of length
   two gazillion").  It does not seem appropriate to choose limits in
   the wire encoding to prevent this sort of attack, so guarding against
   these attacks will have to be the responsibility of particular
   protocols or their implementations.
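
   As an illustration of how an implementation might take on that
   responsibility, the following Python sketch decodes the basic element
   types while enforcing local limits.  It is not part of the encoding:
   the Reader class, the DecodeError exception, and the max_len and
   max_digits parameters (with their default values) are hypothetical
   names and policy choices made for this example.  Rejecting "-0" is
   likewise an assumption, made so that zero keeps a single encoding.

```python
import re

_SYMBOL_RE = re.compile(rb"[A-Za-z][A-Za-z0-9-]*")


class DecodeError(ValueError):
    """Raised on malformed input or when a local limit is exceeded."""


class Reader:
    """Decoder sketch; max_len and max_digits are local policy."""

    def __init__(self, data: bytes, max_len: int = 65536,
                 max_digits: int = 10):
        self.data = data
        self.pos = 0
        self.max_len = max_len
        self.max_digits = max_digits

    def _token(self, limit: int) -> bytes:
        # Return the bytes before the next ':' and step past it,
        # scanning at most `limit` bytes plus the colon itself.
        end = self.data.find(b":", self.pos, self.pos + limit + 1)
        if end < 0:
            raise DecodeError("missing ':' or token too long")
        token = self.data[self.pos:end]
        self.pos = end + 1
        return token

    def _take(self, n: int) -> bytes:
        # Check the claimed length against the input before slicing.
        if n > len(self.data) - self.pos:
            raise DecodeError("truncated input")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_int(self) -> int:
        text = self._token(self.max_digits + 1)  # +1 for optional '-'
        digits = text[1:] if text.startswith(b"-") else text
        # Reject empty input, non-digits, over-long numbers, leading
        # zeroes, and "-0" (the last one is an assumption).
        if (not digits.isdigit() or len(digits) > self.max_digits
                or (digits[:1] == b"0" and text != b"0")):
            raise DecodeError("malformed integer")
        return int(text)

    def read_length(self) -> int:
        n = self.read_int()
        if not 0 <= n <= self.max_len:
            raise DecodeError("length outside local limit")
        return n

    def read_bytes(self) -> bytes:
        return self._take(self.read_length())

    def read_symbol(self) -> str:
        raw = self._token(self.max_len)
        if not _SYMBOL_RE.fullmatch(raw):
            raise DecodeError("malformed symbol")
        return raw.decode("ascii")

    def read_list(self, read_item):
        # read_item is a callable that decodes one element.
        return [read_item() for _ in range(self.read_length())]

    def read_union(self):
        # Tag, then length-prefixed data; the data is returned
        # undecoded so that unknown tags can be skipped.
        tag = self.read_symbol()
        return tag, self.read_bytes()
```

   For example, Reader(b"foo:5:3:1:a").read_union() returns the tag
   "foo" with the undecoded data "3:1:a", while
   Reader(b"2000000000:").read_bytes() raises DecodeError before any
   attempt is made to read two billion bytes.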