INTERNET-DRAFT Vijay Saraswat Expires: March 15, 2000 AT&T Shannon Laboratory Sep 15, 1999 The Payload Parameter Packaging Scheme draft-saraswat-payload-00.txt Status of This Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This document and related documents are discussed on the impp mailing list. To join the list, send mail to impp- request@iastate.edu. To contribute to the discussion, send mail to impp@iastate.edu. The archives are at http://lists.fsck.com/cgi- bin/wilma/pip. The IMPP working group charter, including the current list of group documents, can be found at http://www.ietf.org/html.charters/impp-charter.html. ABSTRACT This memo describes a simple way to package the parameters of a message in an Application Level Protocol. Parameters may be binary and of arbitrary length. 1. INTRODUCTION As application protocols on the Internet proliferate, a central problem continually resolved is that packaging of data for commands. MCP [MCP], Greg Hudson's Spade[SPADE], XDR [XDR], Rob Earhart's Application Core Protocol [ACP] etc are all concerned in part with this issue. The general framework for these application level protocols (FTP, IMAP, MSN-MESSENGER etc) is that one end-point (usually called the "Client") opens a reliable stream-oriented connection (usually TCP-based) to the other end-point (usually called the "Server"). Generally, either end-point may issue commands, which may result in one or more responses from the other end-point. A command is typically made up of an operation code (opcode) and zero or more parameters that carry data of a type appropriate for the opcode. This note describes a particularly simple scheme for packaging commands for transmission across the wireline. 2. ASSUMPTIONS The simplicity of the proposal arises from exploiting some assumptions: (1) A message consists of a sequence of bytes (e.g. 4 bytes) that describe the opcode, followed by a sequence of bytes that encode the parameters to the opcode. (2) The opcode describes the (1) the number and (2) sequence of parameters and (3) the mapping from the byte-stream representing the parameter to the high-level application data-type for the parameter. Therefore this information is available statically and does not need to be represented at run-time in the byte stream for the message. Note that (2) implies that messages do not support optional arguments. If the total number of arguments is small, an opcode with a variable number of arguments may be supported through a family of opcodes, each with a fixed number of arguments, one opcode for each selection of present/not present for each argument. An argument can be made that a message with an excessive (say > 10) number of optional parameters is overly complex and probably needs to be redesigned anyway. (2) also implies that messages do not support variable order of parameters. The value of variable parameter order in the context of wireline protocols seems limited however. The proposal in this note is not appropriate in situations in which messages must support a (large) variable number of parameters of variable types. The proposal in this note is not appropriate in situations in which a message is large and must be transmitted in pieces, which can be reassembled on the other side (e.g. as supported by MCP). However, it is possible to design variants of this proposal which will meet those requirements. (e.g. using continuation bits in all fragments but the last, and assembling fragments into an entire message before continuing with the parsing.) 3. REQUIREMENTS (1) A parameter is known statically to be either of fixed size or a size determined at run-time. However, in the latter case, the number of bytes used to represent the length of the parameter will be statically determined. (Usually two bytes are used, allowing a parameter value to occupy 64K bytes; but in some cases it may make sense to use one byte, or more than two bytes.) (2) A parameter may carry arbitrary binary data. The transfer mechanism should not impose its own encoding. No special value should be used to separate the byte stream for one parameter from the byte stream for the next. (3) Each endpoint may have multiple threads sending messages on the wire. 4. PROPOSAL The basic idea is to simply distinguish between fixed length parameters and variable length parameters and to represent variable length parameters by prefixing a fixed-size count of the length of the field. How is length to be represented? We propose two simple schemes, the FixedBound(K) scheme and the VariableBound scheme: FixedBound(K): The designer of the opcode specifies that parameter lengths will be encoded in a fixed number, K, of bytes. These bytes encode an integer N and are followed by N payload bytes encoding the value of the parameter. VariableBound: The designer of the opcode specifies that parameter lengths are encoded in (1 + K) bytes. The first byte encodes (K-1), (K-1 must be greater than or equal to 0) and subsequent K bytes encode an integer N, and are followed by N payload bytes encoding the value of the parameter. (Note the case in which N is zero is naturally encoded by setting K to 1, and having the length byte contain the value 0; i.e. the byte sequence for the parameter will be the two byte sequence 0x01 0x00. No payload bytes follow.) Thus a VariableBound parameter may have upto 256 bytes encoding the length of the parameter; thus the maximum payload size for this parameter is (256 exp 255). Thus the proposal is simply to send data across in "length value" format, with the payloads for consecutive data fields simply concatenated. It may be summarized by the following BNF rules: Message ::= Opcode VariableChunk VariableChunk ::= Length Data | Length Data ::= epsilon | FixedChunk Data | VariableChunk Data FixedChunk ::= Length ::= Note that arbitrary nesting of VariableChunks is intended; the scheme is naturally recursive. One may think of each parameter as being a structure, and allow a structure to contain, recursively, other structures so that nested lists of lists of ... can be passed as values. 5. SUPPORTED TYPE STRUCTURE This can be expressed formally by defining a little type-structure for messages. FixedLengthBaseType ::= Integer | Boolean | Real | .... VariableLengthBaseType ::= String | UninterpretedByteStream | ... Type ::= FixedLengthBaseType | VariableLengthBaseType | {Type_1 Type_2 ... Type_n} (n >= 1) | Type* Here, the type {Type_1 Type_2 ... Type_n} containing N components represents a structure with N components, each of the given type, e.g. {Boolean Integer Integer} describes a structure with three components. The type Type* represents a variable-sized list of elements of the given type, e.g. Integer* is a list of integers, and {Boolean Integer Integer}* represents a list whose byte stream should be parsed as a list of Booleans, Integers and Integers. Let v be a value of type T, where T is defined according to the above rules. Then the payload corresponding to v, p(v) can be defined as follows by induction on the type structure: FixedLengthBaseType Let the type of v be a FixedLengthBaseType. Then p(v) is simply the sequence of bytes making up the value. VariableLengthBaseType Let the type of v be a VariableLengthBaseType. Then p(v) is simply the bytes encoding the lenght of v (per the Bound encoding scheme in effect), followed by the bytes representing the underlying data. {Type_1 Type2 ... TypeN} Let the type of v be {Type_1 Type_2 ... Type_N}. Then v is a structure with N components, say v_1, ..., v_n, where the i'th component v_i is of type Type_I. Obtain the representation of v by concatenating the bytes corresponding to p(v_i) in the given sequence, and prepend with bytes representing the length of the entire sequence, using the Bound encoding scheme in effect. Type* Let the type of v be Type*. Then v is a list of values v_1, ..., v_n, where each value is of type Type. Obtain the representation of v by concatenating the bytes corresponding to p(v_i) in the given sequence, and prepend with bytes representing the length of the entire sequence, using the Bound encoding scheme in effect. LEMMA: (Length Lemma) For any value v of type T, the length of the byte stream representing the data for v can be recovered unambiguously from the first few bytes of p(v). PROOF: Straighforward, by induction. In the base case, the length is recovered because it is known statically (FixedLengthBaseType), or because it is encoded in the first few bytes of the payload (VariableLengthBasteType). In the recursive case, the length is encoded explicitly in the first few bytes, as specified by the Bound encoding scheme in effect. END Now given a type T and a payload p, where p equals the payload p(v) generated from a value v of type T, the question is whether v can be recovered unambiguously. THEOREM: (Reconstruction theorem) Given a type T and a value v of that type, the value may be reconstructed unambiguously from p(v). PROOF: Straighforward, by induction. The base cases are straightforward. Now suppose v is of type {Type_1, ..., Type_n}, and has components v_1, ..., v_n. Now by the Length Lemma, one can determine the length of p(v) from the first few bytes of p(v). What follows must be the payload for v. However, we know that the payload for v consists of the concatenation of p(v_1), ..., p(v_n). From the Lenght Lemma, the first few bytes of p(v_1) determine the length of p(v_1). Hence we can read off that many bytes and by induction recover unambiguously the value of v_1. We repeat this process for the remaining bytes, thus reading off the values in succession, until we have read n values. A similar technique works for decoding values of type Type*. In this case the decoding process keeps going until there are no more byes. END 6. PROTOCOL DESIGN CONSIDERATIONS Designers of application protocols may find it convenient to reserve a few "length" bits (e.g. the last two) in the opcode to specify the length-encoding scheme used in the opcode. A possible assignment of these bits would be: 00 Lengths in this message are encoded with the FixedBound(1) scheme. 01 Lengths in this message are encoded with the FixedBound(2) scheme. 10 Lengths in this message are encoded with the FixedBound(3) scheme. 11 Lengths in this message are encoded with the VariableBound scheme. Alternatively, the designer of the protocol may make the commitment that a FixedBound(K) encoding scheme for a particular value of K (say 2) will be used for all messages in the protocol. For such a protocol, low-level network code can assemble the byte stream for a particular message uniformly by reading the first m bytes (usually m=4) to get the opcode, and then reading K bytes to get N, and then reading N bytes of data. Specifically, the network code will not need to perform some runtime computation on the opcode to determine the number of subsequent length bytes it should read. 7. EXAMPLES (0) Consider the opcode ADD_BUDDY which takes two parameters, a subscriber and a buddy, each of which is represented in four bytes. The type structure for this opcode is thus: {Integer Integer} In this case, the Protocol Designer will set the length bits for the opcode to 00. A message will now be encoded as: 0x08 (with the last two bits of being 00). Alternatively, the Protocol Designer may decide that all messages in the protocol will use FixedBound(2) length encoding, and dispense with encoding length bits in the opcode. In this case the message will be encoded as: 0x00 0x08 (1) Consider the opcode SEND_IM which takes three parameters, a subscriber and a recipient (both of which can be represented in four bytes), and a text field of at most 64K in length. The type structure of the opcode is thus: {Integer Integer String} In this case, the Protocol Designer will set the length bits for the opcode to 01. A message will now be encoded as: (with the last two bits of being 00). Note that the length of the string, N, is bounded above by 64K - 10, since the length of the entire packet, 10+N is bounded above by 64K. (2) Consider the opcode SYNC which takes two arguments, the list of ids of active users, and a corresponding list of IP addresses. An active user id and an IP address can be represented in 4 bytes. Each list should not have more than a few hundred entries. The type structure of the opcode is thus: {{Integer}* {Integer}*} In this case, the Protocol Designer will set the length bits for the opcode to 01. A message will now be encoded as: -- The N bytes are parsed in chunks of 4bytes each. -- The M bytes are parsed in chunks of 4bytes each. As above, the operational constraint here is that 4 + M + N <= 64K. If we assume M = N, the bound on the number of entries in each list is roughly 8K. 8. ACKNOWLEDGEMENTS Thanks to Erik Ostrom and Greg Hudson for valuable comments on an earlier draft. 9. REFERENCES [MCP] Carlson et al "The MUD Client Protocol, Version 2.1", http://www.moo.mud.org/mcp2/mcp2.html [SPADE] Greg Hudson, "Simple Protocol Application Data Encoding", Internet Draft draft-hudson-spade-03.txt, expires March 1, 2000, Work In Progress. [XDR] Raj Srinivasan "XDR: External Data Representatoin Standard", RFC 1832, August 1995. [ACP] Rob Earhart, "Application Core Protocol", Internet Draft draft-earhart-acp-spec-00.txt, expires July 1999, Work In Progress. 10. AUTHOR's ADDRESS Vijay Saraswat AT&T Laboratories Shannon Laboratory 180 Park Avenue Florham Park, NJ 07932 vj@research.att.com