INTERNET-DRAFT                                             Vijay Saraswat
Expires: March 15, 2000                           AT&T Shannon Laboratory
                                                             Sep 15, 1999

               The Payload Parameter Packaging Scheme
                  draft-saraswat-payload-00.txt

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This document and related documents are discussed on the impp mailing
list. To join the list, send mail to impp- request@iastate.edu. To
contribute to the discussion, send mail to impp@iastate.edu. The
archives are at http://lists.fsck.com/cgi- bin/wilma/pip. The IMPP
working group charter, including the current list of group documents,
can be found at http://www.ietf.org/html.charters/impp-charter.html.

ABSTRACT

This memo describes a simple way to package the parameters of a
message in an Application Level Protocol. Parameters may be binary and
of arbitrary length.

1. INTRODUCTION

As application protocols on the Internet proliferate, a central
problem continually resolved is that packaging of data for commands.
MCP [MCP], Greg Hudson's Spade[SPADE], XDR [XDR], Rob Earhart's
Application Core Protocol [ACP] etc are all concerned in part with
this issue.

The general framework for these application level protocols (FTP,
IMAP, MSN-MESSENGER etc) is that one end-point (usually called the
"Client") opens a reliable stream-oriented connection (usually
TCP-based) to the other end-point (usually called the
"Server"). Generally, either end-point may issue commands, which may
result in one or more responses from the other end-point. A command is
typically made up of an operation code (opcode) and zero or more
parameters that carry data of a type appropriate for the opcode.

This note describes a particularly simple scheme for packaging
commands for transmission across the wireline.

2. ASSUMPTIONS

The simplicity of the proposal arises from exploiting some
assumptions:

(1) A message consists of a sequence of bytes (e.g. 4 bytes) that
    describe the opcode, followed by a sequence of bytes that
    encode the parameters to the opcode.

(2) The opcode describes the (1) the number and (2) sequence of
    parameters and (3) the mapping from the byte-stream representing
    the parameter to the high-level application data-type for the
    parameter.

    Therefore this information is available statically and does not
    need to be represented at run-time in the byte stream for the
    message.

Note that (2) implies that messages do not support optional
arguments. If the total number of arguments is small, an opcode with a
variable number of arguments may be supported through a family of
opcodes, each with a fixed number of arguments, one opcode for
each selection of present/not present for each argument. An argument
can be made that a message with an excessive (say > 10) number of
optional parameters is overly complex and probably needs to be
redesigned anyway.

(2) also implies that messages do not support variable order of
parameters. The value of variable parameter order in the context of
wireline protocols seems limited however.

The proposal in this note is not appropriate in situations in which
messages must support a (large) variable number of parameters of
variable types.

The proposal in this note is not appropriate in situations in which a
message is large and must be transmitted in pieces, which can be
reassembled on the other side (e.g. as supported by MCP). However, it
is possible to design variants of this proposal which will meet those
requirements. (e.g. using continuation bits in all fragments but the
last, and assembling fragments into an entire message before
continuing with the parsing.)

3. REQUIREMENTS

(1) A parameter is known statically to be either of fixed size or a
    size determined at run-time. However, in the latter case, the
    number of bytes used to represent the length of the parameter will
    be statically determined. (Usually two bytes are used, allowing a
    parameter value to occupy 64K bytes; but in some cases it may make
    sense to use one byte, or more than two bytes.)

(2) A parameter may carry arbitrary binary data. The transfer
    mechanism should not impose its own encoding. No special value
    should be used to separate the byte stream for one parameter from
    the byte stream for the next.

(3) Each endpoint may have multiple threads sending messages on the
    wire.

4. PROPOSAL

The basic idea is to simply distinguish between fixed length
parameters and variable length parameters and to represent variable
length parameters by prefixing a fixed-size count of the length of the
field.

How is length to be represented? We propose two simple schemes, the
FixedBound(K) scheme and the VariableBound scheme:

  FixedBound(K): The designer of the opcode specifies that parameter
    lengths will be encoded in a fixed number, K, of bytes. These
    bytes encode an integer N and are followed by N payload bytes
    encoding the value of the parameter.

  VariableBound: The designer of the opcode specifies that parameter
  lengths are encoded in (1 + K) bytes. The first byte encodes (K-1),
  (K-1 must be greater than or equal to 0) and subsequent K bytes
  encode an integer N, and are followed by N payload bytes encoding
  the value of the parameter. (Note the case in which N is zero is
  naturally encoded by setting K to 1, and having the length byte
  contain the value 0; i.e. the byte sequence for the parameter will
  be the two byte sequence 0x01 0x00. No payload bytes follow.)

  Thus a VariableBound parameter may have upto 256 bytes encoding the
  length of the parameter; thus the maximum payload size for this
  parameter is (256 exp 255).

Thus the proposal is simply to send data across in "length value"
format, with the payloads for consecutive data fields simply
concatenated. It may be summarized by the following BNF rules:

  Message ::= Opcode VariableChunk
  VariableChunk ::=  Length Data | Length <uninterpreted bytes>
  Data ::= epsilon | FixedChunk Data | VariableChunk Data
  FixedChunk ::= <fixed number of uninterpeted bytes>
  Length ::= <K bytes, to be interpreted as an integer, K determined by
scheme>

Note that arbitrary nesting of VariableChunks is intended; the scheme
is naturally recursive. One may think of each parameter as being a
structure, and allow a structure to contain, recursively, other
structures so that nested lists of lists of ... can be passed as
values.

5. SUPPORTED TYPE STRUCTURE

This can be expressed formally by defining a little type-structure for
messages.

  FixedLengthBaseType ::=
      Integer
    | Boolean
    | Real
    | ....

 VariableLengthBaseType ::=
      String
    | UninterpretedByteStream
    | ...

 Type ::=
      FixedLengthBaseType
    | VariableLengthBaseType
    | {Type_1 Type_2 ... Type_n}  (n >= 1)
    | Type*

Here, the type {Type_1 Type_2 ... Type_n} containing N components
represents a structure with N components, each of the given type,
e.g. {Boolean Integer Integer} describes a structure with three
components. The type Type* represents a variable-sized list of
elements of the given type, e.g. Integer* is a list of integers, and
{Boolean Integer Integer}* represents a list whose byte stream should
be parsed as a list of Booleans, Integers and Integers.

Let v be a value of type T, where T is defined according to the above
rules. Then the payload corresponding to v, p(v) can be defined as
follows by induction on the type structure:

  FixedLengthBaseType

     Let the type of v be a FixedLengthBaseType. Then p(v) is simply
     the sequence of bytes making up the value.

  VariableLengthBaseType

     Let the type of v be a VariableLengthBaseType. Then p(v) is
     simply the bytes encoding the lenght of v (per the Bound encoding
     scheme in effect), followed by the bytes representing the
     underlying data.

 {Type_1 Type2 ... TypeN}

     Let the type of v be {Type_1 Type_2 ... Type_N}.  Then v is a
     structure with N components, say v_1, ..., v_n, where the i'th
     component v_i is of type Type_I. Obtain the representation of v
     by concatenating the bytes corresponding to p(v_i) in the given
     sequence, and prepend with bytes representing the length of the
     entire sequence, using the Bound encoding scheme in effect.

  Type*

     Let the type of v be Type*. Then v is a list of values v_1, ...,
     v_n, where each value is of type Type. Obtain the representation
     of v by concatenating the bytes corresponding to p(v_i) in the
     given sequence, and prepend with bytes representing the length of
     the entire sequence, using the Bound encoding scheme in effect.

LEMMA: (Length Lemma) For any value v of type T, the length of the
  byte stream representing the data for v can be recovered
  unambiguously from the first few bytes of p(v).

PROOF: Straighforward, by induction. In the base case, the length is
  recovered because it is known statically (FixedLengthBaseType), or
  because it is encoded in the first few bytes of the payload
  (VariableLengthBasteType). In the recursive case, the length is
  encoded explicitly in the first few bytes, as specified by the Bound
  encoding scheme in effect.
END

Now given a type T and a payload p, where p equals the payload p(v)
generated from a value v of type T, the question is whether v can be
recovered unambiguously.

THEOREM: (Reconstruction theorem)

  Given a type T and a value v of that type, the value may be
  reconstructed unambiguously from p(v).

PROOF: Straighforward, by induction. The base cases are
  straightforward.

  Now suppose v is of type {Type_1, ..., Type_n}, and has components
  v_1, ..., v_n. Now by the Length Lemma, one can determine the length
  of p(v) from the first few bytes of p(v). What follows must be the
  payload for v. However, we know that the payload for v consists of
  the concatenation of p(v_1), ..., p(v_n). From the Lenght Lemma, the
  first few bytes of p(v_1) determine the length of p(v_1). Hence we
  can read off that many bytes and by induction recover unambiguously
  the value of v_1. We repeat this process for the remaining bytes,
  thus reading off the values in succession, until we have read n
  values.

  A similar technique works for decoding values of type Type*. In this
  case the decoding process keeps going until there are no more byes.

END

6. PROTOCOL DESIGN CONSIDERATIONS

Designers of application protocols may find it convenient to reserve a
few "length" bits (e.g. the last two) in the opcode to specify the
length-encoding scheme used in the opcode.  A possible assignment of
these bits would be:

    00   Lengths in this message are encoded with the FixedBound(1) scheme.
    01   Lengths in this message are encoded with the FixedBound(2) scheme.
    10   Lengths in this message are encoded with the FixedBound(3) scheme.
    11   Lengths in this message are encoded with the VariableBound scheme.

Alternatively, the designer of the protocol may make the commitment
that a FixedBound(K) encoding scheme for a particular value of K (say
2) will be used for all messages in the protocol. For such a protocol,
low-level network code can assemble the byte stream for a particular
message uniformly by reading the first m bytes (usually m=4) to get
the opcode, and then reading K bytes to get N, and then reading N
bytes of data. Specifically, the network code will not need to perform
some runtime computation on the opcode to determine the number of
subsequent length bytes it should read.

7. EXAMPLES

(0) Consider the opcode ADD_BUDDY which takes two parameters, a
    subscriber and a buddy, each of which is represented in four
    bytes. The type structure for this opcode is thus:

      {Integer Integer}

    In this case, the Protocol Designer will set the length bits for
    the opcode to 00. A message will now be encoded as:

    <opcode> 0x08 <FixedChunk:4bytes><FixedChunk:4bytes>

    (with the last two bits of <opcode> being 00).

    Alternatively, the Protocol Designer may decide that all messages
    in the protocol will use FixedBound(2) length encoding, and
    dispense with encoding length bits in the opcode. In this case
    the message will be encoded as:

    <opcode> 0x00 0x08 <FixedChunk:4bytes><FixedChunk:4bytes>

(1) Consider the opcode SEND_IM which takes three parameters, a
    subscriber and a recipient (both of which can be represented in
    four bytes), and a text field of at most 64K in length. The type
    structure of the opcode is thus:

       {Integer Integer String}

    In this case, the Protocol Designer will set the length bits for
    the opcode to 01. A message will now be encoded as:

    <opcode>
    <Length: 2 bytes encoding (10 + N)>
    <FixedChunk:4bytes><FixedChunk:4bytes>
    <Length: 2 bytes encoding N><uninterpreted data: N bytes>

    (with the last two bits of <opcode> being 00).

    Note that the length of the string, N, is bounded above by 64K -
    10, since the length of the entire packet, 10+N is bounded above
    by 64K.

(2) Consider the opcode SYNC which takes two arguments, the list of
    ids of active users, and a corresponding list of IP addresses. An
    active user id and an IP address can be represented in 4
    bytes. Each list should not have more than a few hundred
    entries. The type structure of the opcode is thus:

        {{Integer}* {Integer}*}

    In this case, the Protocol Designer will set the length bits for
    the opcode to 01. A message will now be encoded as:

    <opcode>
    <Length: 2bytes encoding (4+M+N)>
    <Length:2 bytes encoding N>
    <Data: N bytes> -- The N bytes are parsed in chunks of 4bytes each.
    <Length:2 bytes encoding M>
    <Data: M bytes> -- The M bytes are parsed in chunks of 4bytes each.

  As above, the operational constraint here is that 4 + M + N <=
  64K. If we assume M = N, the bound on the number of entries in each
  list is roughly 8K.

8. ACKNOWLEDGEMENTS

Thanks to Erik Ostrom and Greg Hudson for valuable comments on an
earlier draft.

9. REFERENCES

 [MCP] Carlson et al "The MUD Client Protocol, Version 2.1",
  http://www.moo.mud.org/mcp2/mcp2.html

 [SPADE] Greg Hudson, "Simple Protocol Application Data Encoding",
 Internet Draft draft-hudson-spade-03.txt, expires March 1, 2000, Work
 In Progress.

 [XDR] Raj Srinivasan "XDR: External Data Representatoin Standard",
 RFC 1832, August 1995.

 [ACP] Rob Earhart, "Application Core Protocol", Internet Draft
 draft-earhart-acp-spec-00.txt, expires July 1999, Work In Progress.

10. AUTHOR's ADDRESS

  Vijay Saraswat
  AT&T Laboratories
  Shannon Laboratory
  180 Park Avenue
  Florham Park, NJ 07932

  vj@research.att.com