Network Working Group                                      Max Wildgrube
INTERNET-DRAFT
Category: Informational                                    February 1999
Expiration Date: August 2000

             STRUCTURED DATA EXCHANGE FORMAT (SDXF)

FILED AS: draft-wildgrube-sdxf-01.txt

STATUS OF THIS MEMO:

   This document is an Internet-Draft and is in full  conformance
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts  are  working  documents   of   the   Internet
   Engineering  Task  Force  (IETF),  its  areas, and its working
   groups.  Note that other groups may  also  distribute  working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months  and  may  be  updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to  use  Internet-
   Drafts  as  reference  material  or to cite them other than as
   "work in progress."

   The  list  of  current  Internet-Drafts  can  be  accessed  at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be  accessed
   at http://www.ietf.org/shadow.html.


   Please send your comments to the author: max@wildgrube.com

ABSTRACT:

   This specification describes an all-purpose interchange format
   for use as a file format or for networking. Data is organized
   in chunks which can be ordered in hierarchical structures. This
   format is self-describing and CPU-independent.


CONTENTS:

   1. Introduction
   2. Description of SDXF data format.
   3. Introduction to the SDXF functions
   4. Platform independence
   5. Compression
   6. Encryption
   7. Description of the SDXF functions
   8. Security Considerations
   9. Final remarks
   10. Author's Address


1. Introduction

   The purpose of the Structured Data eXchange Format (SDXF) is
   to permit the interchange of an arbitrarily structured data
   block containing different kinds of data (numerical, text,
   bitstrings). The format is not limited to any particular
   application. It is meant to be usable as a text format for
   word processing, as a picture format, as a sound format, for
   remote procedure calls with complex parameters, as a document
   format, for interchanging business data, etc.

   SDXF is self-describing: every program can unpack any SDXF
   data without knowing the meaning of the individual data
   elements.

   Together with the description of the data format, a set of
   functions is introduced. With the help of these functions one
   can create and access the data elements of SDXF. The idea is
   that a programmer should use only these functions instead of
   maintaining the structure at the level of bits and bytes. (In
   the language of object-oriented programming, these functions
   are methods of an object which works as a handle for a given
   SDXF data block.)

   SDXF is not limited to a specific platform. Provided that the
   SDXF functions are set up correctly, SDXF data can be
   interchanged (via network or data carrier) across the
   boundaries of different architectures (characterized by the
   character code, such as ASCII, ANSI or EBCDIC, and by the byte
   order for binary data).

   SDXF also provides for compressing and encrypting parts of, or
   the whole of, an SDXF data block.


2. Description of SDXF data format.

   2.1 First we introduce the term "chunk". A chunk is a data
   structure with a fixed set of components. A chunk may be
   "elementary" or "structured". The latter itself contains one
   or more other chunks.

   2.2 A chunk consists of a header and the data body (content):

   +----------+--------+-------+-----------------------------------+
   | Name     | Pos.   | Length| Description                       |
   +----------+--------+-------+-----------------------------------+
   | chunk-ID |  1     |   2   | ID of the chunk (unsigned short)  |
   | flags    |  3     |   1   | type and properties of this chunk |
   | length   |  4     |   3   | length  of the following data     |
   | content  |  7     |   *)  | net data or a list of chunks      |
   +----------+--------+-------+-----------------------------------+
   *) as stated in "length"; the total length of the chunk is
      length+6.

   The chunk ID is a non-zero positive number.

   or more visually:

   +----+----+----+----+----+----+----+----+----+-...
   | chunkID | fl | length       |  content
   +----+----+----+----+----+----+----+----+----+-...

   or in ASN.1:

   chunk  ::=  SEQUENCE
   {
     chunkID INTEGER (1..65535),
     flags   BIT STRING,
     length  OCTET STRING (SIZE (3)), -- or: INTEGER (0..16777215)
     content OCTET STRING
   }


   2.3 Structured chunk.

   A structured chunk is marked as such by the flag byte (see 2.6).
   In contrast to an elementary chunk, its content consists of a
   list of chunks (elementary or structured):

   visually in a shorter form:

   +----+-+---+-------+-------+-------+-----+-------+
   | id |f|len| chunk | chunk | chunk | ... | chunk |
   +----+-+---+-------+-------+-------+-----+-------+

   With the help of this concept you can map any hierarchically
   structured data onto an SDXF chunk.


   2.4 Some Remarks about the internal representation of the chunk's
   elements:

   Binary values are always in high-order-first (big endian) format,
   like the binary values in the IP header (network format). A
   length of 300 is stored as

   +----+----+----+----+----+----+----+----+----+--
   |         |    | 00   01   2C |  content
   +----+----+----+----+----+----+----+----+----+--
   in hexadecimal notation.

   This is also applicable to the chunk-ID.
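
   The header layout from 2.2 and the byte order described above
   can be sketched as follows. This is only an illustration: the
   names ChunkHeader, writeHeader and readHeader are invented here
   and are not part of the SDXF functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of the 6-byte chunk header from 2.2.
struct ChunkHeader {
    std::uint16_t id;      // chunk-ID, bytes 1-2, big endian
    std::uint8_t  flags;   // flag byte, byte 3
    std::uint32_t length;  // content length, bytes 4-6, big endian (24 bit)
};

// Append the header to 'out', most significant byte first.
void writeHeader(std::vector<std::uint8_t>& out, const ChunkHeader& h) {
    out.push_back(static_cast<std::uint8_t>(h.id >> 8));
    out.push_back(static_cast<std::uint8_t>(h.id & 0xFF));
    out.push_back(h.flags);
    out.push_back(static_cast<std::uint8_t>((h.length >> 16) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((h.length >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(h.length & 0xFF));
}

// Decode a header from the 6 bytes at 'p'. The caller must make sure
// that at least 6 bytes are available.
ChunkHeader readHeader(const std::uint8_t* p) {
    ChunkHeader h;
    h.id     = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    h.flags  = p[2];
    h.length = (static_cast<std::uint32_t>(p[3]) << 16) |
               (static_cast<std::uint32_t>(p[4]) << 8) |
               static_cast<std::uint32_t>(p[5]);
    return h;
}
```

   Writing a header with length 300 yields the bytes 00 01 2C in
   positions 4 to 6, independent of the byte order of the machine
   that runs the code, because only shifts and masks are used.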

   2.5 Character values in the content portion are also subject
   to adaptation; see chapter 4.

   2.6 Meaning of the flag-bits:  Let us represent the flag  byte
   in this manner:

   +-+-+-+-+-+-+-+-+
   |7|6|5|4|3|2|1|0|
   +-+-+-+-+-+-+-+-+
    | | | | | | | |
    | | | | | | | +- bit 2**0 -- check bit; should always be 1
    | | | | | | +--- bit 2**1 -- structured chunk
    | | | | | +----- bit 2**2 -- short chunk
    | | | | +------- bit 2**3 -- character chunk
    | | | +--------- bit 2**4 -- compressed chunk
    | | +----------- bit 2**5 -- encrypted chunk
    | +------------- bit 2**6 -- numeric chunk
    +--------------- bit 2**7 -- reserved

   Not all combinations of bits are allowed or reasonable. In
   particular, the bits "structured", "character" and "numeric"
   are mutually exclusive.
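
   The bit assignments above translate directly into masks. The
   following sketch checks the two rules stated in this section
   (check bit set, at most one of the three type bits set). The
   constant names and the function flagsPlausible are assumptions
   made for this example.

```cpp
#include <cstdint>

// Bit masks for the flag byte, taken from the diagram in 2.6.
// The names are invented for this sketch.
enum : std::uint8_t {
    SDXF_FLAG_CHECK      = 0x01,  // bit 2**0
    SDXF_FLAG_STRUCTURED = 0x02,  // bit 2**1
    SDXF_FLAG_SHORT      = 0x04,  // bit 2**2
    SDXF_FLAG_CHARACTER  = 0x08,  // bit 2**3
    SDXF_FLAG_COMPRESSED = 0x10,  // bit 2**4
    SDXF_FLAG_ENCRYPTED  = 0x20,  // bit 2**5
    SDXF_FLAG_NUMERIC    = 0x40,  // bit 2**6
    SDXF_FLAG_RESERVED   = 0x80   // bit 2**7
};

// Returns true if the check bit is set and at most one of the bits
// "structured", "character" and "numeric" is set. Other combinations
// that may be unreasonable are not examined here.
bool flagsPlausible(std::uint8_t flags) {
    if (!(flags & SDXF_FLAG_CHECK)) return false;
    int kinds = 0;
    if (flags & SDXF_FLAG_STRUCTURED) ++kinds;
    if (flags & SDXF_FLAG_CHARACTER)  ++kinds;
    if (flags & SDXF_FLAG_NUMERIC)    ++kinds;
    return kinds <= 1;
}
```

   A flag byte with none of the three type bits set passes the
   check; in this sketch it is taken to describe a plain binary
   (bitstring) chunk.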

   2.7 A short chunk has no data body. The 3 byte Length field is
   used  as  data  bytes  instead.  This is used in order to save
   space when there are many small chunks.

   2.8 Compressed and encrypted chunks are explained in chapters
   5 and 6.
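
   Taken together, the rules of this chapter are enough to walk
   through a buffer without knowing the meaning of any chunk. The
   sketch below collects all chunk-IDs in depth-first order. The
   function name collectIds is hypothetical, and the decision to
   treat compressed or encrypted chunks as opaque (not descending
   into them) is an assumption of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kStructuredBit = 0x02;  // bit 2**1 (see 2.6)
const std::uint8_t kShortBit      = 0x04;  // bit 2**2
const std::uint8_t kOpaqueBits    = 0x30;  // compressed or encrypted

// Collect the chunk-IDs found in buf[pos, end) in depth-first order.
// Requires end <= buf.size(). Returns false if a header or a length
// field runs past 'end'.
bool collectIds(const std::vector<std::uint8_t>& buf, std::size_t pos,
                std::size_t end, std::vector<std::uint16_t>& ids) {
    while (pos < end) {
        if (end - pos < 6) return false;           // truncated header
        std::uint16_t id =
            static_cast<std::uint16_t>((buf[pos] << 8) | buf[pos + 1]);
        std::uint8_t flags = buf[pos + 2];
        std::size_t len = (static_cast<std::size_t>(buf[pos + 3]) << 16) |
                          (static_cast<std::size_t>(buf[pos + 4]) << 8) |
                          buf[pos + 5];
        std::size_t body = pos + 6;
        if (flags & kShortBit) {                   // 2.7: no data body,
            ids.push_back(id);                     // "length" holds data
            pos = body;
            continue;
        }
        if (len > end - body) return false;        // content overruns parent
        ids.push_back(id);
        if ((flags & kStructuredBit) && !(flags & kOpaqueBits)) {
            if (!collectIds(buf, body, body + len, ids)) return false;
        }
        pos = body + len;                          // step to the next sibling
    }
    return true;
}
```

   Calling collectIds (buf, 0, buf.size (), ids) on a buffer that
   holds one outer chunk visits that chunk and everything nested
   inside it.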


3. Introduction to the SDXF functions

   3.1 General remarks

   The functionality of the SDXF concept is not bound to any
   programming language, but of course the functions themselves
   must be coded in a particular language. I implemented them in
   C, with an additional class definition as a wrapper in C++.

   All these functions for reading and writing SDXF chunks use
   only one parameter, a parameter structure. For the member
   functions of the C++ class, this parameter structure is part
   of the class.


   3.2 Writing a SDXF buffer

   To write SDXF chunks, the following functions are available:

   init    -- initialize the parameter structure
   create  -- create a new chunk
   leave   -- "close" a structured chunk

   3.3 Reading a SDXF buffer

   To read SDXF chunks, the following functions are available:

   init    -- initialize the parameter structure
   enter   -- "go into" a structured chunk
   next    -- "go to" the next chunk inside a structured chunk
   extract -- extract the content of an elementary chunk into
              your data area
   leave   -- "go out" of a structured chunk

   3.4 Example:

   3.4.1 Writing: For demonstration we use a reduced (outlined)
   C++ form of these functions with overloaded definitions:

   void create (short chunkID); // opens a new structure
   void create (short chunkID, char *string); // creates a new chunk
                                              // with dataType character
   etc.


   The sequence:

   init (new);
   create (3301);
   create (3302, "first chunk");
   create (3303, "second chunk");
   create (3304);
   create (3305, "chunk in a structure");
   create (3306, "next chunk in a structure");
   leave ();
   create (3307, "third chunk");
   leave ();

   creates a chunk which we can show graphically like:

   3301
     |
     +--- 3302 = "first chunk"
     |
     +--- 3303 = "second chunk"
     |
     +--- 3304
     |      |
     |      +--- 3305 = "chunk in a structure"
     |      |
     |      +--- 3306 = "next chunk in a structure"
     |
     +--- 3307 = "third chunk"
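
   One way such a writer could work internally is to emit the
   header of a structured chunk with a length of zero and to patch
   the real length when "leave" is called. The class SketchWriter
   below is a hypothetical illustration of that idea, not the
   author's implementation; it performs no character translation,
   compression or encryption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical writer, for illustration only.
class SketchWriter {
public:
    // Open a structured chunk; its length is not known yet.
    void create(std::uint16_t id) {
        open_.push_back(buf_.size());
        putHeader(id, 0x03, 0);               // check bit + structured bit
    }
    // Append an elementary character chunk.
    void create(std::uint16_t id, const char* text) {
        std::size_t n = std::strlen(text);
        putHeader(id, 0x09, n);               // check bit + character bit
        buf_.insert(buf_.end(), text, text + n);
    }
    // Close the innermost open structured chunk by patching its length.
    void leave() {
        if (open_.empty()) return;            // nothing to close
        std::size_t start = open_.back();
        open_.pop_back();
        std::size_t n = buf_.size() - (start + 6);
        buf_[start + 3] = static_cast<std::uint8_t>((n >> 16) & 0xFF);
        buf_[start + 4] = static_cast<std::uint8_t>((n >> 8) & 0xFF);
        buf_[start + 5] = static_cast<std::uint8_t>(n & 0xFF);
    }
    const std::vector<std::uint8_t>& bytes() const { return buf_; }

private:
    void putHeader(std::uint16_t id, std::uint8_t flags, std::size_t n) {
        buf_.push_back(static_cast<std::uint8_t>(id >> 8));
        buf_.push_back(static_cast<std::uint8_t>(id & 0xFF));
        buf_.push_back(flags);
        buf_.push_back(static_cast<std::uint8_t>((n >> 16) & 0xFF));
        buf_.push_back(static_cast<std::uint8_t>((n >> 8) & 0xFF));
        buf_.push_back(static_cast<std::uint8_t>(n & 0xFF));
    }
    std::vector<std::uint8_t> buf_;   // the SDXF buffer being built
    std::vector<std::size_t>  open_;  // header offsets of open structures
};
```

   Replaying the sequence of 3.4.1 with this class produces a
   buffer of 121 bytes: 6 bytes for the header of chunk 3301 and
   115 bytes of content.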


   3.4.2 Reading

   A typical access to a structured SDXF chunk is a selection
   inside a loop:

   init (old);
   enter ();

   while (rc == 0) // 0 == ok, rc will be set by the SDXF functions
   {
     switch (chunkID)
     {
       case 3302:
         extract (data1, maxLength); // extr. 1st chunk into data1
         break;

       case 3303:
         extract (data2, maxLength); // extr. 2nd chunk into data2
         break;

       case 3304:  // we know this is a structure
         enter ();

         while (rc == 0) // inner loop
         {
           switch (chunkID)
           {
             case 3305:
               extract (data3, maxLength); // extr. the chunk inside struct.
               break;
             case 3306:
               extract (data4, maxLength); // extr. 2nd chunk inside struct.
               break;
           }

           next (); // returns rc = 1 at end of structure
         } // end-while
         break;

       case 3307:
         extract (data5, maxLength); // extract last chunk into data5
         break;

       default:
         // ignore unknown chunks !!!

     } // end-switch

     next (); // returns rc = 1 at end of structure
   } // end-while


4. Platform independence

   Most computer platforms today have an 8-bits-in-a-byte
   architecture, which enables data exchange between these
   platforms. But there are two significant points in which
   platforms may differ:

   a) The representation of binary numerical values (the short and
   long int).

   b) The representation of characters (ASCII/ANSI vs. EBCDIC)

   Point (a) is the phenomenon of "byte swapping": How is a short
   int value 259 = 0x0103 = X'0103' stored at address 4402?

   The two flavours are:

   4402 4403
   01   03    the big-endian, and
   03   01    the little-endian.

   Point (b) is represented by a table of the assignment  of  the
   256  possible  values  of  a  Byte  to  printable  or  control
   characters.  (in ASCII the letter "A" is assigned to value (or
   position) 0x41 = 65, in EBCDIC it is 0xC1 = 193)

   The solution to the problems that result from this is to
   normalize the data:

   We fix:

   (a) The internal representation of binary numerals is two's
   complement in big-endian order.

   (b) The internal representation of characters is ISO 8859-1.
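
   Rule (a) can be met without ever testing the byte order of the
   host, as the following sketch shows. The 4-byte width and the
   names putInt32 and getInt32 are assumptions of this example;
   the width of numeric data is not fixed in this chapter.

```cpp
#include <cstdint>

// Store a signed 32-bit value as two's complement, most significant
// byte first. The 4-byte width is an assumption of this sketch.
void putInt32(std::uint8_t out[4], std::int32_t value) {
    // Conversion to unsigned is defined modulo 2**32, which yields
    // exactly the two's complement bit pattern.
    std::uint32_t u = static_cast<std::uint32_t>(value);
    out[0] = static_cast<std::uint8_t>(u >> 24);
    out[1] = static_cast<std::uint8_t>((u >> 16) & 0xFF);
    out[2] = static_cast<std::uint8_t>((u >> 8) & 0xFF);
    out[3] = static_cast<std::uint8_t>(u & 0xFF);
}

// Read the value back from 4 bytes in big-endian order.
std::int32_t getInt32(const std::uint8_t in[4]) {
    std::uint32_t u = (static_cast<std::uint32_t>(in[0]) << 24) |
                      (static_cast<std::uint32_t>(in[1]) << 16) |
                      (static_cast<std::uint32_t>(in[2]) << 8) |
                      static_cast<std::uint32_t>(in[3]);
    if (u <= 0x7FFFFFFFu) return static_cast<std::int32_t>(u);
    // Negative value: -(~u) - 1 avoids an out-of-range conversion.
    return -static_cast<std::int32_t>(~u) - 1;
}
```

   With this sketch the value 259 is stored as 00 00 01 03 on
   both big-endian and little-endian machines.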

   The fixing of point (b) should be regarded as a first attempt.
   In some environments 8859-1 may not be the best choice: in a
   Greek or Russian environment, 8859-7 or 8859-5, respectively,
   is more appropriate. Nevertheless, within a specific group (or
   world) of applications, that is to say all the applications
   which want to interchange data with a defined protocol (via
   networking, diskette or something else), this internal
   character table must be unique.

   So a possibility to define a translation table (and its
   inverse) should be provided.

   Important: You do not construct an SDXF chunk for a specific
   addressee; instead you adapt your data to a normalized format
   (or network format).

   This adaptation is not done by the programmer; it is done by
   the create and extract functions. An administrator has to take
   care of defining the correct translation tables.
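
   A pair of translation tables can be pictured as two arrays of
   256 bytes, one being the inverse of the other. The following
   sketch is an assumption about how such tables might look; the
   type and function names are invented, and the toy table only
   exchanges the two codes for the letter "A" mentioned above
   instead of describing a complete code page.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

typedef std::array<std::uint8_t, 256> TranslationTable;

// Build the inverse of a table. Assumes that 'toNet' maps the 256
// byte values one-to-one.
TranslationTable invert(const TranslationTable& toNet) {
    TranslationTable fromNet{};
    for (std::size_t i = 0; i < 256; ++i)
        fromNet[toNet[i]] = static_cast<std::uint8_t>(i);
    return fromNet;
}

// Translate 'n' bytes in place using table 't'.
void translate(std::uint8_t* data, std::size_t n,
               const TranslationTable& t) {
    for (std::size_t i = 0; i < n; ++i) data[i] = t[data[i]];
}

// Toy table for demonstration only: identity, except that the local
// code 0xC1 (EBCDIC "A") and 0x41 (ISO 8859-1 "A") are exchanged.
// A real table would cover the whole code page.
TranslationTable toyLocalToNet() {
    TranslationTable t{};
    for (std::size_t i = 0; i < 256; ++i)
        t[i] = static_cast<std::uint8_t>(i);
    t[0xC1] = 0x41;
    t[0x41] = 0xC1;
    return t;
}
```

   In this picture "create" would apply the first table to
   character data and "extract" would apply its inverse, so that
   the application itself never sees the network format.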


5. Compression

   As stated in 2.6 there is a flag bit which declares  that  the
   following data (elementary or structured) are compressed. This
   data is not further interpretable until  it  is  decompressed.
   Compression  is  transparently  done  by  the  SDXF functions:
   "create" does the compression for elementary  chunks,  "leave"
   for  structured  chunks,  "extract" does the decompression for
   elementary chunks, "enter" for structured chunks.

   Transparently means that the programmer only has to tell the
   SDXF functions that he wants to compress the following
   chunk(s).

   For choosing between different compression methods and for
   controlling the decompressed (original) length, there is an
   additional definition:

   After the chunk header of a compressed chunk, a compression
   header follows:

   +-----------------------------+-------------------+--------------->
   |      chunk header           | compression header| compressed data
   +----+----+----+----+----+----+----+----+----+----+--------------->
   | chunkID |flag|   length     | md |   length     |
   +----+----+----+----+----+----+----+----+----+----+--------------->

   -- 'md' is the "compression  method":  we  have  reserved  the
   methods

    01 for a simple (fast but not very effective) "Run Length 1"
   or "Byte Run 1" algorithm. (More than 2 consecutive identical
   characters are replaced by the number of these characters and
   the character itself.)

    02 for the wonderful "deflate" algorithm which comes from the
   "zip"-people.  The  data  portion contains the "deflated" data
   without the CRC-part.

   -- 'length' is the original (decompressed) length of the data.
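
   To give an impression of what a run-length method of this kind
   does, the sketch below implements the control-byte scheme that
   is commonly known as "ByteRun1" from the IFF ILBM format: a
   control byte 0..127 is followed by that many plus one literal
   bytes, a control byte 129..255 means that the next byte is
   repeated (257 minus control) times, and 128 is ignored. Whether
   SDXF method 01 uses exactly this byte layout is not stated
   here, so this is an assumption; the names packRuns and
   unpackRuns are invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode 'in' with a ByteRun1-style scheme (assumed layout, see text).
std::vector<std::uint8_t> packRuns(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t run = 1;                     // identical bytes at i
        while (i + run < in.size() && in[i + run] == in[i] && run < 128)
            ++run;
        if (run > 2) {                           // more than 2: store a run
            out.push_back(static_cast<std::uint8_t>(257 - run));
            out.push_back(in[i]);
            i += run;
        } else {                                 // gather literal bytes
            std::size_t start = i, n = 0;
            while (i < in.size() && n < 128) {
                if (i + 2 < in.size() && in[i] == in[i + 1] &&
                    in[i] == in[i + 2]) break;   // a run of 3 starts here
                ++i;
                ++n;
            }
            out.push_back(static_cast<std::uint8_t>(n - 1));
            out.insert(out.end(), in.begin() + start,
                       in.begin() + start + n);
        }
    }
    return out;
}

// Decode; appends to 'out'. Returns false if the input is truncated.
bool unpackRuns(const std::vector<std::uint8_t>& in,
                std::vector<std::uint8_t>& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint8_t c = in[i++];
        if (c < 128) {                           // c + 1 literal bytes
            std::size_t n = c + 1u;
            if (in.size() - i < n) return false;
            out.insert(out.end(), in.begin() + i, in.begin() + i + n);
            i += n;
        } else if (c > 128) {                    // repeat the next byte
            if (i >= in.size()) return false;
            std::size_t n = 257u - c;
            out.insert(out.end(), n, in[i]);
            ++i;
        }                                        // c == 128: no operation
    }
    return true;
}
```

   A decoder cannot know the size of the result in advance, which
   is one reason why storing the original length in the
   compression header is useful: the receiver can allocate the
   buffer and verify the outcome.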


6. Encryption

   As stated in 2.6 there is a flag bit which declares  that  the
   following  data  (elementary or structured) is encrypted. This
   data is not interpretable until it is decrypted. En/Decryption
   is transparently done by the SDXF functions, "create" does the
   encryption  for  elementary  chunks,  "leave"  for  structured
   chunks,  "extract"  does the decryption for elementary chunks,
   "enter" for structured chunks. (Yes it sounds very similar  to
   chapter 5.)

   More than one encryption method for a given range of
   applications is not very reasonable. We specify that the
   length of the data will not be changed by encryption. So a
   special encryption header (similar to the compression header)
   is not necessary.

   Even though en/decryption is done transparently, an encryption
   key (password) must be given to the SDXF functions.

   Encryption is done after translating character data into the
   internal ("network") format; decryption is done before
   translating it back from that format.

   If both encryption and compression are applied to the same
   chunk, compression is done first: compressing well-encrypted
   data (identical strings appear different after encryption)
   yields compression rates close to zero.
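
   The requirement that encryption must not change the length of
   the data can be illustrated with a deliberately trivial
   transformation. The function toyCrypt below is hypothetical and
   offers no real protection; it only shows the property "same
   length in, same length out", with the same call used for both
   directions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy length-preserving transformation: XOR each byte with the
// repeating key. NOT secure; for illustration of the length rule only.
// Applying it twice with the same key restores the original data.
void toyCrypt(std::vector<std::uint8_t>& data, const std::string& key) {
    if (key.empty()) return;                     // no key: leave data as is
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] ^= static_cast<std::uint8_t>(key[i % key.size()]);
}
```

   Because the length stays the same, the length field of the
   chunk header remains valid after encryption and no additional
   header is needed.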


7. Description of the SDXF functions

   Following the principles of Object Oriented  Programming,  not
   only  the  description  of the data is necessary, but also the
   functions which manipulate data - the "methods".

   For the programmer, knowing the methods is more important than
   knowing the data structure. The methods have to know the exact
   specification of the data and guarantee the consistency of the
   data while creating it.

   An SDXF object is an instance of a parameter structure which
   acts as a programming interface. In particular, it points to
   the current SDXF data chunk and, while this data is being
   processed, holds a pointer to the current inner chunk, which
   is the focus of the next operation.

   This is the specification of the SDXF class in C++:  (Byte  is
   defined  as "unsigned char" for bitstrings, opposed to "signed
   char" for character strings)

   class C_SDXF
   {
     public:

    // constructors and destructor:
     C_SDXF  ();                          // dummy
     C_SDXF  (Byte *cont);                // old container
     C_SDXF  (Byte *cont, long size);     // new container
     C_SDXF  (long size);                 // new container
     ~C_SDXF ();

    // methods:
     void init  (void);                   // old container
     void init  (Byte *cont);             // old container
     void init  (Byte *cont, long size);  // new container
     void init  (long size);              // new container

     void enter (void);
     void leave (void);
     void next  (void);

     long extract (Byte *data, long length); // for chars and bitstr.
     long extract (void);                    // for numeric data

     void create (ChunkID);                          // structured
     void create (ChunkID, long value);              // numeric
     void create (ChunkID, Byte *data, long length); // binary
     void create (ChunkID, char *data);              // character

     void set_compression (Byte compression_method);
     void set_encryption  (Byte *encryption_key);

    // interface:

     ChunkID  id;        // 1)
     short    dataType;  // 2)
     long     length;    // length of data or chunk

     short    rc;  // the raw return code       3)
     short    ec;  // the extended return code  4)

     protected:
    // implementation dependent

   };

   Definitions:

   1) defined as:

   typedef short ChunkID;

   2) One of the values:

   SDX_DT_char             = 1
   SDX_DT_binary           = 2
   SDX_DT_numeric          = 3
   SDX_DT_structured       = 4

   3) One of the values:

   SDX_RC_ok               = 0
   SDX_RC_failed           = 1
   SDX_RC_warning          = 1
   SDX_RC_illegalOperation = 2
   SDX_RC_dataError        = 3
   SDX_RC_parameterError   = 4
   SDX_RC_programError     = 5
   SDX_RC_noMemory         = 6

   4) One of the values:

   SDX_EC_ok              =  0
   SDX_EC_eoi             =  1
   SDX_EC_notFound        =  2
   SDX_EC_dataCutted      =  3
   SDX_EC_overflow        =  4
   SDX_EC_wrongInitType   =  5
   SDX_EC_comprerr        =  6
   SDX_EC_forbidden       =  7
   SDX_EC_unknown         =  8
   SDX_EC_levelOvflw      =  9
   SDX_EC_paramMissing    = 10
   SDX_EC_magicError      = 11
   SDX_EC_not_consistent  = 12
   SDX_EC_wrongDataType   = 13
   SDX_EC_noMemory        = 14
   SDX_EC_error           = 99

   Besides  this  definition   there   is   a   global   function
   (SDX_getOptions)  which returns a pointer to a global table of
   options.

   With the help of these options you can adapt the behaviour of
   SDXF. In particular, you can define an alternative pair of
   translation tables or an alternative function which reads
   these tables from an external resource (e.g. from disk).

   In this table of options  there  is  also  a  pointer  to  the
   function  which  is  used for encryption / decryption: You can
   install your own encryption algorithm by setting this pointer.


8. Security Considerations

   Any corruption of data in the chunk headers invalidates the
   complete SDXF structure.

   Any corruption of data in an encrypted or compressed SDXF
   structure makes this chunk unusable. An integrity check after
   decryption or decompression is done by the "enter" function.

   When TCP/IP is used as the transmission medium, we can rely on
   the checksum of the transport layer (TCP) to detect corruption
   in transit.


9. Final remarks

   9.1 An SDXF structure is constructed consistently if every
   "create" of a structured chunk is closed by a matching
   "leave".

   9.2 While creating an elementary chunk a platform  independent
   copy of the data is performed - at the end of construction the
   content of the buffer is ready to transport to  another  site,
   without any further translation.

   9.3 As you see, no data definition in your programming language
   is needed to construct a specific SDXF structure. The data is
   created dynamically by function calls.

   9.4 With SDXF as a base you can define protocols for client /
   server applications. With the following two rules these
   protocols can be extended in a backward-compatible manner:

   Rule 1: Ignore unknown chunkIDs.

   Rule 2: The sequence of chunks should not be significant.

10. Author's Address

   Max Wildgrube
   Schlossstrasse 120
   60486 Frankfurt
   Germany

   EMail: max@wildgrube.com