Network Working Group                  Richard Price, Siemens/Roke Manor 
INTERNET-DRAFT                       Abigail Surtees, Siemens/Roke Manor 
Expires: December 2002                   Mark A West, Siemens/Roke Manor 
 
                                                           June 24, 2002 
 
 
                            SigComp User Guide 
               <draft-price-rohc-sigcomp-user-guide-00.txt> 
                                     
    
Status of this memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or cite them other than as "work in progress". 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/lid-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
   This document is a submission of the IETF ROHC WG. Comments should be 
   directed to its mailing list, rohc@ietf.org. 
    
    
Abstract 
    
   This document provides an informational guide for users of the 
   SigComp protocol. The aim of the document is to assist users when 
   making SigComp implementation decisions; for example the choice of 
   compression algorithm and the level of robustness against lost or 
   misordered packets. 
    
    
Price et al.                                                    [Page 1]  

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
Table of contents 
   
   1.  Introduction..................................................2 
   2.  Terminology...................................................2 
   3.  Overview of the User Guide....................................2 
   4.  Mnemonic language.............................................4 
   5.  Compression algorithms........................................10 
   6.  Additional SigComp mechanisms.................................28 
   7.  Security considerations.......................................35 
   8.  Acknowledgements..............................................35 
   9.  Authors' addresses............................................35 
   10. Intellectual Property Right Considerations....................35 
   11. References....................................................36 
    
 
1.  Introduction 
    
   This document provides an informational guide for users of the 
   SigComp protocol [SIGCOMP]. The idea behind SigComp is to standardize 
   a Universal Decompressor Virtual Machine (UDVM) that can be 
   programmed to understand the output of many well-known compressors 
   including DEFLATE [DEFLATE] and LZW [LZW]. The bytecode for the 
   choice of compression algorithm is uploaded to the UDVM as part of 
   the compressed data. 
    
   The basic SigComp standard describes the actions that an endpoint 
   must take upon receiving a SigComp message. However the entity 
   responsible for generating new SigComp messages (the SigComp 
   compressor) is left as an implementation decision; any compressor can 
   be used provided that it generates SigComp messages that can be 
   successfully decompressed by the receiving endpoint. 
    
   This document offers a number of different compressors that can be 
   used by the SigComp protocol. It also describes how standard stream-
   based compressors can be modified for robustness against lost and/or 
   misordered packets over an unreliable transport such as UDP. 
    
    
2.  Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC-2119]. 
    
    
3.  Overview of the User Guide 
    
   When implementing a SigComp compressor the first step is to choose a 
   compression algorithm that can encode the application messages into a 
   (hopefully) smaller form. Since SigComp can upload bytecode for new 
   algorithms to the receiving endpoint, arbitrary compression 

 
Price et al.                                                    [Page 2] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   algorithms can be supported provided that bytecode is available for 
   the corresponding decompression algorithm. 
    
   This document provides bytecode for the following algorithms: 
    
   1. Simplified LZ77 
    
   2. LZSS 
    
   3. LZW 
    
   4. DEFLATE 
    
   5. LZJH 
    
   6. EPIC 
    
   Any of the above algorithms may be useful depending on the desired 
   compression ratio, processing and memory requirements, code size, 
   implementation complexity and Intellectual Property (IPR) 
   considerations. 
    
   As well as encoding the application messages using the chosen 
   algorithm, the SigComp compressor is responsible for ensuring that 
   messages can be correctly decompressed even if packets are lost or 
   misordered during transmission. The SigComp feedback mechanism can be 
   used to acknowledge successful decompression over an unreliable 
   transport such as UDP. 
    
   The following robustness techniques and other mechanisms specific to 
   the SigComp environment are covered in this document: 
    
   1. Acknowledgements using the SigComp feedback mechanism 
    
   2. Static dictionary 
    
   3. CRC checksum 
    
   4. Announcing additional resources 
    
   5. Shared compression 
    
   Any or all of the above mechanisms can be implemented in conjunction 
   with each compression algorithm. A subroutine of UDVM bytecode is 
   provided for each of the mechanisms; these subroutines can be added 
   to the bytecode for any of the basic compression algorithms. 
    
    
Price et al.                                                    [Page 3] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
4.  Mnemonic language 
    
   Writing UDVM programs directly in bytecode would be a daunting task, 
   so a simple mnemonic language is provided to facilitate the creation 
   of new decompression algorithms. The mnemonic language allows the 
   UDVM instructions and their operands to be specified as text names as 
   well as integer values. 
    
   A string of mnemonic code consists of one or more primitives 
   separated by whitespace. Each primitive can be one instance of a 
   comment, instruction, label or padding, as illustrated by the 
   following ABNF description [RFC-2234]: 
    
   mnemonic_code    =         primitive *(ws primitive) 
    
   primitive        =         comment | instruction | label | padding 
    
   ws               =         1*(%x09 | %x0A | %x0D | %x20) 
    
   A parser for the mnemonic language scans for each primitive in turn 
   and converts it into the corresponding UDVM bytecode. The resulting 
   string of bytes can then be uploaded to the UDVM as explained in 
   Section 4.4. 
    
   Comments are specified by the symbol ";" and are terminated by the 
   end of the line: 
    
   comment          =         ";" *(%x00-09 | %x0B-0C | %x0E-FF) 
    
   For example: 
    
   ; This is a comment. 
    
   Comments are deleted when converting the mnemonic code into bytecode. 
    
   The following sections define how to convert the remaining types of 
   primitive into bytecode. 
    
4.1.  Instructions 
    
   A UDVM instruction is specified by the instruction opcode followed by 
   zero or more operands: 
    
   instruction      =         opcode [ws "(" operand_list ")"] 
    
   operand_list     =         operand *("," ws operand) 
    
   The instruction operands are enclosed in parentheses and separated by 
   commas, for example: 
    
   ADD (3, 4) 

 
Price et al.                                                    [Page 4] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   In the mnemonic language instruction opcodes are expressed in 
   uppercase text: when generating the bytecode the parser should 
   replace this text with the corresponding 1-byte value as per Figure 
   11 of SigComp. 
    
   opcode           =         uppercase *(uppercase | digit | "-") 
    
   uppercase        =         %x41-5A 
    
   digit            =         %x30-39 
    
   Instruction operands are expressed either as integer values or as 
   text names. In the latter case the text name is replaced by its 
   corresponding integer value (assigned as per Section 4.2). 
    
   An operand may also be preceded by the symbol "$", which indicates 
   that the supplied integer value must be interpreted as the memory 
   address at which the operand value can be found rather than the 
   actual operand value itself. 
    
   operand          =         ["$"] value 
    
   value            =         integer | text_name 
    
   integer          =         decimal | binary | hex 
    
   decimal          =         1*(digit) 
    
   binary           =         "0b" 1*("0" | "1") 
    
   hex              =         "0h" 1*(hex_digit) 
    
   hex_digit        =         digit | %x41-46 | %x61-66 
    
   text_name        =         1*(lowercase | "_") 
    
   lowercase        =         %x61-7A 
    
   When converting each instruction operand to bytecode, the parser 
   first determines whether the instruction expects the operand to be a 
   literal, a reference, a multitype or an address. If the operand is a 
   literal then as per Figure 8 of SigComp, the parser inserts the 
   shortest bytecode capable of encoding the supplied operand value. 
    
   If the operand is a reference then as per Figure 9 of SigComp, the 
   parser inserts the shortest bytecode capable of encoding the supplied 
   memory address. Note that reference operands will always be preceded 
   by the symbol "$" in the mnemonic code because they always encode 
   memory addresses rather than actual operand values. 
    

Price et al.                                                    [Page 5] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   If the operand is a multitype then the parser first checks whether 
   the symbol "$" is present. If so then as per Figure 10 of SigComp, it 
   inserts the shortest bytecode capable of encoding the supplied 
   integer as a memory address. If not then it inserts the inserts the 
   shortest bytecode capable of encoding the supplied integer as an 
   operand value. 
    
   If the operand is an address then the parser checks whether the 
   symbol "$" is present. If so then the supplied integer is encoded as 
   a memory address, just as for the multitype instruction above. If not 
   then the byte position of the opcode is subtracted from the supplied 
   integer modulo 16, and the result is encoded as an operand value as 
   per Figure 10 of SigComp. 
    
4.2.  Labels 
    
   As described in Section 4.1 the mnemonic language allows instruction 
   operands to be specified as text names as well as integer values. 
    
   Each text name should be assigned a corresponding integer value by 
   means of a label. The mnemonic language defines a label to be a 
   single colon followed by the text name to be defined and optionally 
   an expression: 
    
   label            =         ":" text_name [ws "=" ws expression] 
    
   If an expression is present then it is evaluated and the resulting 
   integer is assigned to the text string. 
    
   Acceptable expressions include a single value, or two values 
   separated by one of the four basic arithmetic operators. Moreover 
   each value can be an integer or an instance of a text name to which a 
   value has previously been assigned: 
    
   expression       =         value | (value ws operator ws value) 
    
   operator         =         "+" | "-" | "*" | "/" 
    
   If no expression is present then the (absolute) position of the byte 
   immediately following the label is evaluated and assigned to the text 
   string. 
    
4.3.  Padding 
    
   The mnemonic language also provides the ability to pad bytecode, 
   which is useful for reserving areas of the UDVM memory for variables 
   and other data. 
    
   Three types of padding are provided by the mnemonic language: 
    
   padding          =         padding_bytes | alignment | byte_string 

 
Price et al.                                                    [Page 6] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   The statement ".pad n" appends n consecutive padding bytes to the 
   bytecode. The actual value of the padding bytes is unimportant; when 
   the bytecode is uploaded to the UDVM the padding bytes will be set to 
   the initial values contained in the UDVM memory. 
    
   padding_bytes    =         ".pad" ws integer 
    
   The statement ".align n" appends sufficient padding bytes to the 
   bytecode such that the total bytecode generated so far is a multiple 
   of n bytes. 
    
   alignment        =         ".align" ws integer 
    
   The statement ".byte" appends a specified byte string to the 
   bytecode. The byte string is supplied as integers from 0 to 255, 
   separated by whitespace. The string of integers is terminated by the 
   end of a line. 
    
   byte_string      =         ".byte" 1*((%x09 | %x20) integer) 
    
4.4.  Uploading the bytecode to the UDVM 
    
   Once the parser has converted a string of mnemonic code into the 
   corresponding bytecode, it must be copied to the UDVM memory 
   beginning at Address 0 and then executed beginning from the first 
   instance of a UDVM instruction. 
    
   SigComp provides the following message format for uploading bytecode 
   to the UDVM: 
    
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   | 1   1   1   1   1 | T |   0   | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :    returned feedback item     :  if T = 1 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |           code_len            | 
   +---+---+---+---+---+---+---+---+ 
   |   code_len    |  destination  | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :    uploaded UDVM bytecode     : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   remaining SigComp message   : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    

Price et al.                                                    [Page 7] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   The destination field should be set to the memory address of the 
   first UDVM instruction. Note that if this address cannot be 
   represented by the destination field then the bytecode cannot be 
   uploaded to the UDVM using the standard SigComp header. In 
   particular, the memory address of the first UDVM instruction must 
   always be a multiple of 64 bytes or the standard SigComp header 
   cannot be used. Note however that there may be other ways to upload 
   the bytecode to the UDVM, such as retrieving the bytecode directly 
   via the INPUT-BYTES instruction. 
    
   Additionally, all memory addresses between Address 0 and Address 31 
   inclusive are initialized to undefined values by the UDVM, so they 
   must be specified as padding in the bytecode or the standard SigComp 
   header cannot be used. Memory addresses from Address 32 to Address 
   (destination - 1) inclusive are initialized to 0, so they must be 
   specified either as padding or as 0s if the bytecode is to be 
   successfully uploaded using the standard SigComp header. 
    
   The code_len field should be set to the smallest value such that all 
   memory addresses beginning at Address (destination + code_len) are 
   either padding, set to 0 by the bytecode, or are beyond the total 
   length of the bytecode. 
    
   The "uploaded UDVM bytecode" should be set to contain the segment of 
   bytecode that lies between Address (destination) and Address 
   (destination + code_len - 1) inclusive. 
    
4.5.  ABNF description of mnemonic language 
    
   The following is a complete ABNF [RFC-2234] description of the 
   mnemonic language: 
    
   mnemonic_code    =         primitive *(ws primitive) 
    
   primitive        =         comment | instruction | label | padding 
    
   ws               =         1*(%x09 | %x0A | %x0D | %x20) 
    
   comment          =         ";" *(%x00-09 | %x0B-0C | %x0E-FF) 
    
                              ; Comments are indicated by a semicolon  
                              ; and continue to the end of the line. 
    
   instruction      =         opcode *(ws operand) 
    
   opcode           =         uppercase *(uppercase | digit | "-") 
    
                              ; The list of instruction opcodes and  
                              ; their corresponding 1-byte values is  
                              ; given in Figure 11 of SigComp. 
    

Price et al.                                                    [Page 8] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   operand          =         ["$"] value 
    
                              ; When a "$" symbol is appended to the  
                              ; front of an instruction operand then the  
                              ; corresponding integer must be encoded as  
                              ; a memory address rather than as the  
                              ; actual operand value. This symbol is  
                              ; mandatory for reference operands,  
                              ; optional for multitypes and addresses,  
                              ; and disallowed for literals. 
    
   value            =         integer | text_name 
    
   integer          =         decimal | binary | hex 
    
                              ; Instruction operands can be given in the  
                              ; form of integers. They are converted  
                              ; into the shortest bytecode capable of  
                              ; representing the integer by the rules of  
                              ; Section 8.5 of SigComp. 
    
   decimal          =         1*(digit) 
    
   binary           =         "0b" 1*("0" | "1") 
    
   hex              =         "0h" 1*(hex_digit) 
    
   hex_digit        =         digit | %x41-46 | %x61-66 
    
   digit            =         %x30-39 
    
   text_name        =         1*(lowercase | "_") 
    
                              ; Instruction parameters can also be given  
                              ; in the form of lowercase names. These  
                              ; names are assigned an integer value by 
                              ; means of a label. 
    
   uppercase        =         %x41-5A 
    
   lowercase        =         %x61-7A 
    
   label            =         ":" text_name [ws "=" ws expression] 
    
                              ; Label names are given as a colon  
                              ; followed by lowercase text. They are  
                              ; deleted when converting the mnemonic  
                              ; code to bytecode. 
    
   expression       =         value | (value ws operator ws value) 
    

Price et al.                                                    [Page 9] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   operator         =         "+" | "-" | "*" | "/" 
    
   padding          =         padding_bytes | alignment | byte_string 
    
   padding_bytes    =         ".pad" ws integer 
    
   alignment        =         ".align" ws integer 
    
   byte_string      =         ".byte" 1*((%x09 | %x20) integer) 
    
    
5.  Compression algorithms 
    
   This chapter describes a number of compression algorithms that can be 
   used by a SigComp compressor. In each case the document provides UDVM 
   bytecode for the corresponding decompression algorithm, which can be 
   uploaded to the receiving endpoint as part of the compressed data. 
    
   Section 5.1 covers a simple algorithm in some detail, including the 
   steps required to compress and decompress a SigComp message. The 
   remaining sections cover well-known compression algorithms that can 
   be adapted for use in SigComp with minimal modification. 
    
5.1.  Simplified LZ77 
    
   This section describes how to implement a very simple compression 
   algorithm based on LZ77 [LZ77]. 
    
   A compressed message generated by the simplified LZ77 scheme consists 
   of a sequence of 4-byte characters, where each character contains a 
   2-byte position value followed by a 2-byte length value. Each pair of 
   integers identifies a byte string in the UDVM memory; when 
   concatenated these byte strings form the decompressed message. 
    
   When implementing a bytecode decompressor for the simplified LZ77 
   scheme, the UDVM memory is partitioned into five distinct areas as 
   shown below: 
    
    
   0             64          128        256          512 
    
   | scratch-pad | variables | bytecode | dictionary | circular buffer | 
   +-------------+-----------+----------+------------+-----------------+ 
    
    <-----------> <---------> <--------> <----------> <---------------> 
       64 bytes     64 bytes   128 bytes   256 bytes      512+ bytes 
    
   The first 128 bytes are used to hold the 2-byte variables needed by 
   the LZ77 decompressor. Within this memory the first 64 bytes is used 
   as a scratch-pad, holding the 2-byte variables that can be discarded 
   between SigComp messages. In contrast the next 64 bytes (and in fact 

 
Price et al.                                                   [Page 10] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   all of the UDVM memory starting from Address 64) should be saved 
   after decompressing a SigComp message to improve the compression 
   ratio of subsequent messages. 
    
   The bytecode for the LZ77 decompressor is stored beginning at Address 
   128. A total of 128 bytes are reserved for the bytecode although the 
   LZ77 decompressor requires less; this allows room for adding 
   additional features to the decompressor at a later stage. 
    
   The next 256 bytes are initialized by the bytecode to contain the 
   integers 0 to 255 inclusive. The purpose of this memory area is to 
   provide a dictionary of all possible uncompressed characters; this is 
   important to ensure that the compressor can always generate a 
   sequence of position/length pairs that encode a given message. For 
   example, a byte with value 0x41 (corresponding to the ASCII character 
   "A") can be found at Address 0x0141 of the UDVM memory, so the 
   compressed character 0x01410001 will decompress to give this ASCII 
   character. Note that encoding each byte in the application message as 
   a separate 4-byte compressed character is not recommended however, as 
   the resulting "compressed" message is four times as large as the 
   original uncompressed message. 
    
   The compression ratio of LZ77 is improved by the remaining UDVM 
   memory, which is used to store a history buffer containing the 
   previously decompressed messages. Compressed characters can point to 
   strings that have previously been decompressed and stored in the 
   buffer; so the overall compression ratio of the LZ77 algorithm 
   improves as the decompressor "learns" more text strings and is able 
   to encode longer strings using a single compressed character. The 
   buffer is circular, so older messages are overwritten by new data 
   when the buffer becomes full. 
    
   Note that the actual size of this circular buffer depends on the 
   total amount of memory available to the UDVM. The minimum size of the 
   UDVm memory is 1K, so the circular buffer will always contain at 
   least 512 bytes. 
    
   The steps required to implement an LZ77 compressor and decompressor 
   are similar, although compression is more processor-intensive as it 
   requires a searching operation to be performed. The mnemonic code for 
   the simplified LZ77 decompressor is given below: 
    
   ; Variables that do not need to be stored after decompressing each  
   ; SigComp message are stored here: 
    
   .pad 32 
    
   :index                          .pad 2 
   :length_value                   .pad 2 
    
   .align 42 

 
Price et al.                                                   [Page 11] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :requested_feedback_location = 0 
    
   ; The UDVM registers must be stored beginning at Address 64: 
    
   .align 64 
    
   ; Variables that should be stored after decompressing a message are 
   ; stored here. These variables will form part of the SigComp state  
   ; item created by the bytecode: 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :decompressed_pointer           .pad 2 
    
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   :udvm_memory_size = 8192 
   :state_length = udvm_memory_size - 64 
    
   ; The UDVM registers byte_copy_left and byte_copy_right are set to 
   ; indicate the bounds of the circular buffer in the UDVM memory. A  
   ; variable decompressed_pointer is also created and set pointing to 
   ; the start of the circular buffer: 
    
   MULTILOAD (64, 3, circular_buffer, udvm_memory_size, circular_buffer) 
    
   ; The "dictionary" area of the UDVM memory is initialized to contain 
   ; the values 0 to 255 inclusive: 
    
   MEMSET (static_dictionary, 256, 0, 1) 
    
   :decompress_sigcomp_message 
    
   :next_character 
    
   ; The next character in the compressed message is read by the UDVM  
   ; and the position and length integers are stored in the variables  
   ; position_value and length_value respectively. If no more  
   ; compressed data is available the decompressor jumps to the 
   ; "end_of_message" subroutine: 
    
   INPUT-BYTES (4, index, end_of_message) 
    
   ; The position_value and length_value point to a byte string in the 
   ; UDVM memory, which is copied into the circular buffer at the 
   ; position specified by decompressed_pointer. This allows the string  

 
Price et al.                                                   [Page 12] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   ; to be referenced by later characters in the compressed message: 
    
   COPY-LITERAL ($index, $length_value, $decompressed_pointer) 
    
   ; The byte string is also outputted onto the end of the decompressed  
   ; message: 
    
   OUTPUT ($index, $length_value) 
    
   ; The decompressor jumps back to consider the next character in the 
   ; compressed message: 
    
   JUMP (next_character) 
    
   :end_of_message 
    
   ; The decompressor saves the UDVM memory and halts: 
    
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :fail 
    
   DECOMPRESSION-FAILURE 
    
   .align 256 
    
   ; Memory for the dictionary and the circular buffer are reserved by  
   ; the following statements: 
    
   :static_dictionary              .pad 256 
   :circular_buffer 
    
   The task of an LZ77 compressor is simply to discover a sequence of 4-
   byte compressed characters which the above bytecode will decompress 
   to give the desired application message. As an example, a message 
   compressed using the simplified LZ77 algorithm is given below: 
    
   0x01 54 00 01 01 68 00 01 01 65 00 01 01 20 00 01 01 52 00 01  
   0x01 65 00 01 01 73 00 02 01 61 00 01 01 75 00 01 01 72 00 01  
   0x01 61 00 01 01 6e 00 01 01 74 00 01 01 20 00 01 01 61 00 01  
   0x02 0d 00 02 01 74 00 01 02 01 00 03 01 45 00 01 01 6e 00 01  
   0x01 64 00 01 01 20 00 01 01 6f 00 01 01 66 00 01 02 11 00 05  
   0x01 55 00 01 01 6e 00 01 01 69 00 01 01 76 00 01 01 65 00 01  
   0x01 72 00 02 01 65 00 01 01 0a 00 01 
    
   The bytecode for the LZ77 decompressor can be uploaded as part of the 
   compressed message as specified in Section 4.4. However, in order to 
   improve the overall compression ratio it is important to avoid 
   uploading bytecode in every compressed message. For this reason 

 
Price et al.                                                   [Page 13] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   SigComp allows the UDVM to save an area of its memory as a state item 
   between compressed messages. Once a state item has been created it 
   can be retrieved by sending the corresponding state identifier using 
   the following SigComp message format: 
    
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   | 1   1   1   1   1 | T |   1   | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :    returned feedback item     :  if T = 1 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   partial state identifier    : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   remaining SigComp message   : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    
   The partial_state_identifier field must contain the first 6 bytes of 
   the state identifier for the state item to be accessed (see SigComp 
   for details of how state identifiers are derived). 
    
5.2.  LZSS 
    
   This section provides UDVM bytecode for the simple but effective LZSS 
   compression algorithm [LZSS]. 
    
   The principal improvement offered by LZSS over LZ77 is that each 
   compressed character begins with a 1-bit indicator flag to specify 
   whether the character is a literal or an offset/length pair. A 
   literal value is simply a single uncompressed byte that is appended 
   directly to the decompressed message. 
    
   An offset/length pair contains a 12-bit offset value from 1 to 4096 
   inclusive, followed by a 4-bit length value from 3 to 18 inclusive. 
   Taken together these values specify one of the previously received 
   text strings in the circular buffer, which is then appended to the 
   end of the decompressed message. 
    
   Mnemonic code for an LZSS decompressor is given below: 
    
   .pad 32 
    
   :index                          .pad 2 
   :length_value                   .pad 2 
   :old_pointer                    .pad 2 
    

Price et al.                                                   [Page 14] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   .align 42 
    
   :requested_feedback_location = 0 
    
   .align 64 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :input_bit_order                .pad 2 
   :decompressed_pointer           .pad 2 
    
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   :udvm_memory_size = 8192 
   :state_length = udvm_memory_size - 64 
    
   MULTILOAD (64, 4, circular_buffer, udvm_memory_size, 0, 
   circular_buffer) 
    
   :decompress_sigcomp_message 
    
   :next_character 
    
   INPUT-HUFFMAN (index, end_of_message, 2, 9, 0, 255, 16384, 4, 0, 
   8191, 1) 
   COMPARE ($index, 8192, length, end_of_message, literal) 
    
   :literal 
    
   :index_lsb = index + 1 
    
   OUTPUT (index_lsb, 1) 
   COPY-LITERAL (index_lsb, 1, $decompressed_pointer) 
   JUMP (next_character) 
    
   :length 
    
   INPUT-BITS (4, length_value, fail) 
   ADD ($length_value, 3) 
   LOAD (old_pointer, $decompressed_pointer) 
   COPY-OFFSET ($index, $length_value, $decompressed_pointer) 
   OUTPUT ($old_pointer, $length_value) 
   JUMP (next_character) 
    
   :end_of_message 
    

Price et al.                                                   [Page 15] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :fail 
    
   DECOMPRESSION-FAILURE 
    
   :circular_buffer 
    
5.3.  LZW 
    
   This section provides UDVM bytecode for the well-known LZW 
   compression algorithm [LZW]. This algorithm is used in a number of 
   standards including the GIF image format. 
    
   LZW compression operates in a similar manner to LZ77 in that it 
   maintains a circular buffer of previously received decompressed data, 
   and each compressed character references exactly one byte string from 
   the circular buffer. However, LZW also maintains a "codebook" 
   containing 1024 position/length pairs that point to byte strings 
   which LZW believes are most likely to occur in the uncompressed data. 
    
   The byte strings stored in the LZW codebook can be referenced by 
   sending a single 10-bit value from 0 to 1023 inclusive. The UDVM 
   extracts the corresponding text string from the codebook and appends 
   it to the end of the decompressed message. It then creates a new 
   codebook entry containing the current text string plus the next 
   character to occur in the decompressed message. 
    
   Mnemonic code for an LZW decompressor is given below: 
    
   .pad 32 
    
   :length_value                   .pad 2 
   :position_value                 .pad 2 
   :index                          .pad 2 
    
   .align 42 
    
   :requested_feedback_location = 0 
    
   .align 64 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :input_bit_order                .pad 2 
    
   :codebook_next                  .pad 2 
   :current_length                 .pad 2 
   :decompressed_pointer           .pad 2 

 
Price et al.                                                   [Page 16] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   :udvm_memory_size = 8192 
   :state_length = udvm_memory_size - 64 
    
   MULTILOAD (64, 6, circular_buffer, udvm_memory_size, 0, codebook, 1, 
   static_dictionary) 
    
   :initialize_codebook 
    
   ; The following instructions are used to initialize the first 256  
   ; entries in the LZW codebook with single ASCII characters: 
    
   :index_lsb = index + 1 
   :current_length_lsb = current_length + 1 
    
   COPY-LITERAL (current_length_lsb, 3, $codebook_next) 
   COPY-LITERAL (index_lsb, 1, $decompressed_pointer) 
   ADD ($index, 1) 
   COMPARE ($index, 256, initialize_codebook, next_character, 0) 
    
   :decompress_sigcomp_message 
    
   :next_character 
    
   ; The following INPUT-BITS instruction extracts 10 bits from the  
   ; compressed message: 
    
   INPUT-BITS (10, index, end_of_message) 
    
   ; The following instructions interpret the received bits as an index 
   ; into the LZW codebook, and extract the corresponding  
   ; position/length pair: 
    
   :length_value_lsb = length_value + 1 
    
   MULTIPLY ($index, 3) 
   ADD ($index, codebook) 
   COPY ($index, 3, length_value_lsb) 
    
   ; The following instructions append the selected text string to the  
   ; circular buffer and create a new codebook entry pointing to this  
   ; text string: 
    
   LOAD (current_length, 1) 
   ADD ($current_length, $length_value) 
   COPY-LITERAL (current_length_lsb, 3, $codebook_next) 

 
Price et al.                                                   [Page 17] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   COPY-LITERAL ($position_value, $length_value, $decompressed_pointer) 
    
   ; The following instruction outputs the text string specified by the 
   ; position/length pair: 
    
   OUTPUT ($position_value, $length_value) 
   JUMP (next_character) 
    
   :end_of_message 
    
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :static_dictionary              .pad 256 
   :circular_buffer 
    
   .align 4879 
    
   :codebook 
    
   An example message compressed using the LZW algorithm is given below: 
    
   0x14 c6 f0 80 6c 1b c6 e1 9c 20 18 46 e1 90 20 1d 06 84 20 6b 1c c2  
   0x01 98 6f 1c 90 71 b0 6c 42 c6 81 95 11 1a 47 31 a0 21 02 bf f0 
    
    
5.4.  DEFLATE 
    
   This section provides UDVM bytecode for the DEFLATE compression 
   algorithm. DEFLATE is the algorithm used in the well-known "gzip" 
   file format. 
    
   The following bytecode will decompress the DEFLATE compressed data 
   format [DEFLATE] with the following modifications: 
    
   1. The DEFLATE compressed data format separates blocks of compressed  
      data by transmitting 7 consecutive zero bits. Each SigComp message 
      is assumed to contain a separate block of compressed data, so the 
      end-of-block bits are implicit and do not need to be transmitted 
      at the end of a SigComp message. 
    
   2. The bytecode supports only DEFLATE block type 01 (data compressed  
      with fixed Huffman codes). 
    
   Mnemonic code for the DEFLATE decompressor is given below: 
    
   .pad 32 
    
   :index                          .pad 2 
   :extra_length_bits              .pad 2 

 
Price et al.                                                   [Page 18] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :length_value                   .pad 2 
   :extra_distance_bits            .pad 2 
   :distance_value                 .pad 2 
    
   .align 42 
    
   :requested_feedback_location = 0 
    
   .align 64 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :input_bit_order                .pad 2 
   :decompressed_pointer           .pad 2 
    
   :length_table                   .pad 116 
   :distance_table                 .pad 120 
    
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   :udvm_memory_size = 8192 
   :state_length = udvm_memory_size - 64 
   :table_one = length_table - 4 
   :table_two = table_one + 65536 
   :length_table_start = table_two / 4 
   :length_table_mid = length_table_start + 24 
   :distance_table_start = distance_table / 4 
    
   MULTILOAD (64, 122, circular_buffer, udvm_memory_size, 5, 
   circular_buffer, 
        
   0,       3,       0,       4,       0,       5, 
   0,       6,       0,       7,       0,       8, 
   0,       9,       0,       10,      1,       11, 
   1,       13,      1,       15,      1,       17, 
   2,       19,      2,       23,      2,       27, 
   2,       31,      3,       35,      3,       43, 
   3,       51,      3,       59,      4,       67, 
   4,       83,      4,       99,      4,       115, 
   5,       131,     5,       163,     5,       195, 
   5,       227,     0,       258, 
    
   0,       1,       0,       2,       0,       3, 
   0,       4,       1,       5,       1,       7, 
   2,       9,       2,       13,      3,       17, 
   3,       25,      4,       33,      4,       49, 
   5,       65,      5,       97,      6,       129, 

 
Price et al.                                                   [Page 19] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   6,       193,     7,       257,     7,       385, 
   8,       513,     8,       769,     9,       1025, 
   9,       1537,    10,      2049,    10,      3073, 
   11,      4097,    11,      6145,    12,      8193, 
   12,      12289,   13,      16385,   13,      24577) 
    
   :decompress_sigcomp_message 
    
   INPUT-BITS (3, extra_length_bits, fail) 
    
   :next_character 
    
   INPUT-HUFFMAN (index, end_of_message, 4,  
       7, 0, 23, length_table_start,  
       1, 48, 191, 0,  
       0, 192, 199, length_table_mid, 
       1, 400, 511, 144) 
   COMPARE ($index, length_table_start, literal, end_of_message, 
   length_distance) 
    
   :literal 
    
   :index_lsb = index + 1 
    
   OUTPUT (index_lsb, 1) 
   COPY-LITERAL (index_lsb, 1, $decompressed_pointer) 
   JUMP (next_character) 
    
   :length_distance 
    
   ; this is the length part 
    
   MULTIPLY ($index, 4) 
   COPY ($index, 4, extra_length_bits) 
   INPUT-BITS ($extra_length_bits, extra_length_bits, fail) 
   ADD ($length_value, $extra_length_bits) 
    
   ; this is the distance part 
    
   INPUT-BITS (5, index, fail) 
   ADD ($index, distance_table_start) 
   MULTIPLY ($index, 4) 
   COPY ($index, 4, extra_distance_bits) 
   INPUT-BITS ($extra_distance_bits, extra_distance_bits, fail) 
   ADD ($distance_value, $extra_distance_bits) 
   LOAD (index, $decompressed_pointer) 
   COPY-OFFSET ($distance_value, $length_value, $decompressed_pointer) 
   OUTPUT ($index, $length_value) 
   JUMP (next_character) 
    
   :end_of_message 

 
Price et al.                                                   [Page 20] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :fail 
    
   DECOMPRESSION-FAILURE 
    
   :circular_buffer 
    
   An example message compressed using the DEFLATE algorithm is given 
   below: 
    
   0xf3 c9 4c 4b d5 51 28 c9 48 55 08 cd cb 2c 4b 2d 2a 4e 55 48 cc 4b  
   0x51 70 05 32 2b 4b 32 32 f3 d2 b9 00 00 00 00 ff ff 00 
    
    
5.5.  LZJH 
    
   This section provides UDVM bytecode for the LZJH compression 
   algorithm. LZJH is the algorithm adopted by the International 
   Telecommunication Union (ITU-T) Recommendation V.44 [LZJH]. 
    
   Mnemonic code for the LZJH decompressor is given below: 
    
   .pad 32 
    
   ; The following 2-byte variables are stored in the scratch-pad memory 
   ; area because they do not need to be saved after decompressing a 
   ; SigComp message: 
    
   :length_value                   .pad 2 
   :position_value                 .pad 2 
   :index                          .pad 2 
   :extra_extension_bits           .pad 2 
   :codebook_old                   .pad 2 
    
   .align 42 
    
   :requested_feedback_location = 0 
    
   .align 64 
    
   ; UDVM_registers 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :input_bit_order                .pad 2 
    
   ; The following 2-byte variables are saved as state after  

 
Price et al.                                                   [Page 21] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   ; decompressing a SigComp message: 
    
   :current_length                 .pad 2 
   :decompressed_pointer           .pad 2 
   :ordinal_length                 .pad 2 
   :codeword_length                .pad 2 
   :codebook_next                  .pad 2 
    
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   ; The following constants can be adjusted to configure the LZJH 
   ; decompressor. The current settings are as recommended in the V.44 
   ; specification (given that a total of 8K UDVM memory is available): 
    
   :udvm_memory_size = 8192   ; sets the total memory for LZJH 
   :max_extension_length = 8  ; sets the maximum string extension length 
   :min_ordinal_length = 7    ; sets the minimum ordinal length 
   :min_codeword_length = 6   ; sets the minimum codeword length 
    
   :codebook_start = 4879 
   :first_codeword = codebook_start - 12 
   :state_length = udvm_memory_size - 64 
    
   MULTILOAD (64, 8, circular_buffer, udvm_memory_size, 7, 0, 
   circular_buffer, min_ordinal_length, min_codeword_length, 
   codebook_start) 
    
   :decompress_sigcomp_message 
    
   :standard_prefix 
    
   ; The following code decompresses the standard 1-bit LZJH prefix  
   ; which specifies whether the next character is an ordinal or a 
   ; codeword/control value: 
    
   INPUT-BITS (1, index, end_of_message) 
   COMPARE ($index, 1, ordinal, codeword_control, codeword_control) 
    
   :prefix_after_codeword 
    
   ; The following code decompresses the special LZJH prefix that only 
   ; occurs after a codeword. It specifies whether the next character is 
   ; an ordinal, a codeword/control value, or a string extension: 
    
   INPUT-HUFFMAN (index, fail, 2, 1, 1, 1, 2, 1, 0, 1, 0) 
   COMPARE ($index, 1, ordinal, string_extension, codeword_control) 
    

Price et al.                                                   [Page 22] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :ordinal 
    
   ; The following code decompresses an ordinal character, and creates 
   ; a new codebook entry consisting of the ordinal character plus the  
   ; next character to be decompressed: 
    
   :index_lsb = index + 1 
   :current_length_lsb = current_length + 1 
    
   INPUT-BITS ($ordinal_length, index, fail) 
   OUTPUT (index_lsb, 1) 
   LOAD (current_length, 2) 
   COPY-LITERAL (current_length_lsb, 3, $codebook_next) 
   COPY-LITERAL (index_lsb, 1, $decompressed_pointer) 
   JUMP (standard_prefix) 
    
   :codeword_control 
    
   ; The following code decompresses a codeword/control value:  
    
   INPUT-BITS ($codeword_length, index, fail) 
   COMPARE ($index, 3, control_code, initialize_memory, codeword) 
    
   :codeword 
    
   ; The following code interprets a codeword as an index into the LZJH 
   ; codebook. It extracts the position/length pair from the specified 
   ; codebook entry; the position/length pair points to a byte string  
   ; in the circular buffer which is then copied to the end of the  
   ; decompressed message. The code also creates a new codebook entry  
   ; consisting of the byte string plus the next character to be  
   ; decompressed: 
    
   :length_value_lsb = length_value + 1 
    
   MULTIPLY ($index, 3) 
   ADD ($index, first_codeword) 
   COPY ($index, 3, length_value_lsb) 
   LOAD (current_length, 1) 
   ADD ($current_length, $length_value) 
   LOAD (codebook_old, $codebook_next) 
   COPY-LITERAL (current_length_lsb, 3, $codebook_next) 
   COPY-LITERAL ($position_value, $length_value, $decompressed_pointer) 
   OUTPUT ($position_value, $length_value) 
   JUMP (prefix_after_codeword) 
    
   :string_extension 
    
   ; The following code decompresses a Huffman-encoded string extension: 
    

Price et al.                                                   [Page 23] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   INPUT-HUFFMAN (index, fail, 4, 1, 1, 1, 1, 2, 1, 3, 2, 1, 1, 1, 13, 
   3, 0, 7, 5) 
   COMPARE ($index, 13, continue, extra_bits, extra_bits) 
    
   :extra_bits 
    
   INPUT-BITS (max_extension_length, extra_extension_bits, fail) 
   ADD ($index, $extra_extension_bits) 
    
   :continue 
    
   ; The following code extends the most recently created codebook entry 
   ; by the number of bits specified in the string extension: 
    
   COPY-LITERAL ($position_value, $length_value, $position_value) 
   COPY-LITERAL ($position_value, $index, $decompressed_pointer) 
   OUTPUT ($position_value, $index) 
   ADD ($index, $length_value) 
   COPY (index_lsb, 1, $codebook_old) 
   JUMP (standard_prefix) 
    
   :control_code 
    
   ; The code can handle all of the control characters in V.44 except  
   ; for ETM (Enter Transparent Mode), which is not required for  
   ; message-based protocols such as SigComp. 
    
   COMPARE ($index, 1, fail, flush, stepup) 
    
   :flush 
    
   ; The FLUSH control character jumps to the beginning of the next  
   ; complete byte in the compressed message: 
    
   INPUT-BYTES (0, 0, 0) 
   JUMP (standard_prefix) 
    
   :stepup 
    
   ; The STEPUP control character increases the number of bits used to 
   ; encode an ordinal value or a codeword: 
    
   INPUT-BITS (1, index, fail) 
   COMPARE ($index, 1, stepup_ordinal, stepup_codeword, 0) 
    
   :stepup_ordinal 
    
   ADD ($ordinal_length, 1) 
   JUMP (ordinal) 
    
   :stepup_codeword 

 
Price et al.                                                   [Page 24] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   ADD ($codeword_length, 1) 
   JUMP (codeword_control) 
    
   :end_of_message 
    
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :fail 
    
   DECOMPRESSION-FAILURE 
    
   :circular_buffer 
    
5.6.  EPIC 
    
   This section provides bytecode for a version of the Efficient 
   Protocol Independent Compression (EPIC) scheme [EPIC]. 
    
   The basic EPIC scheme [EPIC] is designed to compress protocol headers 
   such as TCP/IP, but the underlying algorithm (known as Hierarchical 
   Huffman) can be applied to the compression of arbitrary data. In 
   particular the compression algorithm used by EPIC obtains a very high 
   compression ratio on data with a known structure, so it is ideally 
   suited for compressing the messages generated by SIP or other 
   signaling protocols. 
    
   Note however that in its basic form the EPIC algorithm does not have 
   the ability to detect and adapt to new patterns in the uncompressed 
   data; instead it relies on a fixed pre-programmed description of how 
   the protocol to be compressed is expected to behave. 
    
   The application messages encountered by SigComp will typically 
   contain segments of generic text that cannot be compressed using the 
   basic EPIC scheme. Fortunately however, EPIC can easily be upgraded 
   to cope with generic data by adding the ability to store a circular 
   buffer of previously received text strings as per LZ77 or DEFLATE. 
   The resulting hybrid algorithm offers the best of both worlds: a very 
   high compression ratio for the "well-behaved" parts of the 
   application message, and a good compression ratio even for the 
   portions of the message that cannot be pre-programmed into the 
   compression algorithm. 
    
   The following bytecode implements a decompressor for a hybrid of EPIC 
   and DEFLATE. The tables of compressed characters are generated using 
   the Hierarchical Huffman algorithm from EPIC, and are designed to 
   give a very high compression ratio for a typical SIP/SDP message. The 
   ability to store and retrieve text strings from a buffer of 
   previously received messages is taken from DEFLATE. 
    

Price et al.                                                   [Page 25] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   To illustrate the performance of the hybrid algorithm, the following 
   results have been generated for the call flow in Section 3.1.2 of 
   "SIP Call Flow Examples" [FLOWS]. Note that to improve the overall 
   compression ratio, all algorithms employ a static dictionary (see 
   Section 6.2) and the shared compression mechanism (see Section 6.5): 
    
   Algorithm:                           Total compressed message size: 
    
   DEFLATE with static Huffman codes              660 bytes 
   DEFLATE with adaptive Huffman codes            625 bytes 
   EPIC/DEFLATE hybrid                            560 bytes 
    
   The bytecode for the hybrid EPIC/DEFLATE algorithm is given below. A 
   compressor to generate messages for this algorithm can be adapted 
   from an ordinary DEFLATE compressor; the string matching rules should 
   be left unchanged but the tables of Huffman codes used by DEFLATE 
   should be replaced by those generated by the EPIC algorithm: 
    
   .pad 32 
    
   :index                          .pad 2 
   :distance_value                 .pad 2 
   :old_pointer                    .pad 2 
    
   .align 42 
    
   :requested_feedback_location = 0 
    
   .align 64 
    
   :byte_copy_left                 .pad 2 
   :byte_copy_right                .pad 2 
   :input_bit_order                .pad 2 
   :decompressed_pointer           .pad 2 
    
   :returned_parameters_location = 0 
    
   .align 64 
    
   :initialize_memory 
    
   :udvm_memory_size = 8192 
   :state_length = udvm_memory_size - 64 
    
   MULTILOAD (64, 4, circular_buffer, udvm_memory_size, 0, 
   circular_buffer) 
    
   :decompress_sigcomp_message 
    
   :character_after_literal 
    

Price et al.                                                   [Page 26] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   INPUT-HUFFMAN (index, end_of_message, 16,  
       5, 0, 11, 46, 
       0, 12, 12, 256,  
       1, 26, 32, 257,  
       1, 66, 68, 32,  
       0, 69, 94, 97,  
       0, 95, 102, 264,  
       0, 103, 103, 511,  
       2, 416, 426, 35,  
       0, 427, 465, 58,  
       0, 466, 481, 272,  
       1, 964, 995, 288,  
       3, 7968, 7988, 123,  
       0, 7989, 8115, 384,  
       1, 16232, 16263, 0,  
       0, 16264, 16327, 320,  
       1, 32656, 32767, 144) 
    
   COMPARE ($index, 256, literal, distance, distance) 
    
   :character_after_match 
    
   INPUT-HUFFMAN (index, end_of_message, 16,  
       4, 0, 0, 511,  
       1, 2, 9, 256,  
       1, 20, 22, 32,  
       0, 23, 30, 264,  
       1, 62, 73, 46,  
       0, 74, 89, 272,  
       2, 360, 385, 97,  
       0, 386, 417, 288,  
       1, 836, 874, 58,  
       0, 875, 938, 320,  
       1, 1878, 1888, 35,  
       0, 1889, 2015, 384,  
       1, 4032, 4052, 123,  
       1, 8106, 8137, 0,  
       1, 16276, 16379, 144,  
       1, 32760, 32767, 248) 
    
   COMPARE ($index, 256, literal, distance, distance) 
    
   :literal 
    
   :index_lsb = index + 1 
    
   OUTPUT (index_lsb, 1) 
   COPY-LITERAL (index_lsb, 1, $decompressed_pointer) 
   JUMP (character_after_literal) 
    
   :distance 

 
Price et al.                                                   [Page 27] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   SUBTRACT ($index, 253) 
   INPUT-HUFFMAN (distance_value, fail, 9,  
       9, 0, 7, 9,  
       0, 8, 63, 129,  
       1, 128, 135, 1,  
       0, 136, 247, 17,  
       0, 248, 319, 185,  
       1, 640, 1407, 257,  
       2, 5632, 6655, 1025,  
       1, 13312, 15359, 2049,  
       2, 61440, 65535, 4097) 
    
   LOAD (old_pointer, $decompressed_pointer) 
   COPY-OFFSET ($distance_value, $index, $decompressed_pointer) 
   OUTPUT ($old_pointer, $index) 
   JUMP (character_after_match) 
    
   :end_of_message 
    
   END-MESSAGE (requested_feedback_location, 
   returned_parameters_location, state_length, 64, 
   decompress_sigcomp_message, 6, 0) 
    
   :fail 
    
   DECOMPRESSION-FAILURE 
    
   :circular_buffer 
    
    
6.  Additional SigComp mechanisms 
    
   The following chapter covers the additional mechanisms that can be 
   employed by SigComp to improve the overall compression ratio; 
   including the acknowledgment of SigComp state over an unreliable 
   link, sharing state between two directions of a compressed message 
   flow etc. 
    
   When each of the compression algorithms described in Chapter 5 has 
   successfully decompressed the current SigComp message, the contents 
   of the UDVM memory are saved as a SigComp state item. Subsequent 
   messages can access this state item by uploading the correct state 
   identifier to the receiving endpoint, which avoids the need to upload 
   the bytecode for the compression algorithm on a per-message basis. 
   However, before a state item can be accessed the compressor must 
   first ensure that it is available at the receiving endpoint. 
    
   For each SigComp compartment, the receiving endpoint maintains a list 
   of currently available states (where the total amount of state saved 
   does not exceed the state_memory_size for the compartment). The 


Price et al.                                                   [Page 28] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   SigComp compressor should maintain a similar list containing the 
   states that it has instructed the receiving endpoint to save. 
   As well as tracking the list of state items that it has saved at the 
   remote endpoint, the compressor also maintains a flag for each state 
   item indicating whether the state can safely be accessed or not. 
   State items should not be accessed until they have been acknowledged 
   (e.g. by using the SigComp feedback mechanism as per Section 6.1). 
    
   State items are deleted from the list when the total 
   state_memory_size for the compartment is used up by states of a 
   higher priority. The SigComp compressor should not attempt to access 
   any state items that have been deleted in this manner, as they may no 
   longer be available at the receiving endpoint. 
    
6.1.  Acknowledging a state item 
    
   The simplest method for acknowledging a SigComp state item is to 
   employ a reliable transport layer such as TCP or SCTP. Alternatively, 
   over an unreliable transport such as UDP the SigComp feedback 
   mechanism can be used to acknowledge that a state item has been 
   successfully created at the receiving endpoint. 
    
   As explained in SigComp [SIGCOMP], in order to invoke the feedback 
   mechanism the following fields must be reserved in the UDVM memory: 
    
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   |     reserved      | Q | S | I |  requested_feedback_location 
   +---+---+---+---+---+---+---+---+ 
   | 1 | requested_feedback_length |  if Q = 1 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   requested_feedback_field    :  if Q = 1 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    
   These fields can be reserved in any of the compression algorithms of 
   Chapter 5 by replacing the label ":requested_feedback_location = 0" 
   with the following piece of mnemonic code: 
    
   :requested_feedback_location    .pad 1 
   :requested_feedback_length      .pad 1 
   :requested_feedback_field       .pad 12 
   :hash_start                     .pad 8 
    
   When a SigComp message is successfully decompressed and saved as 
   state, the following bytecode instructs the receiving endpoint to 
   return the first 6 bytes of the corresponding state identifier. The 
   bytecode can be added to any of the compression algorithms of Chapter 
   5 at the ":end_of_message" label: 
    

Price et al.                                                   [Page 29] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :end_of_message 
    
   :hash_length = state_length + 8 
    
   LOAD (requested_feedback_location, 1158) 
   MULTILOAD (hash_start, 4, state_length, 64, 
   decompress_sigcomp_message, 6) 
   SHA-1 (hash_start, hash_length, requested_feedback_field) 
    
   The receiving endpoint then returns the state identifier in the 
   "returned feedback field" of the next SigComp message to be 
   transmitted in the reverse direction. 
    
   When the state identifier is returned, the compressor can set the 
   availability flag for the corresponding state to 1. 
    
6.2.  Static dictionary 
    
   Certain protocols that can be compressed using SigComp offer a fixed, 
   mandatory state item known as a static dictionary. This dictionary 
   contains a number of text strings that commonly occur in messages 
   generated by the protocol in question. The overall compression ratio 
   can often be improved by accessing the text phrases from this static 
   dictionary rather than by uploading them as part of the compressed 
   message. 
    
   As an example, a static dictionary is provided for the protocols SIP 
   and SDP [SIP-DICT]. This dictionary is designed for use by a wide 
   range of compression algorithms including all of the ones covered in 
   Chapter 5. 
    
   In any of the compression algorithms of Chapter 5, the static 
   dictionary can be accessed by inserting the following instruction 
   after the ":initialize_memory" label: 
    
   STATE-ACCESS (dictionary_id, 6, 0, 0, 1024, 0) 
    
   The following line should also be inserted immediately after the END-
   MESSAGE instruction: 
    
   :dictionary_id .byte 0xc7 0xb6 0x11 0x50 0x61 0x44 
    
   The text strings contained in the static dictionary can then be 
   accessed in exactly the same manner as the text strings from 
   previously decompressed messages (see Section 5.1 for further 
   details). 
    
   Note that in some cases it is sufficient to only load part of the 
   static dictionary into the UDVM memory. Further information on the 
   contents of the SIP and SDP static dictionary can be found in the 
   relevant draft [SIP-DICT]. 

 
Price et al.                                                   [Page 30] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
6.3.  CRC checksum 
    
   Whilst the acknowledgement scheme of Section 6.1 is designed to 
   ensure that SigComp does not propagate errors introduced by the 
   underlying transport layer, in some cases it may be useful to add an 
   extra CRC check over the UDVM memory. For example, if the transport 
   layer fails to discard a damaged SigComp message then a CRC check can 
   ensure that the corresponding decompressed message is not forwarded 
   to the application. 
    
   If an additional CRC check is required then the following bytecode 
   can be inserted after the ":end_of_message" label: 
    
   INPUT-BYTES (2, index, fail) 
   CRC ($index, 64, state_length, fail) 
    
   The bytecode extracts a 2-byte CRC checksum from the end of the 
   SigComp message and compares it with a CRC calculated over the UDVM 
   memory. Decompression failure occurs if the two CRC values do not 
   match. 
    
   A definition of the CRC polynomial used by the CRC instruction can be 
   found in SigComp [SIGCOMP]. 
    
6.4.  Announcing additional resources 
    
   If a particular endpoint is able to offer more processing or memory 
   resources than the mandatory minimum, the SigComp feedback mechanism 
   can be used to announce that these resources are available to the 
   remote endpoint. This may help to improve the overall compression 
   ratio between the two endpoints. 
    
   The values of the following SigComp parameters can be announced using 
   the SigComp feedback mechanism: 
    
   cycles_per_bit 
   decompression_memory_size 
   state_memory_size 
   SigComp_version 
    
   As explained in SigComp, in order to announce the values of these 
   parameters the following fields must be reserved in the UDVM memory: 
    
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   |  cpb  |    dms    |    sms    |  returned_parameters_location 
   +---+---+---+---+---+---+---+---+ 
   |        SigComp_version        | 
   +---+---+---+---+---+---+---+---+ 

 
Price et al.                                                   [Page 31] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   These fields can be reserved in any of the compression algorithms of 
   Chapter 5 by replacing the label ":returned_parameters_location = 0" 
   with the following piece of mnemonic code: 
    
   :returned_parameters_location   .pad 1 
   :returned_sigcomp_version       .pad 1 
    
   When a SigComp message is successfully decompressed and saved as 
   state, the following bytecode announces to the receiving endpoint 
   that additional resources are available at the sending endpoint: 
    
   :end_of_message 
    
   LOAD (returned_parameters_location, N) 
    
   Note that the value "N" should be set equal to the amount of 
   resources available at the sending endpoint. N should be expressed as 
   a 2-byte integer with the most significant bits corresponding to the 
   cycles_per_bit parameter and the least significant bits corresponding 
   to the SigComp_version parameter. 
    
6.5.  Shared compression 
    
   This section provides bytecode for implementing the SigComp shared 
   compression mechanism [EXTENDED]. If two endpoints A and B are 
   communicating via SigComp, shared compression allows the messages 
   sent from Endpoint A to Endpoint B to be compressed relative to the 
   messages sent from Endpoint B to Endpoint A (and vice versa). This 
   may improve the overall compression ratio by reducing the need to 
   transmit the same information in both directions. 
    
   As described in the Extended Operations document, two steps must be 
   taken to implement shared compression at an endpoint. Firstly, it is 
   necessary to announce to the remote endpoint that shared compression 
   is available. Conversely, if such an announcement is received from 
   the remote endpoint then the state created by shared compression can 
   be accessed to improve the overall compression ratio. 
    
   In order to announce that shared compression is available the 
   following fields must be reserved in the UDVM memory: 
    
   0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   |  cpb  |    dms    |    sms    |  returned_parameters_location 
   +---+---+---+---+---+---+---+---+ 
   |        SigComp_version        | 
   +---+---+---+---+---+---+---+---+ 
    
    
Price et al.                                                   [Page 32] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   +---+---+---+---+---+---+---+---+ 
   | length_of_partial_state_id_1  | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :  partial_state_identifier_1   : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
           :               : 
   +---+---+---+---+---+---+---+---+ 
   | length_of_partial_state_ID_n  | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :  partial_state_identifier_n   : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    
   These fields can be reserved in any of the compression algorithms of 
   Chapter 5 by replacing the label ":returned_parameters_location = 0" 
   with the following piece of mnemonic code: 
    
   :returned_parameters_location   .pad 1 
   :returned_sigcomp_version       .pad 1 
   :length_of_partial_state_id_a   .pad 1 
   :partial_state_identifier_a     .pad 6 
   :length_of_partial_state_id_b   .pad 1 
   :partial_state_identifier_b     .pad 20 
    
   :extended_flags                 .pad 2 
   :shared_state_id                .pad 6 
    
   :padding                        .pad 6 
   :minimum_access_length          .pad 2 
   :announcement_location          .pad 2 
   :decompressed_start             .pad 2 
   :decompressed_length            .pad 2 
   :shared_hash_length             .pad 2 
    
   In Figure 5 of the Extended Operations draft, an example SigComp 
   message format is provided to carry the shared compression 
   information between the two endpoints. This message format can be 
   decompressed at the receiving endpoint by inserting the following 
   mnemonic code after the label ":decompress_sigcomp_message" in one of 
   the algorithms of Chapter 5: 
    
   :decompress_sigcomp_message 
    
   INPUT-BYTES (1, extended_flags, fail) 
   COMPARE ($extended_flags, 32768, initialize_state_announcement, 
   access_shared_state, access_shared_state) 
    

Price et al.                                                   [Page 33] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
   :access_shared_state 
   INPUT-BYTES (6, shared_state_id, fail) 
   STATE-ACCESS (shared_state_id, 6, 0, 0, $decompressed_start, 0) 
    
   :initialize_state_announcement 
    
   MULTILOAD (minimum_access_length, 4, 6, length_of_partial_state_id_a, 
   $decompressed_pointer, 5120) 
   COPY-LITERAL (padding, 8, $decompressed_pointer) 
    
   LSHIFT ($extended_flags, 1) 
   COMPARE ($extended_flags, 32768, algorithm_start, 
   announce_acked_state_id, announce_acked_state_id) 
    
   :announce_acked_state_id 
    
   LOAD (length_of_partial_state_id_a, 1536) 
   INPUT-BYTES (6, partial_state_identifier_a, fail) 
   LOAD (announcement_location, length_of_partial_state_id_b) 
    
   :algorithm_start 
    
   Additionally, the following piece of mnemonic code should be inserted 
   following the label ":end_of_message" in the chosen compression 
   algorithm: 
    
   :end_of_message 
    
   LSHIFT ($extended_flags, 1) 
   COMPARE ($extended_flags, 32768, end, announce_shared_state, 
   announce_shared_state) 
    
   :announce_shared_state 
    
   ; The following instructions calculate the state identifier for the 
   ; shared state: 
    
   COPY-LITERAL (decompressed_length, 1, $announcement_location) 
    
   :buffer_size = udvm_memory_size - circular_buffer 
    
   MULTILOAD (decompressed_length, 2, 65528, $decompressed_pointer) 
   SUBTRACT ($shared_hash_length, $decompressed_start) 
   REMAINDER ($shared_hash_length, buffer_size) 
   ADD ($decompressed_length, $shared_hash_length) 
    
   LOAD ($decompressed_start, $decompressed_length) 
   SHA-1 ($decompressed_start, $shared_hash_length, 
   $announcement_location) 
    
   :end 

 
Price et al.                                                   [Page 34] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
7.  Security considerations 
    
   This draft describes implementation options for the SigComp protocol 
   [SIGCOMP]. Consequently the security considerations for this draft 
   match those of SigComp. 
    
    
8.  Acknowledgements 
    
   Thanks to  
    
            Carsten Bormann 
            Adam Roach 
            Lawrence Conroy  
            Christian Schmidt  
            Max Riegel  
            Lars-Erik Jonsson 
            Jonathan Rosenberg 
            Stefan Forsgren  
            Krister Svanbro  
            Miguel Garcia 
            Christopher Clanton  
            Khiem Le  
            Ka Cheong Leung  
    
   for valuable input and review. 
    
    
9. Authors' addresses 
    
   Richard Price        Tel: +44 1794 833681 
   Email:               richard.price@roke.co.uk 
    
   Abigail Surtees      Tel: +44 1794 833131 
   Email:               abigail.surtees@roke.co.uk 
    
   Mark A West          Tel: +44 1794 833311 
   Email:               mark.a.west@roke.co.uk 
    
   Roke Manor Research Ltd 
   Romsey, Hants, SO51 0ZN 
   United Kingdom 
    
    
10. Intellectual Property Right Considerations 
    
   The IETF has been notified of intellectual property rights claimed in 
   regard to some or all of the specification contained in this 
   document. For more information consult the online list of claimed 
   rights. 

 
Price et al.                                                   [Page 35] 

INTERNET-DRAFT             SigComp User Guide              June 24, 2002 
 
 
11. References 
 
   [SIGCOMP]   "Signaling Compression (SigComp)", R. Price et al.,  
               <draft-ietf-rohc-sigcomp-07.txt>, June 2002 
    
   [EXTENDED]  "SigComp - Extended Operations", Hannu et al.,  
               <draft-ietf-rohc-sigcomp-extended-04.txt>, June 2002 
    
   [SIP-DICT]  "The Session Initiation Protocol (SIP) and Session  
               Description Protocol (SDP) static dictionary for  
               Signaling Compression (SigComp)", Garcia et al.,  
               <draft-ietf-sipping-sigcomp-sip-dictionary-03.txt>, June  
               2002 
    
   [FLOWS]     "SIP Call Flow Examples", A. Johnston et al.,  
               "draft-ietf-sipping-call-flows-00.txt", February 2002 
    
   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott 
               Bradner, Internet Engineering Task Force, October 1996 
    
   [RFC-2119]  "Key words for use in RFCs to Indicate Requirement 
               Levels", Scott Bradner, Internet Engineering Task Force, 
               March 1997 
    
   [RFC-2234]  "Augmented BNF for Syntax Specifications: ABNF",   
               D. Crocker and P. Overell, RFC 2234, November 1997 
    
   [RFC-2326]  "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.  
               Rao and R. Lanphier, RFC 2326, April 1998 
    
   [RFC-2543]  "SIP: Session Initiation Protocol", Handley et al,  
               RFC 2543, Internet Engineering Task Force, March 1999 
    
   [LZ77]      "A universal algorithm for sequential data compression",  
               J. Ziv and A. Lempel, IEEE 23:337-343, 1977 
    
   [LZSS]      "Data Compression: Methods and Theory", J. Storer,  
               Computer Science Press, ISBN 0-88175-161-8, 1988 
    
   [LZW]       "LZW Data Compression", Mark Nelson, Dr. Dobb's   
               Journal, October 1989 
    
   [DEFLATE]   "DEFLATE Compressed Data Format Specification version  
               1.3", RFC 1951, P. Deutsch, May 1996 
    
   [LZJH]      "Data Compression Procedures", ITU-T Recommendation V.44,  
               November 2000 
    
   [EPIC]      "Enhanced TCP/IP Compression for ROHC", R. Price et al.,  
               <draft-price-rohc-epic-03.txt>, February 2002 

 
Price et al.                                                   [Page 36]