Network Working Group                Richard Price, Siemens/Roke Manor
INTERNET-DRAFT                      Robert Hancock, Siemens/Roke Manor
Expires: January 2002               Stephen McCann, Siemens/Roke Manor
                                       Mark A West, Siemens/Roke Manor
                                   Abigail Surtees, Siemens/Roke Manor
                                        Paul Ollis, Siemens/Roke Manor
                                                         13 July, 2001

                    Signaling Compression for ROHC

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC-2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document is a submission to the IETF ROHC WG. Comments should
   be directed to the mailing list of ROHC, rohc@cdt.luth.se.

Abstract

   This draft describes a ROHC profile for the robust compression of
   signaling messages including SIP.

   The RObust Header Compression [ROHC9] scheme is designed to
   compress packet headers over error prone channels. It is built
   around an extensible core framework that can be tailored to
   compress new protocol stacks by adding additional ROHC profiles.

   The new profile for signaling compression is provided by the
   Efficient Protocol Independent Compression [EPIC] scheme.

Price et al.                                                  [PAGE 1]

INTERNET-DRAFT       Signaling Compression for ROHC      13 July, 2001

Table of contents

   1.  Introduction
   2.  Terminology
   3.  Overview of signaling compression
   4.  Learning version of EPIC
   5.  Encoding methods required for learning version
   5.1.  VALUE
   5.2.  STATIC
   5.3.  OPTIONAL
   5.4.  LIST
   6.  Converting a BNF description into EPIC input code
   6.1.  New BNF object
   6.2.  Reference to BNF objects
   6.3.  Additional BNF metasymbols
   6.4.  Adding probability values
   7.  Achieving robustness
   8.  Performance evaluation
   9.  Security Considerations
   10. Acknowledgements
   11. Intellectual Property Considerations
   12. References
   13. Authors' Addresses
   Appendix A: Example EPIC input code for SIP

1. Introduction

   This document describes a method for compressing signaling
   messages within the [ROHC9] framework. The new profile for
   signaling compression is provided by the Efficient Protocol
   Independent Compression [EPIC] scheme.

   EPIC takes as its input a list of fields in the new protocol to be
   compressed, and for each field a choice of one or more compression
   techniques. Using this input EPIC generates a set of compressed
   packets that can be used to quickly and efficiently compress data
   from the new protocol. The compressed packets are fully compatible
   with the ROHC framework and in particular can make use of the
   robustness mechanisms described in [ROHC9].

   As signaling message protocols such as SIP have complex behavior
   and a large number of fields, this draft describes a "learning"
   version of EPIC that can discover how to compress new protocols
   automatically based on the BNF (Backus Naur Form) description of
   the protocol.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC-2119].

3. Overview of signaling compression

   The Efficient Protocol-Independent Compression [EPIC] scheme is
   designed to generate ROHC profiles for the compression of new
   protocol stacks. The scheme includes a number of basic compression
   techniques (LSB encoding, INFERRED encoding etc.) and a simple
   language for assigning one or more of these techniques to each
   field in the stack.

   In particular EPIC can be used to generate new ROHC profiles for
   the compression of signaling messages such as SIP. Since EPIC is
   pre-programmed with knowledge of how the signaling protocol
   behaves, the compression ratio obtained is very high and the
   processing and memory requirements are low.

   The drawback with using the standard version of EPIC to compress
   signaling messages is that it must be programmed with information
   on how to compress every field in the chosen signaling protocol.
   This process is straightforward (based on knowledge of how the
   signaling protocol behaves) but somewhat time-consuming.

   Fortunately however it is possible to circumvent the programming
   phase by using the "learning" version of EPIC described in
   subsequent chapters.

4. Learning version of EPIC

   The information required by EPIC to build a new ROHC profile is as
   follows:

   - Description of where fields occur in the chosen protocol (note
     that EPIC can cope efficiently with optional, variable-length
     and out-of-order fields).

   - For each field, a choice of one or more compression techniques.

   - If more than one compression technique is available for a field,
     the probability that each will be used is also given.

   For example, using the EPIC input language a field may be
   compressed as follows:

      encode Field1 as STATIC          80%
                    or LSB(4,-1)       15%
                    or IRREGULAR(16)    5%

   The three compression techniques are STATIC (field value is the
   same as the previous value), LSB (only the Least Significant Bits
   of the field are transmitted) and IRREGULAR (field value is random
   and must be transmitted in full).

   The probabilities reflect the fact that Field1 is expected to
   remain static for 80% of the time, to increase by a small amount
   for 15% of the time and to behave randomly for the remaining 5% of
   the time.

   A learning version of EPIC must discover all of this
   profile-building information automatically. The most difficult
   task is to discover how to parse the protocol and divide it up
   suitably into fields; fortunately this information is already
   available from the BNF description of SIP and other protocols.

   BNF (Backus Naur Form) is a "metasyntax" commonly used to describe
   the syntax of protocols and languages. An example BNF description
   taken from [SIP] is given below:

      host        ::= <IPv4address> | <hostname>
      IPv4address ::= 1*<digit> "." 1*<digit> "." 1*<digit> "."
                      1*<digit>
      hostname    ::= *( <domainlabel> "." ) <toplabel> [ "." ]
      domainlabel ::= *<alphanum>
      toplabel    ::= *<alpha>
      alphanum    ::= <alpha> | <digit>
      alpha       ::= <lowalpha> | <upalpha>
      upalpha     ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                      "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                      "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                      "Y" | "Z"
      lowalpha    ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                      "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                      "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                      "y" | "z"
      digit       ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                      "8" | "9"

            Figure 1: Example BNF description of a protocol

   A number of variants exist on the basic BNF metasyntax, for
   example the Augmented BNF described in [RFC-2234] and used in
   [SIP]. The following metasymbols are available in ABNF:

   BNF_object ::=        Defines a new BNF object in terms of other
                         BNF objects or ASCII characters

   <BNF_object>          Reference to BNF object defined using "::="

   <BNF_object_1> | ... | <BNF_object_n>
                         Choice of n different BNF objects

   "string"              String of ASCII characters

   [<BNF_object>]        Optional BNF object

   x*y<BNF_object>       List of between x and y occurrences of
                         BNF_object. If x is omitted then x = 0, and
                         if y is omitted then y = infinity

   When converting the BNF description of a protocol into input code
   for EPIC, each BNF object is treated as a "field" in the protocol
   to be compressed and hence must be assigned one or more
   compression techniques. The following chapter describes the
   compression techniques available in [EPIC].

5. Encoding methods required for learning version

   Recall that [EPIC] has a library of commonly used compression
   techniques that can be applied to compress individual fields in a
   protocol stack. Each technique is known as an "encoding method"
   since it encodes the field as a shorter compressed version that
   can be used by the decompressor to rebuild the original field.

   The learning version of EPIC uses the following four encoding
   methods to compress signaling messages:

5.1. VALUE

   Notation:     VALUE("<string>")

   The VALUE encoding method can be used to transmit one particular
   value for a field. Note that since the signaling protocols to be
   compressed are ASCII-based, the parameter for VALUE encoding is an
   ASCII string (compared to a bit string for [EPIC]).

5.2. STATIC

   Notation:     STATIC

   The STATIC encoding method can be used when the field does not
   change relative to its previous value. If a field is STATIC then
   no information concerning the field need be transmitted in the
   compressed message.

5.3. OPTIONAL

   Notation:     OPTIONAL(<encoding_method>)

   The OPTIONAL encoding method is used to compress fields that are
   optionally present in the uncompressed message. The parameter for
   OPTIONAL encoding is the name of another encoding method, which is
   used to compress the optional field whenever it is present in the
   uncompressed message.

5.4. LIST

   Notation:     LIST(<min>,<max>,<encoding_method>)

   The LIST encoding method compresses a list of items. Note that the
   LIST encoding method used for signaling messages is a simplified
   version of the LIST encoding method available in [EPIC].

   The first two parameters specify the minimum and maximum number of
   items in the list (where the second parameter is set to 0 if the
   list may contain an unlimited number of items). The third
   parameter gives the encoding method that is used to compress each
   individual list item.

6. Converting a BNF description into EPIC input code

   Before converting the BNF description into input code for the EPIC
   compression scheme, all parentheses are removed by replacing the
   contents of the parentheses with a new BNF object. For example:

      hostname    ::= *( <domainlabel> "." ) <toplabel> [ "." ]

   The parentheses can be eliminated as follows:

      hostname    ::= *<newobject1> <toplabel> [ <newobject2> ]
      newobject1  ::= <domainlabel> "."
      newobject2  ::= "."
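
   As an illustration only, the parenthesis-elimination step could be
   automated along the following lines. This is a minimal Python
   sketch and not part of EPIC: the function name eliminate_groups,
   the tokenizer, and the angle-bracket spelling of the hostname rule
   (<domainlabel>, <toplabel>) are assumptions introduced for the
   example. Only the "newobjectN" naming follows the example above.
   Each group receives a number at the point where it closes, so an
   inner group is numbered before the group that encloses it.

```python
import re

# Hypothetical tokenizer: quoted strings, <references>, group delimiters,
# and anything else (for example repetition prefixes such as "*" or "1*").
TOKEN = re.compile(r'"[^"]*"|<[^>]+>|[()\[\]]|[^\s()\[\]"<]+')
REPEAT = re.compile(r"\d*\*\d*")


def _append(tokens, item):
    """Append item, gluing it onto a preceding repetition prefix."""
    if tokens and REPEAT.fullmatch(tokens[-1]):
        tokens[-1] += item
    else:
        tokens.append(item)


def eliminate_groups(rhs):
    """Replace the contents of ( ) and [ ] with new BNF objects.

    Returns the rewritten right-hand side and a dict mapping each new
    object name to its definition. Square brackets are kept because
    they mark the object as optional; parentheses are dropped.
    """
    new_rules = {}
    stack = [[]]  # one token list per currently open group
    for tok in TOKEN.findall(rhs):
        if tok in ("(", "["):
            stack.append([])
        elif tok in (")", "]"):
            body = " ".join(stack.pop())
            name = f"newobject{len(new_rules) + 1}"
            new_rules[name] = body
            ref = f"<{name}>"
            _append(stack[-1], ref if tok == ")" else f"[ {ref} ]")
        else:
            _append(stack[-1], tok)
    return " ".join(stack[0]), new_rules


rhs, rules = eliminate_groups('*( <domainlabel> "." ) <toplabel> [ "." ]')
print("hostname ::=", rhs)
for name, body in rules.items():
    print(name, "::=", body)
```

   The sketch does not validate its input; unbalanced delimiters are
   not detected.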

   Note that the contents of square brackets are also replaced with a
   single BNF object, but the square brackets themselves must not be
   deleted because they have semantic significance (they indicate
   that the BNF object is optional).

   Each BNF metasymbol is then converted into input code for EPIC as
   follows:

6.1. New BNF object

   BNF description:      BNF_object ::=

   For each object defined in the BNF description, use the EPIC
   "method" command to create a new encoding method with the same
   name as the object. For example:

      method BNF_object

        # Description of the BNF object appears here
        ...

      end_method

   Note that in EPIC input code, encoding methods are usually given
   uppercase names.

6.2. Reference to BNF objects

   BNF description:      <BNF_object_1> | <BNF_object_2> | ... |
                         <BNF_object_n>

   When a choice of n different BNF objects is made available in the
   BNF description, use the EPIC "encode" command to call the n
   encoding methods defined for these objects. Additionally, the
   STATIC encoding method is added to improve the compression ratio
   whenever the BNF object remains constant relative to its previous
   value.

   Consider the following example:

      host        ::= <IPv4address> | <hostname>

   In EPIC input code this becomes:

      method HOST
        encode host_address as STATIC         34%
                            or IPv4ADDRESS    33%
                            or HOSTNAME       33%
      end_method

   Note that as with [EPIC] the name appearing immediately after the
   "encode" command is provided to improve the readability of the
   input code, and is not relevant when parsing the code (except as a
   placeholder).

   The probability that each encoding method will be used to compress
   the signaling message is found by experimentation as explained in
   Section 6.4.

6.3. Additional BNF metasymbols

   The remaining BNF metasymbols are converted into EPIC input code
   as follows:

   "string"              For strings of ASCII characters, replace
                         with the encoding method VALUE("string")

   [<BNF_object>]        For optional BNF objects, replace with the
                         encoding method OPTIONAL(BNF_object)

   x*y (<BNF_object>)    For lists of BNF objects, replace with the
                         encoding method LIST(x,y,BNF_object)

6.4. Adding probability values

   Once every BNF metasymbol has been converted into EPIC input code,
   the final step is to add probability values to each encoding
   method indicating the percentage of time that the encoding method
   is used to compress the field in question.

   The probability that each encoding method will be used is
   discovered by applying the scheme to a selection of signaling
   messages. The number of times that each encoding method is used is
   recorded, and the results are scaled to give the necessary
   probabilities.

   Note that the selection of messages provided to EPIC in this
   "learning" phase should reflect as accurately as possible the mix
   of messages that will be compressed by the resulting [ROHC9]
   profile. More accurate probability values give a higher
   compression ratio when the profile is applied.

   It is also possible to adjust the probability values dynamically
   while the profile is in use. This allows the profile to adapt to
   changes in the message stream, so it can continue to achieve a
   high compression ratio even if the actual behavior of the messages
   deviates from the expected behavior.

   Note however that the probability values at the compressor and
   decompressor must be kept in sync. Over a reliable link this is
   not a problem, because the compressor and decompressor each
   receive exactly the same packets. Over an unreliable link however
   packets may be lost or damaged between the compressor and
   decompressor, causing the profiles to become out of step.
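
   The counting and scaling step described at the start of this
   section can be sketched as follows. This is an illustrative Python
   fragment under stated assumptions, not the EPIC implementation:
   the function name learn_probabilities, the (field, method)
   observation format and the sample counts are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_probabilities(observations):
    """Scale per-field usage counts into percentages.

    observations is an iterable of (field_name, encoding_method)
    pairs, one pair for each time an encoding method was used to
    compress that field in the training messages (assumed format).
    """
    counts = defaultdict(Counter)
    for field, method in observations:
        counts[field][method] += 1

    probabilities = {}
    for field, counter in counts.items():
        total = sum(counter.values())
        probabilities[field] = {
            method: 100.0 * n / total for method, n in counter.items()
        }
    return probabilities


# Hypothetical training data: the host field was unchanged in 8 of 10
# messages, and was sent once as a hostname and once as an IPv4 address.
observed = (
    [("host_address", "STATIC")] * 8
    + [("host_address", "HOSTNAME")]
    + [("host_address", "IPv4ADDRESS")]
)
probs = learn_probabilities(observed)
print(probs["host_address"])
```

   An encoding method that never appears in the training messages
   receives no entry at all in this sketch. How such methods should
   be treated is not specified in this draft.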

   A more robust alternative is to only update the profile at the
   compressor, and to periodically transmit the delta changes to the
   decompressor in a special "profiling" packet. The compressor only
   uses the updated profile to compress messages once it is confident
   that the decompressor has received it correctly.

7. Achieving robustness

   Since the new profile generated by EPIC is fully compatible with
   the [ROHC9] framework, it can make use of all the robustness
   techniques available for ROHC profiles. In particular, if
   decompression is required even in the presence of bit errors and
   dropped packets then the following measures are available:

   - A CRC checksum can be provided over the uncompressed message to
     verify that correct decompression has occurred.

   - A sequence number can be provided to detect lost packets.

   - Any information which may be used for future decompression
     (known in ROHC as context-updating information) can be sent
     multiple times to ensure that it is received by the
     decompressor.

   Note that the level of robustness employed by a [ROHC9] profile
   can be tailored precisely depending on the link conditions for
   which the profile is designed. The robustness mechanisms can also
   be switched off if the link is expected to be sufficiently
   reliable.

8. Performance evaluation

   The performance of the ROHC profile for SIP compression has been
   evaluated on a number of message sequences taken from [FLOWS]. An
   example of the compression ratio achieved is given below for the
   case of a "successful simple SIP to SIP" message sequence (Section
   3.1.1 of [FLOWS]).

   For comparison, the compression ratio obtained using GZIP
   Lempel-Ziv compression [RFC-1952] is also given. Both schemes are
   initially started with no information about the SIP message flow
   (for GZIP the dictionary is empty and for EPIC the sets of
   probability values are all assumed to be equal).

   Moreover, GZIP is not applied to individual messages in the flow
   but instead to the concatenation of all messages received up to a
   given point. This improves the compression ratio because the
   dictionary can be retained from previous messages. Similarly EPIC
   uses the previously received messages as context for the STATIC
   encoding method. This illustrates the increase in compression
   efficiency as both schemes learn more information about the
   message flow.

   The compression ratio for each individual message is given below:

   Individual       Compressed size       Compression ratio
   message size        (octets)
   (octets)          EPIC     GZIP         EPIC     GZIP

      423             152      292         2.78     1.45
      189               6       21        31.50     9.00
      205              16       21        12.81     9.76
      420              28       50        15.00     8.40
      195               4       11        48.75    17.73
      214               8       28        26.75     7.64
      198               2        3        99.00    66.00

   Additionally, the cumulative compression ratio for all messages
   received up to a given point is shown below:

   Cumulative       Compressed size       Compression ratio
   message size        (octets)
   (octets)          EPIC     GZIP         EPIC     GZIP

      423             152      292         2.78     1.45
      612             158      313         3.87     1.96
      817             174      334         4.70     2.45
     1237             202      384         6.12     3.22
     1432             206      395         6.95     3.63
     1646             214      423         7.69     3.89
     1844             216      426         8.54     4.33

   The results show that EPIC achieves a compression ratio
   approximately double that of a Lempel-Ziv scheme such as GZIP. The
   reason for this is that EPIC has been programmed specifically to
   compress SIP messages (by providing the BNF description for SIP).

9. Security Considerations

   EPIC generates compressed header formats for direct use in ROHC
   profiles. Consequently the security considerations for EPIC match
   those of [ROHC9].

10. Acknowledgements

   Header compression schemes from [ROHC9] have been important
   sources of ideas and knowledge. Basic Huffman encoding [HUFF] was
   enhanced for the specific tasks of robust, efficient packet
   compression.

   Thanks to

      Carsten Bormann (cabo@tzi.org)
      Christian Schmidt (christian.schmidt@icn.siemens.de)
      Max Riegel (maximilian.riegel@icn.siemens.de)
      David Keogh (david.keogh@roke.co.uk)
      Lawrence Conroy (lwc@roke.co.uk)

   for valuable input and review.

11. Intellectual Property Considerations

   This proposal is in full conformity with [RFC-2026].

   Siemens may have patent rights on technology described in this
   document which employees of Siemens contribute for use in IETF
   standards discussions. In relation to any IETF standard
   incorporating any such technology, Siemens hereby agrees to
   license on fair, reasonable and non-discriminatory terms, based on
   reciprocity, any patent claims it owns covering such technology,
   to the extent such technology is essential to comply with such
   standard.

12. References

   [ROHC9]     "RObust Header Compression (ROHC)", Carsten Bormann et
               al, Internet Engineering Task Force, February 7, 2001

   [EPIC]      "TCP/IP Compression for ROHC", Richard Price et al,
               Internet Engineering Task Force, July 9, 2001

   [SIP]       "SIP: Session Initiation Protocol", Handley et al,
               RFC2543, Internet Engineering Task Force, March 1999

   [FLOWS]     "SIP Call Flow Examples", Alan Johnston et al,
               Internet Engineering Task Force, June 2001

   [HUFF]      "The Data Compression Book", Mark Nelson and Jean-Loup
               Gailly, M&T Books, 1995

   [RFC-1952]  "GZIP file format specification version 4.3", P.
               Deutsch, Internet Engineering Task Force, May 1996

   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott
               Bradner, Internet Engineering Task Force, October 1996

   [RFC-2119]  "Key words for use in RFCs to Indicate Requirement
               Levels", Scott Bradner, Internet Engineering Task
               Force, March 1997

   [RFC-2234]  "Augmented BNF for Syntax Specifications: ABNF",
               Crocker et al, RFC2234, Internet Engineering Task
               Force, November 1997

13. Authors' Addresses

   Richard Price        Tel: +44 1794 833681
                        Email: richard.price@roke.co.uk

   Robert Hancock       Tel: +44 1794 833601
                        Email: robert.hancock@roke.co.uk

   Stephen McCann       Tel: +44 1794 833341
                        Email: stephen.mccann@roke.co.uk

   Mark A West          Tel: +44 1794 833311
                        Email: mark.a.west@roke.co.uk

   Abigail Surtees      Tel: +44 1794 833131
                        Email: abigail.surtees@roke.co.uk

   Paul Ollis           Tel: +44 1794 833168
                        Email: paul.ollis@roke.co.uk

   Roke Manor Research Ltd
   Romsey, Hants, SO51 0ZN
   United Kingdom

Appendix A: Example EPIC input code for SIP

   This appendix lists the EPIC input code for the BNF fragment given
   in Figure 1. Note that this is for information only, since an
   implementation of "learning" EPIC should be able to read a BNF
   description directly and convert it into a [ROHC9] profile without
   requiring a manual conversion step.

   Note also that the probability values are all assumed to be equal
   when more than one choice of encoding is available. In practice
   more appropriate values can be obtained from actual SIP data.

      method HOST
        encode host_address as STATIC         34%
                            or IPv4ADDRESS    33%
                            or HOSTNAME       33%
      end_method

      method IPv4ADDRESS
        encode component1 as LIST(1,0,DIGIT)  100%
        encode separator1 as VALUE(".")       100%
        encode component2 as LIST(1,0,DIGIT)  100%
        encode separator2 as VALUE(".")       100%
        encode component3 as LIST(1,0,DIGIT)  100%
        encode separator3 as VALUE(".")       100%
        encode component4 as LIST(1,0,DIGIT)  100%
      end_method

      method HOSTNAME
        encode domain_label as LIST(0,0,DOMAINLABEL)  100%
        encode top_label    as TOPLABEL               100%
        encode terminator   as OPTIONAL(VALUE("."))   100%
      end_method

      method DOMAINLABEL
        encode domain_name as LIST(0,0,ALPHANUM)  100%
        encode separator   as VALUE(".")          100%
      end_method

      method TOPLABEL
        encode top_name as LIST(0,0,ALPHA)  100%
      end_method

      method ALPHANUM
        encode character as STATIC    34%
                         or ALPHA     33%
                         or DIGIT     33%
      end_method

      method ALPHA
        encode letter as STATIC       34%
                      or LOWALPHA     33%
                      or UPALPHA      33%
      end_method

      method UPALPHA
        encode uppercase_letter as STATIC        3.70%
                                or VALUE("A")    3.70%
                                or VALUE("B")    3.70%
                                . . . .
                                or VALUE("Z")    3.70%
      end_method

      method LOWALPHA
        encode lowercase_letter as STATIC        3.70%
                                or VALUE("a")    3.70%
                                or VALUE("b")    3.70%
                                . . . .
                                or VALUE("z")    3.70%
      end_method

      method DIGIT
        encode digit as STATIC        9.09%
                     or VALUE("0")    9.09%
                     or VALUE("1")    9.09%
                     . . . .
                     or VALUE("9")    9.09%
      end_method
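
   The equal-probability methods listed in this appendix could be
   generated mechanically from a BNF choice. The following Python
   fragment is a sketch under stated assumptions rather than part of
   EPIC: the function name choice_method and the output layout are
   hypothetical. It prepends STATIC as described in Section 6.2 and
   splits 100% evenly, which reproduces the 9.09% (1/11) and 3.70%
   (1/27) values above. For a three-way choice it yields 33.33% for
   each alternative rather than the 34/33/33 integer split used in
   this appendix.

```python
def choice_method(name, label, alternatives):
    """Render an EPIC-style method for a BNF choice.

    STATIC is added as an extra first alternative and every
    alternative receives the same share, rounded to two decimals.
    """
    methods = ["STATIC"] + list(alternatives)
    share = f"{100 / len(methods):.2f}%"

    first = f"  encode {label} as "
    rest = " " * (len(first) - 3) + "or "  # align "or" under "as"

    lines = [f"method {name}"]
    for index, method in enumerate(methods):
        lines.append(f"{first if index == 0 else rest}{method} {share}")
    lines.append("end_method")
    return "\n".join(lines)


# digit ::= "0" | "1" | ... | "9" from Figure 1
digits = [f'VALUE("{d}")' for d in "0123456789"]
print(choice_method("DIGIT", "digit", digits))
```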