Internet Engineering Task Force ROHC/SIP WG Internet Draft Jeff Heath draft-heath-rohc-sip-v44-00.txt Hughes Network Systems Sept 28, 2001 Expires: March 2002 SIP Compression using ITU-T V.44 with Pre-loaded Dictionary This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and it working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Status of this Memo This memo is an Internet-Draft that provides information for the Internet community. Distribution of this memo is unlimited. The file name for this Internet-Draft is draft-heath-rohc-sip-v44-00.txt. Comments are invited and should be addressed to the author whose contact information is in Section 8. Abstract This document describes a compression method based on the data compression algorithm described in ITU-T Recommendation V.44 [V44]. Recommendation V.44 is a modem standard but Annex B of the recommendation describes the implementation of V.44 in packet networks. This document defines the application of V.44 to packets or messages where some or all of the contents of the data is known ahead of time. For such applications the V.44 dictionary is partially pre-loaded after intialization with words and/or phrases that are expected to appear in the data. This Internet Draft describes the mechanism of the dictionary pre-load and its advantages for applications such as SIP/SDP. The V.44 algorithm is unique among data compression algorithms in J. Heath [Page 1] Internet Draft SIP Compression using V.44 Sept 2001 that it combines excellent compression ratios with very fast execution times and low memory requirements. It learns and adapts to the input data very quickly which is one reason it provides superior compression ratios on relatively small blocks of data. The partial pre-load of the dictionary with expected words and phrases will allow even better compression ratios since it will allow V.44 to encode each pre-loaded word or phrase with between 7 to 10 bits each time it is encountered in the data. V.44 is lossless and is based upon the LZJH data compression algorithm. As described in Annex B of [V44], LZJH operates in packet networks using one of two methods: Packet Method where each individual IP packet is compressed/decompressed separately, or Multi-Packet Method where several IP packets are compressed/decompressed as a continuation using a reliable transport. Thoughout the remainder of this document the terms V.44 and LZJH are synonomous. Table of Contents 1. Introduction...................................................2 1.1 Advantages of V.44 Data Compression........................3 1.2 Background of LZJH Compression.............................4 2. LZJH Design and Operation for SIP..............................4 2.1 LZJH Modifications for Pre-loaded Dictionaries.............5 2.2 Operation and Performance..................................6 2.3 Data Integrity.............................................7 3. LZJH Pre-load Test Results.....................................7 3.1 Test 1........................................................7 3.2 Test 1A.......................................................7 3.3 Test 2........................................................7 3.4 Test 3........................................................8 3.5 Test Conclusions..............................................8 4. Algorithm Comparison...........................................9 4.1 Comparison Example 1..........................................9 4.2 Comparison Example 2..........................................9 4.3 Comparison Example 3..........................................9 4.4 Comparison Conclusions........................................10 5. Security Considerations........................................10 6. Intellectual Property Right Considerations.....................10 7. References.....................................................10 8. Authors' Address...............................................11 9. Full Copyright Statement.......................................11 1. Introduction Cellular and wireless networks are planning on implementing more and more Internet, IP based, services in the future. One problem with the set of IP protocols invented with the Internet in mind is the high overhead of additional headers and verbose data fields. Since bandwidth is at a premium in cellular, satellite, and other J. Heath [Page 2] Internet Draft SIP Compression using V.44 Sept 2001 wireless networks the future implementation of these protocols poses a bandwidth efficiency problem. For example, application signaling protocols such as SIP [SIP], SDP [SDP] and RTSP [RTSP] will typically be used to setup and control applications in a mobile Internet based on a cellular network. However, the generous size of the protocol messages combined with their request/response nature create delays and waste bandwidth in a cellular network. In order to reduce the delays incurred and bandwidth used by these protocols it is necessary to use compression techniques in cellular, satellite, and other wireless networks. Some of these protocols contain messages that contain data that is known in advance and is highly redundant. Since the same protocols also have messages containing ASCII free form text, a general purpose data compression algorithm which can take advantage of words or phrases known to appear in the text is the best solution. 1.1 Advantages of V.44 Data Compression Below are some of the reasons ITU-T V.44 data compression should be considered for this environment: A. The primary reason for using compression for SIP/SDP is to reduce the amount of data going over the RAN. LZJH gets the best compression in this environment and therefore will the most effective in data reduction. B. LZJH is a general purpose algorithm. If operating in a network entity for one application, such as SIP/SDP, it can be used for other applications such as the compression of IP data packets [IPCOMP] to improve the performance of the Radio Access Network. C. LZJH is the basis of the latest ITU-T data compression standard, ITU-T V.44 and will become widely used in networks. New developments, such as SIP/SDP in UMTS, will benefit from using the latest standarized compression technology. D. As defined in Annex B of V.44, the algorithm operates in packet networks on one packet at a time or on several packets using a reliable transport. LZJH source code handles both methods on a per session basis at run time. Thus, each network entity can indepedently determine, based on its resources, which method to use at any time. E. LZJH is superior to the other algorithms being considered for the SIP/SDP type application. In this application, LZJH will get about twice the compression as LZW using the same MIPs and memory. LZJH will get 20% to 35% better compression than LZSS while using about one fifth the MIPs and similar memory. Refer to section 4 for algorithm comparisons. J. Heath [Page 3] Internet Draft SIP Compression using V.44 Sept 2001 F. LZJH was designed for packetized data. It can get good compression on packets of 200 bytes and decent compression on packets as small as 50 bytes. 1.2 Background of LZJH Compression LZJH is similar to the algorithm described in [LZ78] although is has aspects which are similar to the algorithm described in [LZ77]. As such, it provides the execution speed and low memory requirements of [LZ78] with compression ratios that are better than [LZ77]. Originally developed for the satellite industry to compress IP datagrams independantly, it is ideal for the SIP/SDP application. The LZJH algorithm was modified to compress a continuous stream of data for a modem environment and this modified version is the basis for Recommendation V.44. LZJH is an adaptive, general purpose, lossless data compression algorithm that provides excellent performance across a wide variety of data types, particularly ASCII text with high redundancy. A typical [LZ78] compression algorithm, such as LZW, is not suitable for application in a packet network since it takes too long to build up its dictionary, resulting in poor compression ratios on IP datagrams that are compressed separately. A typical [LZ77] compression algorithm, such as LZSS, suffers in the packet network application due to poor execution times. LZJH not only has superior compression ratios when encoding or decoding packetized data, but it uses little memory and provides very fast execution times. The details of LZJH compression can be found in [V44]. 2. LZJH Design and Operation for SIP As stated earlier, LZJH operates using one of two methods in a packet network. Packet Method where each IP packet, or protocol message, is compressed separately and the dictionaries are re-initialized between messages. Multi-Packet Method where several messages are compressed and data from previous messages remains in the dictionary for use in compressing the current message. Multi-Packet Method requires that the messages, or packets, be delivered in order without any packet loss. In other words, it requires a reliable transport. The definition of that transport in the SIP/SDP or other similar application is beyond the scope of this document. Multi-Packet Method requires a separate history buffer to hold the data from the messages that were previouly processed. This method also requires that each separate thread or session supported by a network entitiy, such as the P-CSCF, have a separate J. Heath [Page 4] Internet Draft SIP Compression using V.44 Sept 2001 context for compression. This context includes information for the reliable link and compression dictionaries, including history buffer. In contrast, Packet Method does not require a reliable transport, does not require in-order delivery of messages, and does not require a separate history. Packet Method is implemented in a serving entity, supporting hundreds of simultaneous sessions, with a single instance of encoder and decoder dictionaries. For this application the total memory requirement for both encoder and decoder is a trivial 16K. About 11K for the encoder, and about 5K for the decoder. Multi-Packet Method provides better compression ratios than Packet Method, but at the cost of considerable memory for a serving network entity. In addition, the cost of complexity and bandwidth to implement the underlying reliable transport may offset the better compression ratios. 2.1 LZJH Modifications for Pre-loaded Dictionaries The modifications required to pre-load the LZJH dictionaries are minimal and will not affect the basic algorithm. Thus, the same LZJH object module operating in a network entity will be able to operate normally for general purpose compression or operate using pre-loaded dictionaries. Currently the LZJH algorithm consists of several functions which are called by a user supplied control function which controls the operation of the algorithm. They are: - initialize Packet Method - initialize Multi-Packet Method - re-initialize compressor (encoder) dictionary - re-initialize decompressor (decoder) dictionary - compress packet - decompress packet New functions to support the pre-load of the dictionaries have been added as follows: - pre-load compressor dictionary - pre-load decompressor dictionary - re-initialize compressor pre-loaded dictionary - re-initialize decompressor pre-loaded dictionary The dictionary pre-load procedures will take as an input into the procedure two arrays that define the words or phrases (strings) to be pre-loaded. Note that the encoder and its peer decoder must be pre-loaded with exactly the same arrays for compression/decompression to work correctly. The content, origin, and distribution of the pre-load arrays is beyond the scope of this document and there could conceivably be different pre-load arrays for each message type. The pre-load arrays consists of an array of the strings to be J. Heath [Page 5] Internet Draft SIP Compression using V.44 Sept 2001 pre-loaded and an array of string lengths corresponding to the strings as follows: char pl_data [] = "INVITE" "SIP" "/2.0" "sip:" "Via: " "SIP/2.0/UDP" "From: " "To: " "Call-ID: " "CSeq: "; byte pl_len [] = {6,3,4,4,5,11,6,4,9,6,0}; The above defines 10 strings to be pre-loaded. Note that a string length of zero indicates the end of the strings. Also note this mechanism is subject to change as better ideas come along. Once the encoder and decoder dictionaries are pre-loaded with a set of strings, the re-initialize pre-loaded dictionary procedures will re-initialize only the dynamic portion of the dictionaries and histories, the static, pre-loaded, strings will be maintained. Thus, the LZJH encoder does not have to keep compressing the same pre-loaded data on each message which saves processing time. To remove the pre-loaded information, the control function uses either the re-initialize compressor (or decompressor) function to clear the information or uses the pre-load compressor (or decompressor) dictionary functions to pre-load different information. It may be desireable to place a byte into the message header or as the first byte of compressed message data to indicate which pre-load table was used by the compressor. Alternatively, the type of message, INVITE, etc., can explicitly indicate which pre-load table was used. 2.2 Operation and Performance During the pre-load of either the compressor (encoder) or decompressor (decoder) dictionary, the information required to define the strings is loaded into the dictionary structures and the strings themselves are cancatenated into one array of data that occupies the first N locations of the packet buffer to be processed. Thus, the data to be compressed is copied into the first location following the "static" data in the packet buffer prior to compression. Once pre-loaded, the dictionary can be re-initialized while maintaining the pre-loaded portion of the dictionary and static data, a major savings in execution speed. With LZSS, it will be necessary to compress the static data each time to rebuild the binary tree. J. Heath [Page 6] Internet Draft SIP Compression using V.44 Sept 2001 2.3 Data Integrity If a CRC or other such mechanism is not used by the underlying link layer to insure the integrity of the messages transferred over the RAN, then it may be necessary to prepend a 16 bit CRC or LRC to the compressed data to insure its integrity prior to decompression. 3. LZJH Pre-load Test Results The LZJH algorithm was modified as described in section 2 and tests were run on the SIP messages described in section 3.1.1 of [FLOWS]. 3.1 Test 1 The first INVITE message and ACK message sent from User A to User B in section 3.1.1 of [FLOWS] are compressed separately with the LZJH dictionary pre-loaded with expected strings. message uncompressed bytes compressed bytes ratio ---------------------------------------------------------- INVITE 423 152 2.8 ACK 212 88 2.4 Note in section 4.1 the LZJH algorithm without a pre-load reduced the same INVITE message to 266 bytes. The pre-load improved the compression ratio by 75% on the INVITE. This is the compression ratio on an INVITE and ACK messages compressed separately, using LZJH Packet Method, where a reliable transport or any other mechanism to maintain data between messages is not required. Both used the same set of pre-loaded strings. 3.2 Test 1A The same INVITE message and ACK message from Test 1 are compressed using Multi-Packet Method where the dictionary is maintained between the two messages. message uncompressed bytes compressed bytes ratio ---------------------------------------------------------- INVITE 423 152 2.8 ACK 212 29 7.3 3.3 Test 2 The first INVITE message in section 3.1.1 of [FLOWS] is repeated 5 times with certain fields changed for each repitition, such as To URL and display name, rtpmap, etc. This is to provide a ballpark comparison with the "friendly" test in section 5 of UDPCOMP SIP compression [UDP]. J. Heath [Page 7] Internet Draft SIP Compression using V.44 Sept 2001 message uncompressed bytes compressed bytes ratio ---------------------------------------------------------- INVITE 1 423 152 2.8 INVITE 2 418 48 8.7 INVITE 3 426 43 9.9 INVITE 4 425 37 11.5 INVITE 5 429 37 11.6 As can be seen, compression ratio improvements in the first 2 INVITES are dramatic compared to UDPCOMP and even the last 3 INVITES are better even though the INVITE messages are larger. The total compression ratio of all 5 INVITES is 6.7 compared to the total ratio of 2.7 (2009 / 737) for UDPCOMP, more than double. This test shows the compression ratio on a series of 5 INVITE messages using LZJH Multi-Packet Method over a reliable transport. 3.4 Test 3 Each of the 5 INVITE messages compressed in Test 2, above, is compressed separately using LZJH Packet Method without a reliable transport or any other mechanism to maintain dictionaries between each message. message uncompressed bytes compressed bytes ratio ---------------------------------------------------------- INVITE 1 423 152 2.8 INVITE 2 418 155 2.7 INVITE 3 426 158 2.7 INVITE 4 425 158 2.7 INVITE 5 429 159 2.7 The average of the 5 INVITES is a compression ratio of 2.7 which is exactly the same total compression ratio achieved by UDPCOMP on 5 INVITES. Each compressed message stands on its own and will obtain the same compression regardless if previous messages are received. 3.5 Test Conclusions In an enviroment where the codebook or dictionary of SIP messages is maintained between messages via a reliable transport or other mechanism, LZJH with a pre-loaded dictionary can obtain compression ratios that are more than double those of UDPCOMP (refer to Test 2). In environment where each message stands on its own, LZJH with a pre-loaded dictionary can reduce a SIP message by some 60% where UDPCOMP actually expands the first INVITE message in [UDP] section 5. SIP signalling requires the exchange of just a few messages, if it takes a few messages for a complicated mechanism to obtain good compression ratios over the total of all messages then any advantage is negated. J. Heath [Page 8] Internet Draft SIP Compression using V.44 Sept 2001 LZJH can obtain very good compression ratios using Multi-Packet Method over a reliable transport. However, since LZJH can reduce SIP messages by more than half without any mechanism where the data in one message is used to compress following messages, the complication of a reliable transport or other such mechanisms may offset the extra savings. Since LZJH supports both methods, it is up to network designers to determine which method to use on each call flow. 4. Algorithm Comparison Compression ratio comparisons between three algorithms, LZJH, LZSS, and LZW, are provided in this section. 4.1 Comparison Example 1 The first INVITE message in section 3.1.1 of [FLOWS] is compressed by each algorithm without the pre-load of any algorithm. This shows the raw ability of the algorithm to compress a single SIP message. algorithm uncompressed bytes compressed bytes delta ------------------------------------------------------------- LZJH 423 266 - LZSS 423 329 23.7% LZW 423 329 23.7% In the above example, LZJH obtains a compression ratio of 1.55 on a single SIP message without a pre-load of the dictionary. 4.2 Comparison Example 2 The first INVITE message in section 3.1.1 of [FLOWS] is duplicated into a single message and then compressed by each algorithm without the pre-load of any algorithm. This compares the ability of the algorithm to compress a SIP message when the same SIP message is already in the dictionary or history memory. algorithm uncompressed bytes compressed bytes delta ------------------------------------------------------------- LZJH 846 273 - LZSS 846 357 30.8% LZW 846 536 96.3% This example shows that LZJH uses just 7 additional bytes to encode the entire duplicated 2nd SIP message (273 - 266 = 7). LZSS uses 28 additional bytes, and LZW 207 additional bytes. 4.3 Comparison Example 3 An HTML file with significant redundancy is compressed. This is to test both the compression ratio and encoder speed of each algorithm. The SIP messages are too small for accurate encoder speed measurements. The file is compressed as separate packets of 1500 J. Heath [Page 9] Internet Draft SIP Compression using V.44 Sept 2001 bytes each with compression dictionaries re-initialized between each packet, the compression ratio is the average for all packets. The encoder time is total for all packets. Note that at this writing, the author does not have a version of LZSS using binary trees for speed measurement, thus, a variant of LZSS is used for the speed measurements, LZS which uses hash tables and is faster than LZSS. Compression ratio measurements are LZSS. algorithm encoder uncompressed compressed delta time bytes bytes ------------------------------------------------------------- LZJH 2 97030 35081 - LZS/LZSS 13 97030 42562 21.3% LZW 3 97030 62586 78.4% The above example shows that both LZJH and LZW encoders are several times faster than that of LZSS. Decoders are not measured here, however, previous testing has shown that the LZJH decoder is about 3 times faster than the LZJH encoder. The LZSS decoder should be as fast as the LZJH decoder and the LZW decoder should be slower by about a facter of 2. 4.4 Comparison Conclusions Among the 3 algorithms, LZJH is the obvious choice for both speed and compression performance. 5. Security Considerations There are no specific security considerations regarding the proposal in this Internet Draft. However, the application of data compression, as described herein, should always be done at a layer above any encryption, such as IPSec. 6. Intellectual Property Right Considerations The LZJH data compression algorithm is referenced in one or more patents or patent applications. To the extent that some of the concepts in this document are adopted in a specification, Hughes Network Systems agrees to license patents technically necessary to implement the specification on fair, reasonable, and nondiscriminatory terms. Since HNS is a potential manufacturer of 3rd generation satellite terminals, this may be based on reciprocity where possible. 7. References [IPCOMP] Shacham, A., "IP Payload Compression Protocol (IPComp)", RFC 2393, December 1998. [LZ77] Lempel, A., and Ziv, J., "A Universal Algorithm for J. Heath [Page 10] Internet Draft SIP Compression using V.44 Sept 2001 Sequential Data Compression", IEEE Transactions On Information Theory, Vol. IT-23, No. 3, May 1977. [LZ78] Lempel, A., and Ziv, J., "Compression of Individual Sequences via Variable Rate Coding", IEEE Transactions On Information Theory, Vol. IT-24, No. 5, Sep 1978. [V44] ITU Telecommunication Standardization Sector (ITU-T) Recommendation V.44 "Data Compression Procedures", November 2000. [FLOWS] A. Johnston, S. Donovan, C. Cunningham, D. Willis, J. Rosenberg, K. Summers, H. Schulzrinne, "SIP Call Flow Examples" Internet Draft, Internet Engineering Task Force, June 2001. [UDP] J. Rosenberg, "Compression of SIP" Internet Draft, Internet Engineering Task Force, July, 2001. 8. Authors' Address Jeff Heath Hughes Network Systems 10450 Pacific Center Ct. San Diego, CA 92121 voice: 858-452-4826 fax: 858-597-8979 e-mail: jheath@hns.com 9. Full Copyright Statement Copyright (C) The Internet Society (1998). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an J. Heath [Page 11] Internet Draft SIP Compression using V.44 Sept 2001 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. J. Heath [Page 12]