Network Working Group                  Richard Price, Siemens/Roke Manor
INTERNET-DRAFT                        Robert Hancock, Siemens/Roke Manor
Expires: August 2001                  Stephen McCann, Siemens/Roke Manor
                                         Mark A West, Siemens/Roke Manor
                                     Abigail Surtees, Siemens/Roke Manor
                                          Paul Ollis, Siemens/Roke Manor
                                              Christian Schmidt, Siemens
                                                       February 23, 2001

          Efficient Protocol Independent Compression (EPIC)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of [RFC-2026].

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This document is a submission to the IETF ROHC WG. Comments should be
directed to the mailing list of ROHC, rohc@cdt.luth.se.

Abstract

The RObust Header Compression [ROHC8] scheme is designed to compress
packet headers over error-prone channels. It is structured into two
main parts. Firstly, there is an extensible core framework covering the
basic ROHC protocol and packet formats, which is instantiated for a
particular protocol stack by means of a ROHC profile. Secondly, [ROHC8]
also includes a number of specific profiles for particular protocol
stacks (RTP/UDP/IP etc.).

The Efficient Protocol Independent Compression (EPIC) scheme is a
general mechanism for providing additional profiles for use within the
ROHC framework. It can produce a set of compressed header formats for
an arbitrary protocol stack and a chosen level of robustness.
As with [ROHC8], the decompressed headers are semantically identical to
the original uncompressed headers. Moreover, the headers generated by
EPIC have the special property that, in a well-defined sense, they are
optimally efficient.

Price et al.                                                   [PAGE 1]

INTERNET-DRAFT                   EPIC                  February 23, 2001

Revision history

-01: Document updated for compatibility with ROHC version 08
     (reference [ROHC8])
-00: Document created for provision of new ROHC profiles in a
     protocol-independent manner

Table of contents

1. Introduction.................................................3
2. Terminology..................................................4
3. Existing Solutions...........................................4
3.1. ROHC framework.............................................4
3.2. Location of EPIC in ROHC...................................4
3.3. ROHC profiles..............................................5
4. Header compression using EPIC................................5
4.1. Compression and decompression procedures...................5
4.2. Huffman compression and EPIC inputsets.....................7
5. EPIC Inputset................................................9
5.1. General parameters.........................................9
5.1.1. Lookup tables............................................9
5.1.2. Control of header alignment..............................9
5.1.3. Available memory.........................................10
5.2. Field-specific parameters..................................11
6. EPIC Encoding Toolbox........................................12
6.1. Toolbox members............................................13
6.1.1. STATIC...................................................13
6.1.2. IRREGULAR(k).............................................13
6.1.3. VALUE(v).................................................13
6.1.4. READ(LTABLE,i)...........................................13
6.1.5. WRITE(k,LTABLE,i)........................................14
6.1.6. LSB(k,p).................................................14
6.1.7. UNCOMPRESSED.............................................14
6.1.8. INFERRED.................................................14
6.2. Hints on applying the toolbox..............................15
6.2.1. Compressing variable-length fields.......................15
6.2.2. Compressing extension headers............................15
7. Control Fields...............................................17
7.1. CRC........................................................17
7.2. Master Sequence Number.....................................18
7.3. Miscellaneous Information..................................18
8. Extensibility................................................19
8.1. Other protocol stacks......................................19
8.2. New toolbox methods........................................19
8.3. New control fields.........................................20
8.4. Learning version of EPIC...................................20
9. EPIC Processing..............................................21
9.1. Structure of the Huffman tree..............................21
9.2. Compressor operation.......................................22
9.2.1. Step 1: Compression front end............................23
9.2.2. Step 2: Compressing the uncompressed header..............23
9.2.3. Step 3: Adding the UNCOMPRESSED fields...................23
9.3. Decompressor operation.....................................23
9.3.1. Step 1: Decompressing the compressed header..............23
9.3.2. Step 2: Reconstructing the INFERRED fields...............24
10. Security considerations.....................................24
11. Acknowledgements............................................24
12. Intellectual Property Considerations........................24
13. References..................................................25
14. Authors' Addresses..........................................25
Appendix A. EPIC Description in Pseudocode......................26
A.1. Structure of the Hierarchical Huffman Tree.................26
A.2. Compressor.................................................27
A.2.1. Step 1: Compression front end............................27
A.2.2. Step 2: Compressing the uncompressed header..............28
A.2.3. Step 3: Adding the UNCOMPRESSED fields...................29
A.3. Decompressor...............................................29
A.3.1. Step 1: Decompressing the compressed header..............29
A.3.2. Step 2: Decompression front end..........................30
A.3.3. Step 3: Reconstructing the INFERRED fields...............31
A.4. Hierarchical Huffman tree construction.....................32
A.4.1. Step 1: Process the EPIC inputset........................33
A.4.2. Step 2: Resolve complex encoding methods.................33
A.4.3. Step 3: Tree pruning.....................................35
A.4.4. Step 4: Creating the HH Tree.............................35
A.5. EPIC implementation issues.................................38
A.5.1. Arithmetic...............................................38
Appendix B. Example Profile for RTP/UDP/IPv4....................40
B.1. General parameters.........................................40
B.2. Field-specific parameters..................................40
B.3. Changes required for a random IP ID........................49
Appendix C. Proof of Optimal Efficiency.........................50

1. Introduction

This document describes a method for generating new compressed header
formats for [ROHC8]. The Efficient Protocol Independent Compression
(EPIC) scheme takes as its input a list of fields in the protocol stack
to be compressed and, for each field, a choice of compression
techniques to be made available for that field. Using this input, EPIC
derives a set of compressed header formats stored in the form of a
tree.
These compressed header formats can then be used to quickly compress
and decompress headers. An important property of EPIC is that for any
chosen protocol it generates a set of compressed headers for which the
average header size is provably minimal. In particular, for the same
level of robustness, the compressed headers produced by EPIC are
5 - 10% more efficient than those currently used in [ROHC8].

A subset of EPIC has been implemented in accordance with this draft to
demonstrate the operation of the algorithms, and this is available
from [IMPL].

Chapter 3 gives an overview of [ROHC8] and explains how EPIC fits into
the ROHC framework. Chapter 4 describes the steps involved in the EPIC
scheme. Chapter 5 lists the inputset needed to create a new set of
compressed header formats. Chapter 6 describes the eight basic
compression techniques available in the EPIC toolbox. Chapter 7
discusses the EPIC control fields. Chapter 8 considers the
extensibility of EPIC. Chapter 9 contains a step-by-step worked
example of EPIC in action.

Appendix A gives a normative description of EPIC in pseudocode.
Appendix B lists an example EPIC inputset for RTP/UDP/IPv4. Appendix C
states and proves the optimal efficiency claim for EPIC.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC-2119].

3. Existing Solutions

This chapter describes the overall ROHC framework and the location of
EPIC within [ROHC8].

3.1. ROHC framework

[ROHC8] defines a general compressed header format that is used for
any compressed packet as well as for feedback and IR (Initialization
and Refresh) packets. The general header format includes a CID
(Context Identifier) to indicate the packet stream to which the header
belongs.
It also includes a packet type indicator to specify whether the packet
is a feedback, initialization or general compressed header, whether it
is segmented, and whether it contains padding. The following packet
type indicators are reserved in the overall [ROHC8] framework:

1110:     Padding or Add-CID octet
11110:    Feedback
11111000: IR-DYN packet
1111110:  IR packet
1111111:  Segment

The remaining packet types depend on the protocol stack to be
compressed. A ROHC profile is the specification of how to compress a
certain kind of header stream over a certain kind of link. Each ROHC
profile has its own set of compressed packet formats to complement the
general packet types common to all ROHC profiles.

Note that as well as defining a new set of compressed header formats,
each ROHC profile also describes how the compressor and decompressor
should be initialized and when to change state or mode.

3.2. Location of EPIC in ROHC

The EPIC scheme generates new compressed header formats for use in
ROHC profiles. The packet formats generated by EPIC are fully
[ROHC8]-compatible. The location of the EPIC compressed header within
the general ROHC packet is shown below:

     0                               7
     --- --- --- --- --- --- --- ---
    |         Add-CID octet         |  if for CID 1-15 and small CIDs
    +---+--- --- --- ---+--- --- ---+
    |    EPIC compressed header     |  1 octet
    +---+--- ---+---+---+--- --- ---+
    |                               |
    /   0, 1, or 2 octets of CID    /  1 or 2 octets if large CIDs
    |                               |
    +---+---+---+---+---+---+---+---+
    /    EPIC compressed header     /  variable
    +---+---+---+---+---+---+---+---+

    Figure 1 : Location of EPIC within the general ROHC packet

In general the compressed header formats for a [ROHC8] profile are
octet-aligned and do not start with the bit pattern 111XXXXX (since
this is reserved by the overall ROHC framework). So to ensure [ROHC8]
compatibility, the first octet of an EPIC compressed header never uses
the bit pattern 111XXXXX.

3.3. ROHC profiles

[ROHC8] currently contains four profiles:

Profile 0 for sending uncompressed IP packets
Profile 1 for RTP/UDP/IP compression
Profile 2 for UDP/IP compression
Profile 3 for ESP/IP compression

The basic function of EPIC is to generate new profiles for [ROHC8].
Each profile includes a set of compressed header formats that can
efficiently compress one protocol stack. Within each profile, the size
of the CRC checksum, the length of the sequence number and the memory
requirements for the profile can also be customized.

4. Header Compression using EPIC

This chapter outlines the EPIC scheme.

4.1. Compression and decompression procedures

Any ROHC-compatible header compression scheme can be considered to
operate as follows:

1. For an input uncompressed header, extract the non-redundant
   information from the header fields, using available context data as
   appropriate.

2. Format this information into a compressed header (as in Figure 1)
   for transfer to the decompressor.

3. Use this information, along with context data at the decompressor,
   to reconstruct the original header.

This section describes in overview how each stage is carried out using
the EPIC scheme. Figure 2a illustrates the per-packet processing which
is done for each header to be compressed and decompressed.
 (1)      |                                        ^
          | {Uncompressed            {Uncompressed |
          |  header}                       header} |
          v                                        |
 +-----------------+                               |
 |    Classify     |                               |
 |     packet      |                               |
 +-----------------+                               |
          |                                        |
          | {Header &                              |
          |  format}                 +-----------------+
          v                          | Apply decoding  |
 +-----------------+                 | methods and     |
 | Apply encoding  |                 | reconstruct     |
 | methods         |                 | header          |
 | (Section A.2.1) |                 | (Section A.3.2) |
 +-----------------+                 +-----------------+
          |                                        ^
          | {Format &                    {Format & |
          |  M value}                     M value} |
          v                                        |
 (2)
 +-----------------+                 +-----------------+
 | Tree            |                 | Tree            |
 | traversal       |                 | traversal       |
 | (Section A.2.2) |                 | (Section A.3.1) |
 +-----------------+                 +-----------------+
          |                                        ^
          | {Codewords}                {Codewords} |
          v                                        |
 (3)                   {Compressed
 +-----------------+     header}     +-----------------+
 | Header          |                 | Header          |
 | formatting &    | --------------> | reception &     |
 | transmission    |                 | parsing         |
 +-----------------+                 +-----------------+

    Figure 2a : EPIC compression/decompression

For (1) EPIC carries out two principal steps. The first is to choose a
"compressed header format" for the uncompressed header. This format
defines how the non-redundant information will be extracted from each
field in the uncompressed header. Note that the set of available
formats depends on the chosen profile, and optionally on other factors
such as the compressor/decompressor mode.

For the second step of (1) the information is extracted from each
field in turn and the results combined together into a single bit
pattern, which can be thought of as a single large integer known as
the "M value". The process used to do this extraction for any given
field is called the "encoding method" for that field. The set of
fundamental encoding methods is called the "EPIC toolbox", and is
essentially the same as those used for the profiles defined in
[ROHC8]. The capabilities of EPIC are extensible simply by adding to
this toolbox (all other parts of the process are unaffected);
extensibility issues are considered further in Chapter 8.
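The draft does not define the M value construction until Appendix A, so the following is only a minimal Python sketch of one way per-field extracted values could be packed into, and unpacked from, a single large integer. The mixed-radix layout and all names (`pack_m_value`, `unpack_m_value`) are illustrative assumptions, not the normative EPIC encoding.

```python
# Sketch: combining per-field extracted values into one "M value".
# Assumption: each encoding method yields an integer in [0, radix),
# and fields are packed in the order they appear in the inputset.

def pack_m_value(fields):
    """fields: list of (value, radix) pairs, one per encoded field."""
    m = 0
    for value, radix in fields:
        assert 0 <= value < radix
        m = m * radix + value          # mixed-radix accumulate
    return m

def unpack_m_value(m, radices):
    """Invert pack_m_value given the same per-field radices."""
    values = []
    for radix in reversed(radices):
        m, value = divmod(m, radix)
        values.append(value)
    return list(reversed(values))

packed = pack_m_value([(3, 8), (1, 2), (200, 256)])
assert unpack_m_value(packed, [8, 2, 256]) == [3, 1, 200]
```

Because the pack is a pure mixed-radix number, the decompressor can recover every field value exactly once it knows the format (and hence the per-field radices), which mirrors the round trip described in the text.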
The next step (2) involves taking the format and M value and
constructing the compressed header itself. The header formats are
stored using a tree, with each leaf node corresponding to exactly one
compressed header format. EPIC follows the tree from leaf to root,
using the M value to fill gaps in the compressed header where field
information is to be stored.

At the receiver, step (3) essentially carries out the equivalent
operations in reverse. The decompressor has an identical tree, and can
use the EPIC compressed header to navigate a path through the tree
starting from the root. This results in a known compressed header
format and a reconstructed M value. Since the set of encoding methods
used is now known (from the format), the M value can be resolved into
the non-redundant information from every field, reconstructing each
field in the uncompressed header.

4.2. Huffman compression and EPIC inputsets

Huffman compression is a well-known technique used in many popular
compression schemes. EPIC uses a variant of Huffman encoding to
produce a new set of compressed header formats for the compressor and
decompressor. EPIC employs an optimization of the basic Huffman
technique which reduces memory and processing requirements within the
compressor/decompressor.

Basic Huffman encoding is designed to compress an arbitrary stream of
characters from a character set such as ASCII. The idea is to create a
new set of compressed characters, where each normal character maps
onto a compressed character and vice versa. Common characters are
given shorter compressed equivalents than rarely used characters,
reducing the average size of the data stream.

The compression ratio can be improved by increasing the size of one
character, but at the expense of higher memory requirements. In fact
the memory required in a Huffman compressor grows exponentially with
the character size, so 16-bit characters need 256 times as much memory
as 8-bit characters.
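To make the character-set argument concrete, here is a minimal sketch of basic (non-hierarchical) Huffman coding in Python. The function name and representation are illustrative only; the point is that the code table has one entry per character, which is why treating whole headers as characters is infeasible without the hierarchical variant described next.

```python
import heapq

# Sketch of classic Huffman coding: repeatedly merge the two least
# probable subtrees, prefixing their codes with '0' and '1'.

def huffman_code(freqs):
    """freqs: dict mapping character -> probability (or count).
    Returns dict mapping character -> prefix-free '0'/'1' code string."""
    heap = [(f, i, {c: ""}) for i, (c, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)                       # tie-breaker for the heap
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in c1.items()}
        merged.update({c: "1" + code for c, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

codes = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# The most common character gets the shortest code:
assert len(codes["a"]) == 1
```

Note how the table returned by `huffman_code` has one entry per input character: for k-bit characters that is up to 2^k entries, which is the exponential memory growth mentioned above.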
The fundamental idea of EPIC is to apply Huffman encoding to a stream
of packet headers, where each header is treated as a single character.
The advantage of this choice is that Huffman encoding is optimally
efficient for a single character, so applying Huffman to the whole
header at once means that the average compressed header size is
provably the minimum possible. The disadvantage of basic Huffman is
that the character set is extremely large (for a 40 octet protocol
stack the character set contains up to 2^320 headers). Clearly the
basic Huffman technique cannot be used in this way.

The solution lies in a modified version of Huffman encoding known as
Hierarchical Huffman (HH). The HH algorithm compresses a data stream
with exactly the same level of efficiency as normal Huffman, but for a
fraction of the processing and memory cost.

The input required to build a new set of compressed header formats is
known as an "EPIC inputset". Each inputset is essentially a structured
description of the behavior of the protocol stack to be compressed
(along with some control information). EPIC converts an inputset into
a suitable form for the HH algorithm, which then builds a unique HH
tree containing the header formats.

Inputsets have a recursive structure, which makes it easy to build new
inputsets out of existing ones (e.g. when adding new layers to a
protocol stack). In advanced applications of EPIC new inputsets can
even be transmitted in a simple "profiling" message, which allows
dynamic updating of the header compression capabilities of a device.

Figure 2b describes the process of building the HH tree from the EPIC
inputset, which is done once only and the results stored at the
compressor and decompressor. References are given to pseudocode in
Appendix A which describes the various stages explicitly.
+-----------------------+
| Input Stage 1:        |
| Enter the inputset    |
| which describes       |
| the protocol stack    |
| (Section A.4.1)       |
+-----------------------+
            |
            v
+-----------------------+
| Input Stage 2:        |
| Resolve complex       |
| encoding methods into |
| fundamental methods   |
| (Section A.4.2)       |
+-----------------------+
            |
            v
+-----------------------+
| Input Stage 3:        |
| Prune low probability |
| header formats        |
| (Section A.4.3)       |
+-----------------------+
            |
            v
+-----------------------+
| Input Stage 4:        |
| Run HH algorithm to   |
| generate tree         |
| (Section A.4.4)       |
+-----------------------+

    Figure 2b : HH Tree Construction

The following chapters describe the mechanisms of EPIC in greater
detail.

5. EPIC Inputset

This chapter describes the inputset that EPIC requires to create a new
set of compressed header formats. Section 5.1 describes the general
inputset parameters common to the entire protocol stack, whilst
Section 5.2 lists the inputset parameters specific to each field.

5.1. General inputset parameters

The general parameters are used to configure the lookup tables, the
alignment of the compressed headers, and the amount of memory reserved
for EPIC at the compressor and decompressor.

5.1.1. Lookup tables

One of the compression techniques offered by EPIC is table-based item
compression as described in Section 5.8.1 of [ROHC8]. This compression
method allows an item of data to be transmitted in full together with
an index. The item is stored in a lookup table at the position
specified by the index, and once the compressor gains enough
confidence that the decompressor has successfully received the mapping
it can send the item by transmitting its index only.

EPIC allows multiple lookup tables to be set up, where each lookup
table is capable of storing a fixed number of items. Note that smaller
lookup tables achieve better compression efficiency since fewer bits
are required to index an item in the table.
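The table-based compression just described can be sketched as follows. The class and function names are hypothetical, and the rule for when the compressor gains "enough confidence" is deliberately left out, since the draft defers that to the WRITE/READ mechanism of Chapter 6.

```python
import math

# Sketch of a fixed-size EPIC lookup table and the index width it
# implies.  Fewer slots means fewer bits to reference an item, which
# is the efficiency point made in the text above.

class LookupTable:
    def __init__(self, name, size, initial=()):
        self.name = name
        self.size = size                             # fixed number of slots
        self.items = list(initial) + [None] * (size - len(initial))

    def write(self, index, item):                    # cf. WRITE(k,LTABLE,i)
        self.items[index] = item

    def read(self, index):                           # cf. READ(LTABLE,i)
        return self.items[index]

def index_bits(table):
    """Bits needed to address any slot in the table."""
    return math.ceil(math.log2(table.size))

ttl = LookupTable("TTL", 2, initial=["", ""])
ttl.write(0, "01000000")                             # store TTL = 64
assert ttl.read(0) == "01000000"
assert index_bits(ttl) == 1                          # 2 slots -> 1-bit index
```

An 8-entry table would need a 3-bit index, so a field compressed via such a table costs 3 bits instead of its full uncompressed length.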
Each lookup table is given a unique name by which it can be
referenced. Lookup tables are initialized as shown in the example
below:

LTABLE{"0000","0101","110111"}

The name of the lookup table is given, followed by a list of bit
patterns that represent the initial entries in the lookup table. Note
that one lookup table may sometimes store bit patterns with several
different lengths (for example when the lookup table is used to store
values from a variable-length field). Lookup tables are accessed and
updated using a pair of encoding techniques as described in Chapter 6.

5.1.2. Control of header alignment

The alignment of the compressed headers is controlled using the ALIGN
and NPATTERNS parameters. Each of these parameters is an integer.

All of the compressed headers produced by EPIC are guaranteed to be an
integer multiple of ALIGN bits long. Moreover, the first word of each
compressed header (where one word is ALIGN bits) uses only NPATTERNS
of the 2^ALIGN different bit patterns available.

For example, suppose that ALIGN = 8 (the octet-aligned case) and
NPATTERNS = 224. In the first octet of every compressed header only
224 of the available 256 bit patterns are used by EPIC, namely the bit
patterns 00000000, 00000001 through to 11011111 inclusive. The bit
patterns which are not used are 11100000 through to 11111111. In fact,
it is important for EPIC not to use the bit patterns 111XXXXX in the
first octet of each compressed header because they are reserved by the
ROHC framework. The correct parameters for EPIC to produce
[ROHC8]-compatible compressed header formats are therefore ALIGN = 8
and NPATTERNS = 224.

Some useful values for ALIGN and NPATTERNS are listed below:

Compressed Headers          ALIGN     NPATTERNS

Bit-aligned                   1           2
Octet-aligned                 8         256
Octet-aligned with            8         224
111XXXXX reserved

5.1.3. Available memory

The MAXFORMATS and PTHRESH parameters control the number of compressed
header formats to be stored at the compressor and decompressor.

The MAXFORMATS parameter determines the maximum number of compressed
header formats stored by EPIC. If more compressed header formats are
generated than can be stored, then EPIC discards all but the
MAXFORMATS formats most likely to be used. Any packet requiring a
compressed header format that has been discarded is communicated using
a DYNAMIC packet instead.

Similarly, the PTHRESH parameter specifies a probability threshold for
the compressed header formats. If any format is used with less than
PTHRESH probability then it is discarded to conserve memory at the
compressor and decompressor. A DYNAMIC packet can be used to
communicate the information instead.

The MAXFORMATS and PTHRESH parameters affect the EPIC compressed
header formats, and so for interoperability they MUST be specified as
part of the EPIC inputset.

Note that one compressed header format requires approximately 2 octets
of storage space at the compressor and decompressor. Since the header
formats are only calculated once, it is possible to store them in
read-only memory. Note also that EPIC requires additional working
memory to compress and decompress the headers, to store the lookup
tables and so on. However, EPIC places no extra working memory
requirements on the compressor and decompressor compared to [ROHC8].

5.2. Field-specific parameters

EPIC contains a toolbox of eight basic compression techniques (LSB
compression, lookup table compression etc.). The field-specific
parameters indicate which encoding methods from the toolbox should be
applied to which fields in the protocol stack.

The basic plan is to build up more complex encoding methods from the
set of eight techniques in the encoding toolbox (Chapter 6).
This allows us to construct an encoding method sufficiently powerful
to handle the entire protocol stack. A new encoding method is
described by means of a table as shown below (N.B. the normative
version of IPv4 encoding is found in Appendix B):

IPv4-ENCODING: TTL{"",""}

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| Version                | STATIC                     | 100%        |
| Header Length          |                            |             |
| Reserved Flag          |                            |             |
| May Fragment Flag      |                            |             |
| Last Fragment Flag     |                            |             |
| Fragment Offset        |                            |             |
| Source Address         |                            |             |
| Destination Address    |                            |             |
+------------------------+----------------------------+-------------+
| Packet Length          | INFERRED                   | 100%        |
| Header Checksum        |                            |             |
| Protocol               |                            |             |
+------------------------+----------------------------+-------------+
| Type of Service        | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | IRREGULAR(8)               | 0.1%        |
+------------------------+----------------------------+-------------+
| Identification         | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | IRREGULAR(16)              | 0.1%        |
+------------------------+----------------------------+-------------+
| Time to Live           | STATIC                     | 99%         |
|                        +----------------------------+-------------+
|                        | READ(TTL,2)                | 0.9%        |
|                        +----------------------------+-------------+
|                        | WRITE(8,TTL,2)             | 0.1%        |
+------------------------+----------------------------+-------------+

IPv4-ENCODING is the name of the new encoding method. TTL{"",""} is a
lookup table containing two entries, which is used to store the two
most recent values of the Time to Live field. This lookup table is
useful because the TTL field sometimes alternates between two
different values (this is possible, for example, in the case of load
sharing over paths of unequal hop counts).
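A table like the one above can also be thought of as plain data. The sketch below transcribes a few of its rows into a Python structure; the field names, methods and probabilities come from the table itself, but the dict-of-lists layout is purely an illustrative assumption, not a format defined by this draft.

```python
# Sketch: the IPv4-ENCODING table as data.  Each field maps to a list
# of (encoding method, probability) alternatives; per field these
# probabilities must sum to 1.

IPV4_ENCODING = {
    "Version":         [("STATIC", 1.0)],
    "Packet Length":   [("INFERRED", 1.0)],
    "Type of Service": [("STATIC", 0.999), ("IRREGULAR(8)", 0.001)],
    "Identification":  [("STATIC", 0.999), ("IRREGULAR(16)", 0.001)],
    "Time to Live":    [("STATIC", 0.99),
                        ("READ(TTL,2)", 0.009),
                        ("WRITE(8,TTL,2)", 0.001)],
    # ... remaining STATIC / INFERRED fields elided ...
}

# Sanity check: each field's alternatives form a probability
# distribution over its possible encoding methods.
for field, methods in IPV4_ENCODING.items():
    assert abs(sum(p for _, p in methods) - 1.0) < 1e-9, field
```

Representing the inputset as data in this way also makes the recursive composition mentioned earlier natural: a row's "method" can simply name another such table (as RTP/UDP/IPv4-ENCODING does below with IPv4-ENCODING).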
The Field Name(s) column contains a list of field names from the
protocol stack to be compressed (in this case an IPv4 header). Notice
that there is no length indication for any of the fields. This is
because the length of the field can be inferred from the field name.
EPIC can also cope with variable-length fields (as described in
Section 6.2.1).

For each field (or set of fields) the table includes a list of
possible encoding methods. The encoding methods can be chosen from the
set of eight techniques in the encoding toolbox of Chapter 6 (in the
example the STATIC, IRREGULAR, LSB, READ, WRITE and INFERRED encoding
methods are all from the toolbox). Alternatively, the encoding methods
can be defined by their own tables as shown below:

RTP/UDP/IPv4-ENCODING:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| CRC                    | IRREGULAR(3)               | 100%        |
+------------------------+----------------------------+-------------+
| Master Sequence Number | LSB(4,-1)                  | 100%        |
+------------------------+----------------------------+-------------+
| IPv4 Header            | IPv4-ENCODING              | 100%        |
+------------------------+----------------------------+-------------+
| UDP Header             | UDP-ENCODING               | 100%        |
+------------------------+----------------------------+-------------+
| RTP Header             | RTP-ENCODING               | 100%        |
+------------------------+----------------------------+-------------+

The RTP/UDP/IPv4 encoding method uses the IPv4 encoding method defined
previously. In this manner it is possible to build some powerful and
flexible encoding methods.

The final column lists the probability that each encoding method will
be used to encode the field in question. This is a very important
column, since EPIC ensures that common encoding methods require fewer
bits to carry the compressed data than rarely used encoding methods.
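The role of the probability column can be illustrated with a small sketch: a compressed header format corresponds to one choice of encoding method per field, and its probability is the product of the per-field probabilities. These per-format probabilities are what drive the Huffman-style assignment of short codes to common formats (and what PTHRESH prunes against). The inputset layout here is an assumption; the two fields are taken from the IPv4 table above.

```python
from itertools import product

# Sketch: enumerate all header formats over two fields and compute
# each format's probability as the product of its per-field choices.

inputset = [
    ("Type of Service", [("STATIC", 0.999), ("IRREGULAR(8)", 0.001)]),
    ("Time to Live",    [("STATIC", 0.99),
                         ("READ(TTL,2)", 0.009),
                         ("WRITE(8,TTL,2)", 0.001)]),
]

formats = []
for choice in product(*(methods for _, methods in inputset)):
    prob = 1.0
    for _, p in choice:
        prob *= p                      # independent per-field choices
    formats.append(([method for method, _ in choice], prob))

formats.sort(key=lambda f: -f[1])      # most probable format first

# The all-STATIC format dominates, so it should get the shortest code:
assert formats[0][0] == ["STATIC", "STATIC"]
```

With 2 x 3 = 6 formats whose probabilities sum to 1, a format such as [IRREGULAR(8), WRITE(8,TTL,2)] has probability 0.001 x 0.001 and would be a natural candidate for PTHRESH pruning.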
Note that the order in which fields are given, and the order of the
encoding types for a particular field, is significant in the following
sense:

- changing the order does not impact the compression/decompression
  performance of the profile; however

- changing the order does modify the format and content of the
  compressed headers themselves.

Therefore, for interoperability, both the compressor and decompressor
MUST have a common set of input tables, including all ordering
considerations.

6. EPIC Encoding Toolbox

ROHC can encode header fields using a number of different encoding
techniques (LSB encoding, scaled timestamp encoding, list-based
compression etc.). The EPIC encoding toolbox supports all of the
techniques currently required by [ROHC8]. Different encoding methods
can be assigned to different fields using a simple table of inputs as
described in Section 5.2.

6.1. Toolbox members

The EPIC encoding toolbox contains the following techniques:

6.1.1. STATIC

The STATIC encoding method can be used when the header field does not
change relative to the context. If a field is STATIC then no
information concerning the field need be transmitted in the compressed
header. The STATIC command has no parameters.

6.1.2. IRREGULAR(k)

The IRREGULAR(k) encoding method is used when the field cannot be
compressed relative to the context, and hence must be transmitted in
full. The parameter k is the (uncompressed) field length in bits.

Note that it is possible to use the IRREGULAR encoding method to
compress variable-length fields. This is accomplished by specifying a
separate IRREGULAR(k) encoding method for every common length of the
variable-length field. Further information can be found in Section
6.2.1.

6.1.3. VALUE(v)

The VALUE(v) encoding method has a single parameter v, which is a
field value represented as a bit pattern. The VALUE(v) encoding can be
used to transmit any field which takes the value v.
Note that VALUE(v) is just a simplified form of lookup table
compression. It is useful for compressing fields which take a
well-known value with high probability.

6.1.4. READ(LTABLE,i)

The READ(LTABLE,i) encoding method retrieves the field value from the
lookup table LTABLE. Recall that EPIC maintains a number of distinct
lookup tables for storing and retrieving common field values. If the
value of a field is stored in the lookup table LTABLE then it can be
efficiently compressed by sending only its position in the lookup
table.

The parameter LTABLE is the name of the lookup table. The parameter i
is the number of items which the lookup table can store.

Common field values can be stored in the lookup table in two ways.
Firstly, they might be stored in the table at initialization of the
compressor/decompressor. Secondly, they can be added by using the
WRITE encoding method. If the value of the field is not present in any
lookup table then the READ encoding method cannot be used to compress
the field.

6.1.5. WRITE(k,LTABLE,i)

The WRITE encoding method sends the field in full and also stores it
in the lookup table LTABLE. Subsequently, whenever the field value
turns up, it can be retrieved from the lookup table using the
READ(LTABLE,i) encoding method (provided of course that the WRITE
method has been used sufficient times to ensure robustness).

The WRITE command has the following 3 parameters:

k:      Uncompressed field length in bits
LTABLE: Name of the lookup table
i:      Number of values which the lookup table can store

6.1.6. LSB(k,p)

The LSB(k,p) method compresses the field using LSB encoding. LSB
encoding is described in Section 4.5.1 of [ROHC8]. The meaning of the
two parameters k and p is described in [ROHC8], but basically the
parameters have the following functions:

k: Number of LSBs to transmit in the compressed header
p: Offset of the interpretation interval

6.1.7. UNCOMPRESSED

The UNCOMPRESSED encoding method is used to send the field value in
full at the end of the compressed header. Note that all UNCOMPRESSED
fields are sent concatenated together at the end of the compressed
header. This means that the decompressor needs to know the length of
each UNCOMPRESSED field in order to reconstruct the uncompressed
header. The way in which the field length is inferred is specific to
each UNCOMPRESSED field, and is described separately from the table of
the EPIC inputset.

6.1.8. INFERRED

The INFERRED method instructs EPIC to infer the value of the field
from other field values and/or the context. The INFERRED method has no
parameters.

The precise method by which INFERRED fields are reconstructed is
specific to each field, and is described separately from the table of
the EPIC inputset.

6.2. Hints on applying the toolbox

This section gives some hints on how the EPIC toolbox can be applied
to compress different types of field.

6.2.1. Compressing variable-length fields

EPIC can compress variable-length fields efficiently by allocating a
separate encoding method for each possible length of the field. This
takes into account the probability distribution of the field length,
so more common field sizes can be compressed with better efficiency.

This technique can be used to compress the GRE Key field as follows:

GRE-ENCODING:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| ...                    | ...                        | ...         |
+------------------------+----------------------------+-------------+
| GRE Key Present        | INFERRED                   | 100%        |
+------------------------+----------------------------+-------------+
| GRE Key                | STATIC                     | 99%         |
|                        +----------------------------+-------------+
|                        | IRREGULAR(0)               | 0.5%        |
|                        +----------------------------+-------------+
|                        | IRREGULAR(32)              | 0.5%        |
+------------------------+----------------------------+-------------+
| ...                    | ...                        | ...         |
+------------------------+----------------------------+-------------+

The GRE Key field has two possible lengths (0 octets or 4 octets). If
the key is not static then it can be sent in full with either of these
two lengths. The GRE Key Present field is inferred from the size of
the GRE Key.

Note that if the field has many different lengths, for simplicity only
the most common field lengths need be sent using the IRREGULAR(k)
encoding. Fields with less common lengths can be sent UNCOMPRESSED.

6.2.2. Compressing extension headers

Protocols such as IP have a number of optional extensions that follow
the basic header. In [ROHC8] these extensions are handled by defining
a new list-based compression type. In EPIC, however, the extension
headers can be handled using the basic toolbox encoding methods with
no special enhancements.

An example encoding method for the AH header is given below:

RTP/UDP/IPv4-ENCODING:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| ...                    | ...                        | ...         |
+------------------------+----------------------------+-------------+
| Authentication Header  | AH-ENCODING                | 99%         |
|                        +----------------------------+-------------+
|                        | IRREGULAR(0)               | 1%          |
+------------------------+----------------------------+-------------+
| ...                    | ...                        | ...         |
+------------------------+----------------------------+-------------+

N.B.
The term "Authentication Header" in the Field Name(s) column is just shorthand for the set of fields in the authentication header. AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"} AH-SPI{"","","","",""} +------------------------+----------------------------+-------------+ | Field Name(s) | Encoding Method | Probability | +------------------------+----------------------------+-------------+ | AH Next Header | INFERRED | 100% | | AH Reserved | | | +------------------------+----------------------------+-------------+ | AH Length | STATIC | 99.9% | | +----------------------------+-------------+ | | READ(COMMONLENGTHS,5) | 0.09% | | +----------------------------+-------------+ | | WRITE(8,COMMONLENGTHS,5) | 0.01% | +------------------------+----------------------------+-------------+ | AH SPI | STATIC | 99.9% | | +----------------------------+-------------+ | | READ(AH-SPI,5) | 0.099% | | +----------------------------+-------------+ | | WRITE(32,AH-SPI,5) | 0.001% | +------------------------+----------------------------+-------------+ | AH Sequence Number | INFERRED | 99% | | +----------------------------+-------------+ | | LSB(8,-1) | 0.9% | | +----------------------------+-------------+ | | IRREGULAR(32) | 0.1% | +------------------------+----------------------------+-------------+ | AH Authentication Data | UNCOMPRESSED | 100% | +------------------------+----------------------------+-------------+ The AH header is encoded in the same manner as any other header in the protocol stack. The only difference is that the entire AH header may not be present in the stack, so an additional IRREGULAR(0) encoding is required for the whole AH header. This encoding is used Price et al. [PAGE 16] INTERNET-DRAFT EPIC February 23, 2001 to delete every field in the AH header when it is no longer present in the uncompressed packet. Note that IP extension headers do not necessarily occur in the same order for every packet stream. 
This is not a problem for EPIC because each field in the protocol
stack is transmitted separately (and not necessarily in order). The
individual fields are reconstructed at the decompressor and then
placed in the uncompressed header at their correct locations.

If the order of any fields changes then EPIC transmits the fields
uncompressed. When this information has been sent sufficient times
to ensure robustness, the decompressor knows the new ordering of the
fields and can subsequently reconstruct the uncompressed headers
using the new field ordering as appropriate.

7. Control fields

In general the fields specified in the EPIC input tables are taken
directly from the protocol stack to be compressed. However, there
are a small number of special fields that can be transmitted in the
compressed headers, but which are not present in the uncompressed
packets. These fields are known as control fields because they
transmit control information from the compressor to the
decompressor.

The most obvious example of a control field is the header
compression CRC. This CRC is calculated over the uncompressed
header, and is used to detect incorrect decompression at the
decompressor.

EPIC treats the control fields in the same manner as any normal
field. The only difference is that at the compressor the control
field values are calculated rather than obtained directly from the
uncompressed packet. Similarly, the control field values are used by
the decompressor itself (e.g. to check the header for errors) and
are not forwarded in the decompressed packet.

EPIC supports the following control fields:

7.1. CRC

The CRC field holds a CRC checksum calculated across the entire
uncompressed header, as described in Section 5.9.2 of [ROHC8]. At
the decompressor the CRC field is used to validate that correct
decompression has occurred.
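The CRC control field can be illustrated with a small bit-serial CRC
sketch in Python. This is non-normative: the polynomials, initial
values and coverage are defined in Section 5.9 of [ROHC8], and the
3-bit reflected polynomial and all-ones initial value used below are
assumptions of this sketch, not quoted from that specification.

```python
def crc3(data: bytes) -> int:
    # Bit-serial 3-bit CRC, LSB-first, using a reflected polynomial
    # (assumed here to correspond to C(x) = 1 + x + x^3).
    crc = 0x7  # assumed all-ones initial value
    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1
            if (crc ^ bit) & 1:
                crc = (crc >> 1) ^ 0x6
            else:
                crc >>= 1
    return crc

# The compressor appends crc3(uncompressed_header) as a control
# field; the decompressor recomputes it over the reconstructed
# header and compares the two values.
header = bytes.fromhex("4500003c1c46")
assert 0 <= crc3(header) <= 7

# Any single-bit error changes the remainder, so a one-bit
# decompression error is always detected.
corrupted = bytes([header[0] ^ 0x01]) + header[1:]
assert crc3(corrupted) != crc3(header)
```

Because a CRC is linear, every single-bit difference between the
original and reconstructed headers is guaranteed to change the
3-bit checksum; longer error bursts are detected with high
probability.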
Note that it is possible for different header formats to have
different amounts of CRC protection, so extra CRC bits can be
allocated to protect important context-updating information. This is
accomplished by creating a new encoding method for the fields to be
given extra CRC protection, as shown in the example below:

EXAMPLE-ENCODING:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| Important Field        | NORMAL-CRC                 | 50%         |
| CRC                    +----------------------------+-------------+
|                        | IRREGULAR(8)               | 50%         |
+------------------------+----------------------------+-------------+

NORMAL-CRC:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| Important Field        | STATIC                     | 100%        |
+------------------------+----------------------------+-------------+
| CRC                    | IRREGULAR(3)               | 100%        |
+------------------------+----------------------------+-------------+

Observe that the compressed header has 3 bits of CRC protection if
the important field is STATIC. However, if the field is IRREGULAR(8)
then the compressed header has 8 bits of CRC protection. This is
useful since the IRREGULAR behavior is context-updating.

7.2. Master Sequence Number

The RTP/UDP/IP protocol stack contains many different sequence
numbers (the RTP sequence number, the ESP sequence number, the AH
sequence number and so on). One sequence number must be transmitted
to protect against bursts of consecutive lost packets, but it is
then possible to infer all other sequence numbers by linear
extrapolation (i.e. by adding or subtracting a constant integer).

Unfortunately, no sequence number field is guaranteed to be present
in every IP protocol stack.
The RTP sequence number is not visible if the IP payload is
encrypted, whilst the ESP and AH sequence numbers are only included
if the optional ESP and AH headers are present.

To overcome this problem EPIC defines a control field called the
Master Sequence Number field. This field is present in every
compressed header and can be used to infer all other sequence
numbers. The size of the Master Sequence Number field is chosen for
the best trade-off between robustness and efficiency. A Master
Sequence Number of k bits prevents context failure for up to
2^k - 1 consecutive lost packets.

7.3. Miscellaneous Information

The Miscellaneous Information fields transmit any extra information
that EPIC needs to reconstruct the UNCOMPRESSED and INFERRED fields,
but which is not already present in the compressed header.

Some examples of Miscellaneous Information fields used in RTP/UDP/IP
compression are given below:

TS:          The Least Significant Bits (LSBs) of the RTP Timestamp,
             scaled if Tsc = 1.

TS STRIDE:   The predicted increment/decrement of the RTP Timestamp
             field when it changes. The use for this field is
             explained in Section 4.5.4 of [ROHC8].

TIME STRIDE: The predicted time interval (in milliseconds) between
             changes in the RTP Timestamp. The use for this field is
             explained in Section 4.5.4 of [ROHC8].

Tsc:         Indicates whether the timestamp is scaled or not.

All of these fields are needed to infer the RTP timestamp. Note that
Miscellaneous Information fields are control fields because they are
discarded once they have been used to infer the relevant
UNCOMPRESSED or INFERRED fields.

8. Extensibility

This chapter considers the extensibility of the EPIC scheme,
including possible future enhancements to EPIC.

8.1. Other protocol stacks

Especially for multimedia over IP, additional protocols will become
important in the future, e.g. tunneling protocols for virtual
private networks and unreliable SCTP for multimedia.
They will generate further protocol overhead for transmission, and
new requirements for efficient header compression.

EPIC requires minimal effort to generate a new ROHC profile. The
input to EPIC is the behavior of the new protocol stack (i.e. which
fields require which encoding techniques). EPIC then produces an
optimally efficient set of compressed header formats for use at the
compressor and decompressor.

8.2. New toolbox methods

The eight encoding techniques currently offered by the EPIC toolbox
are sufficient to compress the entire RTP/UDP/IP protocol stack,
including the CSRC list and IP extension headers. However, if EPIC
is used to compress other protocol stacks then it may become
necessary to add new encoding methods to the toolbox.

In general, a new toolbox encoding method is just a mapping between
a set of uncompressed field values and their compressed equivalents.
The only requirement is that the mapping is 1-1 (in other words,
that no two field values map onto the same compressed value).

It is acceptable for the mapping to change depending on the context,
although care must be taken in this case to ensure that it is
robust. For example, a DELTA encoding method that stores the
increase in the field value relative to the context is very
efficient for compressing incrementing fields. However, it is not
robust to lost packets (since it fails if the decompressor context
is incorrect).

8.3. New control fields

It may also be necessary to extend EPIC by adding extra control
fields. Recall that control fields communicate information between
the compressor and decompressor that is not forwarded in the
decompressed packet.
Since control fields are treated in exactly the same manner as
normal header fields (except that they are not present in the
uncompressed header), the only change required is a description of
how the control field is calculated at the compressor and how it is
used at the decompressor.

8.4. Learning version of EPIC

As stated in the introduction, the EPIC compressed header formats
are optimal for the chosen protocol stack. This means that, for the
chosen header alignment and level of robustness, it is provably
impossible to compress a protocol stack for which EPIC is programmed
more efficiently than EPIC itself does.

An interesting question, however, is the effectiveness of EPIC when
compressing a protocol stack for which it is not programmed. If the
correct encoding methods have been assigned to each field but the
probabilities that they will be used are slightly inaccurate, then
the efficiency lost is negligible. But suppose that the protocol
stack behaves in a significantly different manner than expected by
ROHC: for example, if an RTP stream uses padding or even RTP
extensions.

Static compressors with fixed compressed header formats suffer a
permanent drop in performance if the protocol stack behaves in an
unusual manner. However, since EPIC can dynamically generate new
packet formats as they are needed, it is possible for an
EPIC-generated ROHC profile to adapt its own header formats to the
incoming packet stream.

A basic version of "Learning" EPIC is straightforward to implement.
The encoding methods assigned to each field are not changed;
instead, the probabilities that each encoding method will be used
are refined by counting the number of times they have been used by
recent packets. In this manner, if a particular compressed header
format is used more often than expected then the Huffman tree will
be refined to encode the format using fewer bits. Fast algorithms
exist for updating a Huffman tree when only a small number of inputs
have changed.
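The probability refinement step described above can be sketched as a
simple frequency count. This is a non-normative illustration; the
format labels are hypothetical, and a real implementation would feed
the refined probabilities into the Huffman tree construction of
Appendix A.4.

```python
from collections import Counter

def refine_probabilities(recent_formats):
    # Count how often each compressed header format was used by
    # recent packets, and turn the counts into refined probabilities
    # with which the Huffman tree can be rebuilt.
    counts = Counter(recent_formats)
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}

# Hypothetical window of the last five packets: format "F4" turns
# out to dominate, so a rebuilt tree would give it a shorter code.
probs = refine_probabilities(["F4", "F4", "F4", "F2", "F1"])
assert probs["F4"] == 0.6
```

A sliding window of recent packets keeps the estimates responsive;
the window length trades adaptation speed against the cost of
rebuilding the tree and resynchronizing the decompressor.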
Care must be taken to ensure that the Huffman trees at the
compressor and decompressor are updated in step; this can be
accomplished by updating the tree periodically at the compressor and
sending the refined version to the decompressor within a special
"profiling" message.

9. EPIC Processing

This chapter presents an outline description of the EPIC processing
without any algorithmic or computational notation. The detailed
pseudocode description of EPIC is presented in Appendix A.

There are four main aspects to the pseudocode:

   Structure of the Huffman tree
   Compressor operation
   Decompressor operation
   Tree construction

9.1. Structure of the Huffman Tree

Recall that EPIC uses a Hierarchical Huffman (HH) tree to succinctly
store a set of compressed header formats. A diagram of an HH tree
can be found in Figure 3.

The HH tree consists of a set of nodes joined by branches.
Conventionally the nodes are arranged vertically, with leaf nodes at
the top and the root node at the bottom. There is a single root
node, which is the starting point for decompression operations.
There is a set of leaf nodes, which are used as the starting points
for compression operations. Each compressed header format is
associated with a specific leaf node and vice versa. All other nodes
are interior nodes.

The information needed for compression and decompression is
contained in the tree topology (i.e. the connectivity of nodes and
branches) and the labels on each branch. Information is also
associated with each node whilst the tree is being constructed, but
this can be discarded once the tree has been built.

Note that this "tree" is multiply connected, i.e. there are several
alternative paths between any given leaf node and the root. The
correct route to follow is determined by the compression and
decompression algorithms.

An example of a bit-aligned HH tree is given in Figure 3.
The example considers a very simple protocol stack with two fields
labeled A and B. Four compressed header formats are available to
compress this protocol stack. Each header format is placed in a leaf
node of the Huffman tree, together with the probability "P" that the
format will be used to compress a header and the number "N" of
distinct headers that can be compressed using the format. The
procedure for generating the tree itself is described in Appendix
A.4.

The four leaf nodes of the example tree are:

   Leaf 1: A=STATIC, B=IRREGULAR(2)     (P=3%,  N=4)
   Leaf 2: A=STATIC, B=READ(LTABLE,3)   (P=8%,  N=3)
   Leaf 3: A=IRREGULAR(1), B=STATIC     (P=5%,  N=2)
   Leaf 4: A=STATIC, B=STATIC           (P=54%, N=1)

[Figure 3 : An example of a Hierarchical Huffman tree]

9.2. Compressor operation

This section describes the steps of operation within the EPIC header
compressor. This will be done in 3 steps: the selection of the
relevant HH tree and leaf node, the compression itself, and the
addition of the uncompressed fields.

9.2.1. Step 1: Compression front end

The input to the EPIC compressor is a header from the protocol stack
to be compressed. The compressor maintains a number of contexts as
described in Section 5.1.3 of [ROHC8], and the correct context to
use should be selected depending on the uncompressed header. The
uncompressed header should then be divided up into fields. As per
[ROHC8] Section 5.7, the term hdr(X) will refer to the value of
Field X in the uncompressed header.
Next the compressor chooses an encoding method from the EPIC toolbox
for each field in the protocol stack (for example the IP-ID might be
encoded using the "IRREGULAR" function etc.) from the set of
possibilities defined by the inputset.

The choice of encoding methods can then be mapped onto a compressed
header format. There is one compressed header format for every
possible way the fields can be encoded (unless the number of formats
is limited due to memory constraints, in which case a DYNAMIC packet
can be sent instead). In general the only requirement is that the
format must be capable of transmitting enough information to
reconstruct the header at the decompressor. If more than one
suitable compressed header format is available then the compressor
may choose any of these.

9.2.2. Step 2: Compressing the uncompressed header

The compression process works down the Huffman tree, making
decisions at each node based on the choice of encoding method for
each field. The process finishes when there are no more branches to
follow. The result is a string of integers (known as codewords) that
each contain ALIGN bits of the compressed header.

9.2.3. Step 3: Adding the UNCOMPRESSED fields

The third step is to add the uncompressed fields to the end of the
compressed header. The final task is to encapsulate the EPIC
compressed header within a standard ROHC packet as illustrated in
Figure 1. The packet with compressed header is then ready to be
transmitted.

9.3. Decompressor operation

This section describes the steps of operation within the EPIC header
decompressor.

9.3.1. Step 1: Decompressing the compressed header

Each codeword is extracted from the received compressed header.
Starting at the root of the Huffman tree, branches are followed up
the tree and decisions are taken at each node based on the current
codeword.
When the top of the tree is reached, the leaf node indicates which
compressed header format has been used. This in turn gives the
encoding method that has been used for every field in the header.
These values are then used to find the format of the original field,
with which reconstruction of the original field value can occur.
This process continues until all codewords have been dealt with.

9.3.2. Step 2: Reconstructing the INFERRED fields

The final step at the decompressor is to reconstruct the
UNCOMPRESSED and INFERRED fields from the transmitted fields and/or
the context. In general the reconstruction process is specific to
the individual fields.

The normative pseudocode description of EPIC is given in Appendix A.

10. Security Considerations

EPIC generates compressed header formats for direct use in ROHC
profiles. Consequently the security considerations for EPIC match
those of [ROHC8].

11. Acknowledgements

Header compression schemes from [ROHC8] have been important sources
of ideas and knowledge. Basic Huffman encoding [HUFF] was enhanced
for the specific tasks of robust, efficient header compression.

Thanks to Carsten Bormann (cabo@tzi.org) and Max Riegel
(maximilian.riegel@icn.siemens.de) for valuable input and review.

12. Intellectual Property Considerations

This proposal is in full conformity with [RFC-2026]. Siemens may
have patent rights on technology described in this document which
employees of Siemens contribute for use in IETF standards
discussions. In relation to any IETF standard incorporating any such
technology, Siemens hereby agrees to license on fair, reasonable and
non-discriminatory terms, based on reciprocity, any patent claims it
owns covering such technology, to the extent such technology is
essential to comply with such standard.

13. References

[ROHC8]    "RObust Header Compression (ROHC)", Carsten Bormann et
           al, Internet Engineering Task Force, February 7, 2001

[HUFF]     "The Data Compression Book", Mark Nelson and Jean-Loup
           Gailly, M&T Books, 1995

[IMPL]     http://www.roke.co.uk/networks/epic/epic.html

[RFC-2026] "The Internet Standards Process - Revision 3", Scott
           Bradner, Internet Engineering Task Force, October 1996

[RFC-2119] "Key words for use in RFCs to Indicate Requirement
           Levels", Scott Bradner, Internet Engineering Task Force,
           March 1997

14. Authors' Addresses

Richard Price
   Tel: +44 1794 833681
   Email: richard.price@roke.co.uk

Robert Hancock
   Tel: +44 1794 833601
   Email: robert.hancock@roke.co.uk

Stephen McCann
   Tel: +44 1794 833341
   Email: stephen.mccann@roke.co.uk

Mark A West
   Tel: +44 1794 833311
   Email: mark.a.west@roke.co.uk

Abigail Surtees
   Tel: +44 1794 833131
   Email: abigail.surtees@roke.co.uk

Paul Ollis
   Tel: +44 1794 833168
   Email: paul.ollis@roke.co.uk

Roke Manor Research Ltd
Romsey, Hants, SO51 0ZN
United Kingdom

Christian Schmidt
   Tel: +49 89 722 27822
   Email: christian.schmidt@icn.siemens.de

Siemens ICM N MR ST1
Munich
Germany

Appendix A. EPIC Description in Pseudocode

Chapter 9 presented an outline description of the operation of the
EPIC scheme. This appendix goes on to describe the complete EPIC
scheme in pseudocode.

As stated in Section 9, there are four main aspects to EPIC:

   Description of the structure of the Hierarchical Huffman (HH)
   tree
   Compressor operation
   Decompressor operation
   Tree construction

The final section of the appendix describes certain implementation
issues.

A.1. Structure of the Hierarchical Huffman tree

As mentioned in Section 9, the HH tree consists of a set of nodes
joined by branches. Conventionally, the nodes are arranged
vertically with leaf nodes at the top and the root node at the
bottom. There is a single root node, which is the starting point for
decompression operations.
There is a set of leaf nodes, which are used as the starting points
for compression operations. Each possible compressed header format
is associated with a specific leaf node. All other nodes are
interior nodes.

The format indicator for a leaf node is in fact just a list of
encoding methods, one for each field in the header. In what follows,
we notate the encoding for a single field as "F=E", for a field
called "F" using encoding method "E". The complete format indicator
will then look like {F1=E1, F2=E2, ...}. Although we start off in
the inputset with encoding methods expressed recursively, by the
time the tree has been constructed all the Ei are actually toolbox
encoding methods.

No special information is associated with any of the nodes.
(Information is associated with each node while the tree is being
constructed, but this can all be discarded once the construction is
complete.) All the information is contained in the tree topology
(the connectivity of nodes and branches) and the labels on each
branch. There are three possible cases for branches between nodes at
different levels:

i)   A single vertical branch between a pair of nodes. Such a branch
     is labeled {X}.

ii)  A pair of branches joining one upper node to two lower ones.
     These branches are labeled {D t} for some integer t (different
     for each branch).

iii) Several branches each joining one upper node to a common lower
     node. Each branch is labeled with {t} for some integer t
     (different for each branch).

Note that this "tree" can actually be multiply connected, i.e. there
may be several alternative paths between any given leaf node and the
root. The correct route to follow is determined by the compression
and decompression algorithms.

As well as the tree itself, the following input parameters are
needed to begin the compression and decompression process:

ALIGN:     Number of bits for alignment (i.e. the size of one
           codeword in the compressed header).
NPATTERNS: Number of bit patterns available for use by EPIC in the
           first codeword of the compressed header.

A.2. Compressor

This section describes the EPIC header compressor.

A.2.1. Step 1: Compression front end

The input to the EPIC compressor is a header from the protocol stack
to be compressed. The compressor maintains a number of contexts as
described in Section 5.1.3 of [ROHC8], and the correct context to
use should be selected depending on the uncompressed header. The
uncompressed header should then be divided up into fields. As per
[ROHC8] Section 5.7, the term hdr(X) will refer to the value of
Field X in the uncompressed header.

The compressor chooses a compressed header format from the possible
set (where each compressed header format corresponds to a different
leaf node in the HH tree). In general the format must be capable of
transmitting enough information to reconstruct the header at the
decompressor. If more than one suitable format is available then the
compressor may choose any of these.

The compressor then calculates a non-negative integer M that
contains all of the non-redundant information from the uncompressed
header. M is calculated using the format indicator for the chosen
leaf node. The following steps are taken:

1  Set M=0

2  From the left, search the format indicator for the next instance
   of the "READ" or "WRITE" encoding methods.

   while (A new instance of "READ" or "WRITE" can be found)
   {
      i)   Let F denote the corresponding field and let i denote the
           corresponding parameter (so the format indicator contains
           "F=READ(LTABLE,i)" or "F=WRITE(k,LTABLE,i)" for some
           k,LTABLE).
      ii)  Let M=M*i
      iii) Let M=M+index, where index is an integer from 0 to i-1.
           It denotes the position of hdr(F) in the lookup table and
           is fixed for READ but chosen by the compressor for WRITE.
   }

3  From the left, search the format indicator for the next instance
   of the "LSB", "IRREGULAR" or "WRITE" encoding methods.

   while (A new instance of "LSB", "IRREGULAR" or "WRITE" can be
   found)
   {
      i)   Let F denote the corresponding field and let k denote the
           corresponding parameter (so the format indicator contains
           "F=LSB(k,p)", "F=IRREGULAR(k)" or "F=WRITE(k,LTABLE,i)").
      ii)  Let M=M*2^k
      iii) Let M=M+value, where value is from 0 to 2^k - 1. For
           IRREGULAR and WRITE encoding it is hdr(F) in integer
           form. For LSB encoding it is the LSB of hdr(F) as
           described in Section 4.5.1 of [ROHC8].
   }

The end result is the M value for the header, along with the
identified starting leaf node.

A.2.2. Step 2: Compressing the uncompressed header

Working from the M value and start node, the encoding process
creates a sequence {codeword(i)} of codewords as follows:

1  Begin at the start node in the tree corresponding to the header
   format. Set i=1.

2  Follow the branches of the tree downwards towards the root,
   taking the following steps at each node:

   a) If the next branch label is {X} then:
      i)  codeword(i) = M modulo 2^ALIGN, i=i+1
      ii) set M = integer part (M / 2^ALIGN)

   b) If there are 2 branches with labels {D N1}, {D N2} then:
      i)   d = absolute difference between N1 and N2
      ii)  if M < d then follow branch to left
      iii) else set M = M - d and follow right branch

   c) If the next branch label is a number {L} then:
      i)  codeword(i) = L + M, i=i+1
      ii) set M = 0

3  Finish when there are no more branches to follow; set
   NCODEWORDS=i (i.e. NCODEWORDS is the total number of codewords in
   the compressed header).

4  Reverse the list of codewords, so that codeword(1) is the last
   codeword produced and codeword(NCODEWORDS) is the first.

This produces a string of codeword numbers, where each codeword
represents ALIGN bits in the EPIC compressed header. For the most
likely permutations this string could contain a single codeword.
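The mixed-radix construction of M in steps 1-3 above, together with
its inverse from Section A.3.2, can be sketched in Python. This is a
non-normative sketch: the helper names and field tuples are invented
here, and only the arithmetic (multiply-and-add at the compressor;
mod-and-divide in reverse order at the decompressor) follows the
pseudocode.

```python
def pack_m(table_fields, bit_fields):
    # table_fields: (index, table_size) pairs for READ/WRITE fields,
    # taken left to right; bit_fields: (value, k) pairs for
    # LSB/IRREGULAR/WRITE fields, taken left to right.
    m = 0
    for index, size in table_fields:
        m = m * size + index        # steps 2.ii and 2.iii
    for value, k in bit_fields:
        m = m * (1 << k) + value    # steps 3.ii and 3.iii
    return m

def unpack_m(m, table_sizes, bit_lengths):
    # Inverse (Section A.3.2): extract the bit fields from the
    # right, then the table indices from the right.
    values, indices = [], []
    for k in reversed(bit_lengths):
        values.append(m & ((1 << k) - 1))
        m >>= k
    for size in reversed(table_sizes):
        indices.append(m % size)
        m //= size
    return list(reversed(indices)), list(reversed(values))

# Round trip for a hypothetical format: a READ index into a 5-entry
# table, a WRITE slot in a 2-entry table, an IRREGULAR(4) value and
# a single LSB bit.
m = pack_m([(3, 5), (0, 2)], [(9, 4), (1, 1)])
assert unpack_m(m, [5, 2], [4, 1]) == ([3, 0], [9, 1])
```

Because each step multiplies by the exact number of possible values
of that field, M carries no redundant bits; this is what makes the
resulting header formats optimally compact for the given inputset.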
It is a property of EPIC that the following hold:

   For i=1:                  0 <= codeword(i) < NPATTERNS
   For i=2, ..., NCODEWORDS: 0 <= codeword(i) < 2^ALIGN

This means that the compressed header can be very simply transmitted
with implicit codeword alignment. The values to be placed in the
first codeword are limited to be numerically less than NPATTERNS,
which means that part of this codeword can be used for other
purposes.

A.2.3. Step 3: Adding the UNCOMPRESSED fields

The last step is to add the UNCOMPRESSED fields to the end of the
compressed header. This is accomplished as follows:

1  From the left, search the format indicator for the next instance
   of "UNCOMPRESSED".

   while (A new instance of "UNCOMPRESSED" can be found)
   {
      i)  Let F denote the corresponding field.
      ii) Add hdr(F) to the compressed header.
   }

If the compressed header is no longer a multiple of ALIGN bits then
it should be padded by adding "0" bits onto the end.

The final task is to encapsulate the EPIC compressed header within a
standard ROHC packet as described in Section 3.2. The packet with
compressed header is then ready to be transmitted.

A.3. Decompressor

This section describes the EPIC header decompressor.

A.3.1. Step 1: Decompressing the compressed header

The list C=(codeword(1), ..., codeword(NCODEWORDS)) can be read
directly from the compressed header. Note that the integer
NCODEWORDS (i.e. the length of the compressed header excluding
UNCOMPRESSED fields) is not yet known, but instead will be
determined when the compressed header is parsed.

The list C and the Huffman tree are used to determine the compressed
header format and the M value as follows:

1) Set M = 0 and i = 1

2) Begin at the root of the tree (single node) and follow the
   branches up the tree, taking the following steps at each node:

   a) If the next branch label is {X} then:
      i)   set M = M * 2^ALIGN
      ii)  set M = M + codeword(i)
      iii) i=i+1

   b) If the next branch label is {D t} then:
      i)  if t = 0 just continue up the tree
      ii) else set M = M + t and continue up the tree

   c) If there is more than one branch, each with a label {t}, then:
      i)   set c = codeword(i), i=i+1
      ii)  find the neighboring pair of branches with labels {L} and
           {U} such that L <= c < U (if c >= the number on every
           branch then take L = the number on the branch with the
           greatest number)
      iii) set M = c - L
      iv)  continue up the tree on the branch labeled {L}

3) End up at a leaf node of the tree corresponding to a particular
   compressed header format and value M

The value M and the header format can then be provided to the
decompression front end of Section A.3.2. The value NCODEWORDS does
not need to be transmitted explicitly, since the process terminates
of its own accord when a leaf node of the tree is reached.

A.3.2. Step 2: Decompression front end

The decompressor begins with the non-negative integer M, which is
then translated back into a value hdr(X) for each field X using the
format indicator. The following steps are taken:

1  Begin with M

2  From the right, search the format indicator for the next instance
   of the "LSB", "IRREGULAR" or "WRITE" encoding methods.

   while (A new instance of "LSB", "IRREGULAR" or "WRITE" can be
   found)
   {
      i)   Let F denote the corresponding field and let k denote the
           corresponding parameter (so the format indicator contains
           "F=LSB(k,p)", "F=IRREGULAR(k)" or "F=WRITE(k,LTABLE,i)").
      ii)  Let value=M mod 2^k (so value is an integer from 0 to
           2^k - 1). For IRREGULAR and WRITE encoding let hdr(F) be
           the k bits corresponding to value. For LSB encoding let
           hdr(F) be context(F) with the last k LSBs at offset p
           replaced by value, as described in Section 4.5.1 of
           [ROHC8].
      iii) M=(M-value)/(2^k).
   }

3  From the right, search the format indicator for the next instance
   of the "READ" or "WRITE" encoding methods.
   while (A new instance of "READ" or "WRITE" can be found)
   {
      i)   Let F denote the corresponding field and let i denote the
           corresponding parameter (so the format indicator reads
           "F=READ(LTABLE,i)" or "F=WRITE(k,LTABLE,i)" for some
           k, LTABLE).

      ii)  Let index = M mod i (so index is an integer from 0 to
           i - 1).  For READ encoding let hdr(F) be the value stored
           in LTABLE at the position specified by index.  For WRITE
           encoding store the value hdr(F) in LTABLE at the position
           specified by index.

      iii) M = (M - index) / i
   }

   4  From the right, search the format indicator for the next field
      encoded as "STATIC" or "VALUE".

   while (A new instance of "STATIC" or "VALUE" can be found)
   {
      i)  Let F denote the corresponding field (so the format
          indicator reads "F=STATIC" or "F=VALUE(v)" for some bit
          pattern v).

      ii) For STATIC encoding let hdr(F) = context(F).  For VALUE
          encoding let hdr(F) = v.
   }

   The end result is that hdr(F) has been determined for all fields
   other than the UNCOMPRESSED and INFERRED fields.

A.3.3. Step 3: Reconstructing the INFERRED fields

   The final step at the decompressor is to reconstruct the
   UNCOMPRESSED and INFERRED fields from the transmitted fields and/or
   the context.  In general the reconstruction process is specific to
   the individual fields (and is implemented using a separate
   subroutine for each).

   For the UNCOMPRESSED fields it is necessary to know the length of
   each field, so that the information can be retrieved from the end
   of the compressed header.  For the INFERRED fields the procedure
   for calculating the actual value of the field must be known.

A.4. Hierarchical Huffman tree construction

   This section describes how EPIC converts an inputset into a new set
   of compressed header formats.  We first introduce basic concepts
   and notation that are used in the construction of the HH tree.

   An FNP table is a 3-column table which represents information about
   the behavior of a particular data item (e.g. a single field, a
   group of fields, or a complete protocol stack).  Each row of the
   table is called a format and contains the following:

   A format indicator (F):  This ties the row to a particular format
      of the data item;

   A number of codepoints (N):  This is the number of codepoints
      needed to represent the compressed form of the data item when it
      has this format (N measures the amount of non-redundant
      information in the data item);

   A scaled probability (P):  This represents the probability of the
      data item having this format, divided by N.

   An FNP table has associated with it a total number of codepoints,
   CODEPOINTS(table), which is simply the sum of the N values over all
   the rows in the table.

   The set of data formats given in the table MUST be "exhaustive",
   which means that the sum of N*P over the table MUST be unity.  This
   ensures that the table contains every possible format for the data
   item.  The order of the rows in the table is significant.

   There are two types of FNP tables: fundamental tables and
   constructed tables.  Fundamental FNP tables correspond to fields
   described by a single item in the EPIC toolbox.  The FNP tables for
   all of these types are listed in the following table.  Each
   fundamental FNP table has a single row, so N*P = 1 directly (and
   hence the FNP table is exhaustive).
   +---------------------------+---------+-----------+
   | F                         | N       | P         |
   +---------------------------+---------+-----------+
   | "Field1=STATIC"           | 1       | 1         |
   +---------------------------+---------+-----------+
   | "Field1=IRREGULAR(k)"     | 2^k     | 1/2^k     |
   +---------------------------+---------+-----------+
   | "Field1=VALUE(v)"         | 1       | 1         |
   +---------------------------+---------+-----------+
   | "Field1=READ(LTABLE,i)"   | i       | 1/i       |
   +---------------------------+---------+-----------+
   | "Field1=WRITE(k,LTABLE,i)"| i*2^k   | 1/(i*2^k) |
   +---------------------------+---------+-----------+
   | "Field1=LSB(k,p)"         | 2^k     | 1/2^k     |
   +---------------------------+---------+-----------+
   | "Field1=UNCOMPRESSED"     | 1       | 1         |
   +---------------------------+---------+-----------+
   | "Field1=INFERRED"         | 1       | 1         |
   +---------------------------+---------+-----------+

   Note that the name "Field1" should be replaced by the correct name
   for the field (e.g. "RTP Timestamp", "ESP Sequence Number" etc.).

A.4.1. Step 1: Process the EPIC inputset

   The input to the packet format generator is a list of general
   parameters and field-specific parameters, referred to as the
   inputset (and described in Chapter 5).  The parameters used in the
   packet format generator stage are as follows.

   For the general parameters:

   ALIGN:      Number of bits for alignment (i.e. the size of one
               codeword in the compressed header)

   NPATTERNS:  Number of bit patterns available for use by EPIC in the
               first codeword of the compressed header

   MAXFORMATS: Maximum number of compressed header formats (optional)

   PTHRESH:    Minimum probability threshold (optional)

   In addition, we need to derive the following quantities:

   AFORMATS:   Actual number of header formats used

   PMISSING:   The total probability of all formats discarded during
               FNP table generation.  (PMISSING is not actually used
               during the tree construction, but can be useful in
               validation checks during the process.)
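As a cross-check, the fundamental rows above can be written down directly and tested against the N*P = 1 condition.  A sketch in Python using exact fractions; the parameter choices k and i are illustrative only:

```python
from fractions import Fraction

# One fundamental FNP row per toolbox method, as (F, N, P), with the
# scaled probability P held as an exact Fraction.  The parameters
# k = 4 and i = 3 are illustrative, not taken from any real profile.
k, i = 4, 3
table = [
    ("Field1=STATIC",             1,          Fraction(1)),
    ("Field1=IRREGULAR(k)",       2 ** k,     Fraction(1, 2 ** k)),
    ("Field1=VALUE(v)",           1,          Fraction(1)),
    ("Field1=READ(LTABLE,i)",     i,          Fraction(1, i)),
    ("Field1=WRITE(k,LTABLE,i)",  i * 2 ** k, Fraction(1, i * 2 ** k)),
    ("Field1=LSB(k,p)",           2 ** k,     Fraction(1, 2 ** k)),
    ("Field1=UNCOMPRESSED",       1,          Fraction(1)),
    ("Field1=INFERRED",           1,          Fraction(1)),
]

# Every fundamental table is a single row, so each row must satisfy
# N*P = 1 (i.e. each one-row table is exhaustive on its own).
for f, n, p in table:
    assert n * p == 1, f

# CODEPOINTS(table) is simply the sum of the N values.
def codepoints(rows):
    return sum(n for _, n, _ in rows)
```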
   The field-specific parameters are the encoding methods for the
   headers which the profile is intended to compress.  The information
   in these encoding methods is used to build successively more
   complex FNP tables following either the SUM or the PRODUCT style
   described in the next section.  The end result is the top-level FNP
   table, which describes the complete header stack for the profile.
   This is suitable input for the HH algorithm.

A.4.2. Step 2: Resolve complex encoding methods

   Constructed FNP tables are used to represent more complex data
   behavior.  Complexity is introduced either by allowing more than
   one behavior for a single item, or by considering the behavior of
   multiple data items.  These two methods correspond to two styles
   for constructing FNP tables, which we refer to as SUM-style and
   PRODUCT-style respectively.  In each case the construction process
   implicitly defines the order of the rows in the constructed table,
   based on the information that describes the construction and the
   order of the rows in each constituent table.

   SUM-style construction is typically used to describe a variety of
   possible behaviors of a single data item (e.g. 80% of the time
   static, the remaining 20% alternating between a set of values).
   SUM-style construction takes as input a sequence {TABLE(i), P(i)},
   where each TABLE(i) is a known FNP table for the data item, and
   each P(i) is the unscaled probability that the behavior of the data
   item described by TABLE(i) occurs (so the sum of P(i) over the
   sequence is unity).
   The new table is constructed using the following mechanism:

   Initialize the constructed table to be empty

   for each element i of the sequence {TABLE(i), P(i)}
   {
      for each row in TABLE(i)
      {
         Add a new row to the constructed table with the following
         entries:

         F = Format from the row of TABLE(i)
         N = corresponding entry for N from the row of TABLE(i)
         P = P(i) * corresponding entry for P from the row of TABLE(i)
      }
   }

   The number of codepoints in the constructed table is the sum of the
   numbers of codepoints of all the constituent tables, and similarly
   for the number of rows in the table.

   The second type of construction is PRODUCT-style.  This type of
   construction is used to describe a data item composed of more than
   one field.  It takes as input a simple sequence {TABLE(i)}, where
   each TABLE(i) is a known FNP table, and there are NT tables in all.
   The new table is constructed using the following mechanism:

   Initialize the constructed table to be empty

   for each row i1 of TABLE(1)
   {
      for each row i2 of TABLE(2)
      {
         ...
         for each row iNT of TABLE(NT)
         {
            Add a new row to the constructed table with the following
            entries:

            F = {Format i1 of TABLE(1), Format i2 of TABLE(2), ... ,
                 Format iNT of TABLE(NT)}
            N = [N(i1) of TABLE(1)] * ... * [N(iNT) of TABLE(NT)]
            P = [P(i1) of TABLE(1)] * ... * [P(iNT) of TABLE(NT)]
         }
         ...
      }
   }

   The number of codepoints in the constructed table is the product of
   the numbers of codepoints of all the constituent tables, and
   similarly for the number of rows in the table.

   The two types of construction step can be repeated indefinitely to
   build up progressively more complex data formats.  The basis is
   always the set of toolbox formats.  The end result is known as the
   top-level FNP table, which describes the whole protocol stack to be
   compressed.

A.4.3. Step 3: Tree pruning

   Rows are now pruned from the top-level FNP table and alignment is
   enforced using the following procedure.

   1. Sort the rows of the table in order of decreasing N*P.  Where
      there are ties in N*P, preserve the original ordering of the
      table.

   2. If PTHRESH is given, discard all rows from the table for which
      N*P < PTHRESH, and add the discarded values of N*P to the
      variable PMISSING.

   3. If MAXFORMATS is given, discard all rows from the table other
      than the first MAXFORMATS - 1, and add the discarded values of
      N*P to PMISSING.  Then re-order the remaining formats using the
      original ordering of encodings (ignoring probability values).

   4. Set AFORMATS = (number of rows remaining in the table) + 1, and
      NPOINTS = sum of the N values of all rows in the table.

   5. Create a new terminal row numbered AFORMATS, with

         P = 0
         N = (NPATTERNS - NPOINTS) mod (2^ALIGN - 1)

      The format indicator for this row is not used.

   Note that in practice it is not necessary to exhaustively generate
   the complete table before pruning starts (i.e. some short cuts are
   possible).  However, for interoperability, care must be taken to
   preserve the order that the standard construction method would
   generate.

   We complete this stage with a complete set of starting leaf nodes
   for the tree, with low-probability formats discarded.  The nodes
   have assigned N and P values and associated header format
   indicators.

A.4.4. Step 4: Creating the HH Tree

   Creating a Hierarchical Huffman tree begins with a set of nodes,
   each one having a number of codepoints (N) and a probability per
   codepoint (P).  During the creation of the tree new nodes may be
   created and given a number of codepoints and a probability.  These
   are then incorporated into the tree (i.e. the new nodes are added
   to the active set out of which the tree is growing, and old nodes
   which are already part of the tree may be removed from this active
   set).  The tree building finishes when the active set consists of a
   single node which contains all the codepoints and has probability
   1.
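The SUM-style and PRODUCT-style constructions of Section A.4.2 can be sketched in Python.  The row representation and the example tables are illustrative; probabilities are held as exact fractions so that the exhaustiveness check (sum of N*P equal to unity) is reliable:

```python
from fractions import Fraction
from itertools import product

# Rows are (F, N, P) tuples, as in the FNP tables above.

def sum_style(parts):
    """SUM-style: parts is a sequence of (table, P_i) pairs, where the
    unscaled probabilities P_i sum to unity."""
    out = []
    for table, p_i in parts:
        for f, n, p in table:
            out.append((f, n, p_i * p))   # scale each row by P(i)
    return out

def product_style(tables):
    """PRODUCT-style: one output row per combination of input rows;
    N and P multiply, and the format indicators concatenate."""
    out = []
    for combo in product(*tables):
        f = ", ".join(row[0] for row in combo)
        n, p = 1, Fraction(1)
        for _, ni, pi in combo:
            n *= ni
            p *= pi
        out.append((f, n, p))
    return out

def exhaustive(table):
    return sum(n * p for _, n, p in table) == 1

# e.g. a field that is STATIC 80% of the time and IRREGULAR(2) the
# remaining 20% (illustrative probabilities):
static = [("F=STATIC", 1, Fraction(1))]
irreg2 = [("F=IRREGULAR(2)", 4, Fraction(1, 4))]
field = sum_style([(static, Fraction(4, 5)), (irreg2, Fraction(1, 5))])
assert exhaustive(field)
assert exhaustive(product_style([field, field]))
```

Both constructions preserve exhaustiveness, which is why the validation check can be repeated after every step as progressively more complex tables are built.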
   The format indicator of each leaf node is not used for tree
   construction (but must instead be stored for use in compression and
   decompression).

   We denote a node as NODE(k) for k = 1, 2, 3, ...

      NODE(k) has number of codepoints N(NODE(k)) = n[k]
      NODE(k) has probability per codepoint P(NODE(k)) = p[k]

   The initial nodes in the active set correspond to the formats from
   the top-level FNP table, taking on their order, number of
   codepoints and probabilities.  Each codepoint corresponds to one
   possible header within the given header format.  Conventionally we
   place these leaf nodes at the top of the tree and add new nodes
   working downwards.

   Note that the total probability of NODE(k) is n[k]*p[k], and we
   start with the total probability of all nodes in the active set
   being 1 - PMISSING.  As new nodes are added to the set and old ones
   removed or replaced, this property is conserved at each stage.

   As the tree is built up, nodes are joined by branches.  There are
   three possible cases for branches between nodes at different
   levels:

   i)   A single vertical branch between a pair of nodes.  Such a
        branch is labeled {X}.

   ii)  A pair of branches joining one upper node to two lower ones.
        These branches are labeled {D t} for some integer t (different
        for each branch).

   iii) Several branches each joining one upper node to a common lower
        node.  Each branch is labeled {t} for some integer t
        (different for each branch).

   The tree generation algorithm is as follows:

   1. Set NNODES = AFORMATS.

   2. Find node j such that n[j] > 0 and p[j] is minimal (if there are
      equalities then use the smallest j).  If there is no such j,
      STOP.

   3. if (n[j] is divisible by 2^ALIGN)
      {
         /* Case A - Figure 4 */

         i)   replace NODE(j) in the active set with a new NODE(j)
              with
                 new p[j] = 2^ALIGN * p[j]
                 new n[j] = n[j] / 2^ALIGN
         ii)  join the new NODE(j) to the original NODE(j)
         iii) label the branch with {X}
      }
      else /* n[j] not divisible by 2^ALIGN */
      {
         if (n[j] > 2^ALIGN)
         {
            /* Case B - Figure 5 */

            i)   create two new nodes, a replacement NODE(j) and
                 NODE(r) with r = NNODES + 1, and set:
                    p[r] = p[j]
                    n[r] = n[j] modulo 2^ALIGN
                    new p[j] = p[j]
                    new n[j] = n[j] - n[r]
            ii)  join these nodes to the original NODE(j) (left branch
                 going to NODE(r) and right branch going to the new
                 NODE(j))
            iii) label the branch to the left with {D 0} and the
                 branch to the right with {D n[r]}
            iv)  set NNODES = NNODES + 1
         }
         else /* so n[j] < 2^ALIGN */
         {
            /* Case C - Figure 6 */

            i)    find nodes k[x], x = 1, 2, 3, ..., b with the
                  following properties: all k[x] different and
                  k[x] != j; p[k[x]] minimal among the available nodes
                  and increasing with x; if there are ties among
                  p[k[x]], order the k[x] affected by the value of
                  k[x]; b minimal such that
                  n[j] + n[k[1]] + ... + n[k[b]] > 2^ALIGN.
            ii)   define S[i] = n[j] + sum over (1 <= x <= i-1) of
                  n[k[x]]
            iii)  create two new nodes, a replacement NODE(k[b]) and
                  NODE(r) with r = NNODES + 1, and set:
                     p[r] = p[k[b]]
                     n[r] = 2^ALIGN - S[b]
                     new p[k[b]] = p[k[b]]
                     new n[k[b]] = n[k[b]] - n[r]
            iv)   join NODE(k[b]) and NODE(r) to the original
                  NODE(k[b]) (left branch going to NODE(r) and right
                  branch going to the new NODE(k[b]))
            v)    label the branch to the left with {D 0} and the
                  branch to the right with {D n[r]}
            vi)   create a replacement for NODE(j) and set
                     new n[j] = 1
                     new p[j] = n[j]*p[j] + n[r]*p[r] +
                                sum over (1 <= x <= b-1) of
                                n[k[x]]*p[k[x]]
            vii)  join this node to the original NODE(j), to NODE(k[1])
                  through NODE(k[b-1]), and to NODE(r)
            viii) label the branch to the original NODE(j) with {0}
            ix)   for 1 <= x <= b-1, label the branch to NODE(k[x])
                  with {S[x]}
            x)    label the branch to NODE(r) with {S[b]}
            xi)   remove NODE(k[1]) through NODE(k[b-1]) and NODE(r)
                  from the active set
            xii)  set NNODES = NNODES + 1
         }
      }

   4.
      Return to step 2.

         Former NODE(j) {p, n}
               |
               | {X}
               |
         New NODE(j) {p[j] = p * 2^ALIGN, n[j] = n / 2^ALIGN}

            Figure 4 : Tree Building Case A

                  Former NODE(j) {p, n}
                          /\
                         /  \
                        /    \
                 {D 0} /      \ {D n[r]}
                      /        \
                     /          \
               NODE(r)        New NODE(j)
      {p[r] = p,              {p[j] = p,
       n[r] = n mod 2^ALIGN}   n[j] = n - n[r]}

            Figure 5 : Tree Building Case B

   Former NODE(k[1])...NODE(k[b-1])  Former NODE(j)  Former NODE(k[b])
            |    |                        \                 /\
            |    |                         \               /  \
            |    |                          \       {D 0} /    \ {D n[r]}
     {S[1]} |    | {S[b-1]}             {0}  \           /      \
            |    |                            \    NODE(r)       New
            |    |                             \   {p, n}    NODE(k[b])
            |    |                              \    /
            |    |                               \  / {S[b]}
            +----+--------------------------------+-+
                              |
                         New NODE(j)

            Figure 6 : Tree Building Case C
            (See text for the N and P values of the created nodes)

   The HH tree is now ready for use in the compressor and
   decompressor.

A.5. EPIC implementation issues

   This section considers some issues that arise when implementing
   EPIC.

A.5.1. Arithmetic

   The implementation of all stages of the EPIC algorithm can be
   performed using simple arithmetic operations on integer and
   floating point numbers.  However, some aspects of the algorithm
   require care when implemented on a real microprocessor.

   The compression/decompression front end works in terms of M values,
   which are described in the pseudocode as large integers.  However,
   the only operations required are very simple bit pattern
   manipulations:

   -  left and right shift by ALIGN bits;

   -  bitwise AND with (2^ALIGN - 1);

   -  addition to, subtraction from, and comparison with "ordinary"
      integers.

   Given that ALIGN will typically be 8, this represents very limited
   complexity and does not require specialized support in the data
   path.

   The calculation of the Huffman tree is more involved, since the
   numbers of codepoints can be large integers and the probability
   values are conceptually floating point.
   The additional operations required are the multiplication and
   addition of these numbers with themselves and each other, and the
   comparison of probability values.  Support for large integer
   multiplication is not particularly problematic since it only has to
   be performed at tree generation time (which could even be an
   off-line activity); however, if rounding error leads to
   inconsistent floating point orderings then interoperability may be
   compromised.  There are two solutions to this problem.

   Firstly, for a given (fixed) profile it is possible to construct
   the tree and also propagate upper bounds on the rounding error for
   the value at each node.  It is then normally possible to verify
   that rounding cannot have modified the node selections made.  This
   is particularly useful if tree pruning measures are taken to
   eliminate header formats with very small probabilities.  Note that
   once the tree has been constructed the probability (and indeed the
   codepoint numbers) for the tree nodes are no longer necessary, so
   this process can be carried out off-line if necessary.

   The second approach is to note that the only operations involving
   floating point values are multiplication and addition.  Therefore,
   if the basic input probabilities (the scaled values in the
   fundamental FNP tables of the toolbox, and the P values from
   profiles) are expressed explicitly as multiples of 1/2^m (for any
   m), the calculations can be carried out exactly using just the
   large integer operations which are required for the tree
   construction anyway.

   Note that scaled probabilities such as 1/v for arbitrary v cannot
   be exactly expressed in this way, which means that the summed N*P
   values for that table and the tables derived from it will no longer
   be precisely unity.  However, this does not prevent the
   construction; it only modifies the validation checks that are
   appropriate at each stage.
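The second approach can be sketched by carrying each probability as an integer pair (n, m) standing for the dyadic rational n/2^m, so that multiplication and addition remain exact large-integer operations.  The helper names and the example lookup-table size v are illustrative:

```python
# Sketch of exact probability arithmetic for tree construction: each
# probability is an integer pair (n, m) representing n / 2^m, so all
# operations reduce to (large) integer arithmetic with no rounding.

def mul(a, b):
    # (na/2^ma) * (nb/2^mb) = na*nb / 2^(ma+mb), exactly
    return (a[0] * b[0], a[1] + b[1])

def add(a, b):
    # bring both values to the larger denominator, then add numerators
    m = max(a[1], b[1])
    return (a[0] << (m - a[1])) + (b[0] << (m - b[1])), m

def prob_from_lookup(v):
    """Convention for lookup-table encoding: 1/v is taken as exactly
    n / 2^32, with n = 2^32 // v computed in integer arithmetic."""
    return ((1 << 32) // v, 32)

# e.g. v = 3 (illustrative): the summed N*P for the table is v*n/2^32,
# slightly below unity, so validation checks must tolerate the deficit.
p = prob_from_lookup(3)
total = (3 * p[0], p[1])
assert total[0] < (1 << 32)        # strictly below unity
assert (1 << 32) - total[0] < 3    # deficit is less than v/2^32
```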
   Noting that this only occurs in the case of lookup-table encoding
   where the value of v is small, profiles which adopt this approach
   should use the convention that the probability will be taken as
   (exactly) n/2^32, where n is the result of dividing 2^32 by v in
   integer arithmetic.

Appendix B. Example Profile for RTP/UDP/IPv4

   This appendix describes an example profile for the compression of
   RTP/UDP/IPv4.  Note that the probabilities listed for each encoding
   method are initial estimates only.  These need to be refined with
   more accurate values from genuine RTP/UDP/IPv4 streams.

B.1. General parameters

   ALIGN      : 8
   NPATTERNS  : 224
   MAXFORMATS : 1000

   For simplicity, the lookup tables are initialized next to the
   encoding methods in which they are used.

B.2. Field-specific parameters

   The field-specific parameters are given below for UO-mode with a
   sequential IP ID.  The modifications required to the inputset for a
   random IP ID are described in Section B.3.  Each encoding method is
   listed in the form of a table as per Section 5.2.
   RTP/UDP/IPv4-ENCODING:

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | CRC                    | IRREGULAR(3)               | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(7)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Master Sequence Number | LSB(4,-1)                  | 100%        |
   +------------------------+----------------------------+-------------+
   | RTP Header             | RTP-ENCODING               | 100%        |
   +------------------------+----------------------------+-------------+
   | UDP Header             | UDP-ENCODING               | 100%        |
   +------------------------+----------------------------+-------------+
   | Inner IPv4 Header      | IPv4-ENCODING              | 100%        |
   +------------------------+----------------------------+-------------+
   | Optional Extensions    | EXTENSION-ENCODING         | 100%        |
   +------------------------+----------------------------+-------------+

   Notes:

   The term "RTP Header" is shorthand for all of the fields in the RTP
   header.  A similar note applies for the "UDP Header" name etc.

   EXTENSION-ENCODING:

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | Inner Auth Header      | AH-ENCODING                | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Inner ESP Header       | ESP-ENCODING               | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Inner GRE Header       | GRE-ENCODING               | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Outer IPv4 Header      | OUTER-IPv4-ENCODING        | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Outer Auth Header      | OUTER-AH-ENCODING          | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Outer ESP Header       | OUTER-ESP-ENCODING         | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+
   | Outer GRE Header       | OUTER-GRE-ENCODING         | 99%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(0)               | 1%          |
   +------------------------+----------------------------+-------------+

   Notes:

   The optional headers in the protocol stack are treated in exactly
   the same manner as mandatory headers, except that an additional
   encoding method is needed to delete the optional header when it is
   no longer needed.  Further information on this topic can be found
   in Section 6.2.2.

   The minimal encapsulation header is not considered because it is
   not covered by [ROHC8].
   IPv4-ENCODING:

   TTL{"",""}

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | Version                | STATIC                     | 100%        |
   | Header Length          |                            |             |
   | Reserved Flag          |                            |             |
   | Last Fragment Flag     |                            |             |
   | Fragment Offset        |                            |             |
   | Source Address         |                            |             |
   | Destination Address    |                            |             |
   +------------------------+----------------------------+-------------+
   | Packet Length          | INFERRED                   | 100%        |
   | Header Checksum        |                            |             |
   | Protocol               |                            |             |
   +------------------------+----------------------------+-------------+
   | May Fragment Flag      | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(1)               | 0.1%        |
   +------------------------+----------------------------+-------------+
   | Type of Service        | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(8)               | 0.1%        |
   +------------------------+----------------------------+-------------+
   | IP-ID                  | INFERRED                   | 99%         |
   |                        +----------------------------+-------------+
   |                        | LSB(5,0)                   | 0.7%        |
   |                        +----------------------------+-------------+
   |                        | LSB(8,0)                   | 0.2%        |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(16)              | 0.1%        |
   +------------------------+----------------------------+-------------+
   | Time to Live           | STATIC                     | 99%         |
   |                        +----------------------------+-------------+
   |                        | READ(TTL,2)                | 0.9%        |
   |                        +----------------------------+-------------+
   |                        | WRITE(8,TTL,2)             | 0.1%        |
   +------------------------+----------------------------+-------------+
   | Network Byte Order     | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(1)               | 0.1%        |
   +------------------------+----------------------------+-------------+

   Notes:

   The Header Length field is not static when IP options are used.
   The inputset could be extended to handle this possibility if
   needed.  The inputset could also be extended to handle packet
   fragmentation if required.
   The Network Byte Order (NBO) field is a Miscellaneous Information
   field (see Section 7.3).  Its use is described in Section 4.5.5 of
   [ROHC8].

   The Packet Length field is inferred from information provided by
   the link layer.  The Header Checksum field is inferred by
   recalculating the IP checksum at the decompressor.  The Protocol
   field is inferred from the next header in the chain.  The IP
   Identification field is inferred from the Master Sequence Number.

   UDP-ENCODING:

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | Source Port            | STATIC                     | 100%        |
   | Destination Port       |                            |             |
   +------------------------+----------------------------+-------------+
   | UDP Length             | INFERRED                   | 100%        |
   | UDP Checksum           |                            |             |
   +------------------------+----------------------------+-------------+
   | UDP Checksum Value     | UNCOMPRESSED               | 100%        |
   +------------------------+----------------------------+-------------+

   Notes:

   Even though the UDP Checksum is always classed as INFERRED, it may
   still be transmitted in full using the "UDP Checksum Value" field.
   The UDP Checksum is then copied from this value.

   The UDP Checksum Value field is a Miscellaneous Information field.
   If context(UDP Checksum) is zero at the compressor then the UDP
   Checksum Value field is empty, otherwise it carries hdr(UDP
   Checksum) in full.

   The length of the UDP Checksum Value field is inferred at the
   decompressor from context(UDP Checksum).  If this is zero then the
   UDP Checksum Value field is empty, otherwise it is 16 bits long.

   The UDP Length field is inferred from information provided by the
   link layer.  The UDP Checksum field is inferred from the UDP
   Checksum Value field.  If this is empty then the UDP Checksum is
   zero, otherwise it is copied in full from the UDP Checksum Value
   field.
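The UDP Checksum Value rule above amounts to a small conditional at each end, keyed entirely off the context value.  A sketch in Python (function names are illustrative; checksums are shown as 16-bit integers):

```python
# Sketch of the UDP Checksum Value rule above: the checksum travels
# uncompressed only when context(UDP Checksum) is nonzero, and its
# length is inferred at the decompressor from the context alone.

def checksum_value_field(ctx_checksum, hdr_checksum):
    """Compressor side: the bits appended as the UDP Checksum Value."""
    if ctx_checksum == 0:
        return b""                          # field is empty
    return hdr_checksum.to_bytes(2, "big")  # carried in full (16 bits)

def infer_checksum(ctx_checksum, field):
    """Decompressor side: reconstruct hdr(UDP Checksum)."""
    if ctx_checksum == 0:
        assert field == b""                 # length inferred as zero
        return 0                            # checksum disabled
    return int.from_bytes(field, "big")     # copied in full

# Round trip with a nonzero context, and with checksums disabled.
ctx, hdr = 0xBEEF, 0x1234
assert infer_checksum(ctx, checksum_value_field(ctx, hdr)) == hdr
assert infer_checksum(0, checksum_value_field(0, 0)) == 0
```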
   RTP-ENCODING:

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | Version                | STATIC                     | 100%        |
   | Padding                |                            |             |
   | Extension              |                            |             |
   | SSRC                   |                            |             |
   +------------------------+----------------------------+-------------+
   | CSRC Counter           | INFERRED                   | 100%        |
   | Sequence Number        |                            |             |
   | RTP Timestamp          |                            |             |
   +------------------------+----------------------------+-------------+
   | Payload Type           | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(7)               | 0.1%        |
   +------------------------+----------------------------+-------------+
   | RTP Marker             | VALUE("0")                 | 90%         |
   | TS                     +----------------------------+-------------+
   |                        | TALKSPURT                  | 10%         |
   +------------------------+----------------------------+-------------+
   | Tsc                    | VALUE("0")                 | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | VALUE("1")                 | 0.1%        |
   +------------------------+----------------------------+-------------+
   | TS Stride              | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(8)               | 0.04%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(16)              | 0.03%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(24)              | 0.02%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(32)              | 0.01%       |
   +------------------------+----------------------------+-------------+
   | Time Stride            | STATIC                     | 99.9%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(8)               | 0.04%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(16)              | 0.03%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(24)              | 0.02%       |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(32)              | 0.01%       |
   +------------------------+----------------------------+-------------+
   | CSRC Item 1            | CSRC-ENCODING(1)           | 100%        |
   +------------------------+----------------------------+-------------+
   | CSRC Item 2            | CSRC-ENCODING(2)           | 100%        |
   +------------------------+----------------------------+-------------+
   | ...                    | ...                        | ...         |
   +------------------------+----------------------------+-------------+
   | CSRC Item 16           | CSRC-ENCODING(16)          | 100%        |
   +------------------------+----------------------------+-------------+

   Notes:

   Self-describing variable-length values are not needed by EPIC.

   The TS, Tsc, TS Stride and Time Stride fields are all Miscellaneous
   Information fields.  They are used to reconstruct the RTP Timestamp
   at the decompressor.  Their use is explained in Section 4.5.3 and
   Section 4.5.4 of [ROHC8].

   The CSRC Counter field is inferred by counting the number of non-
   empty CSRC Item fields.  The Sequence Number field is inferred from
   the Master Sequence Number by linear extrapolation.  The RTP
   Timestamp is inferred from the TS, Tsc, TS Stride and Time Stride
   fields as in Section 4.5.3 and Section 4.5.4 of [ROHC8].

   TALKSPURT:

   +------------------------+----------------------------+-------------+
   | Field Name(s)          | Encoding Method            | Probability |
   +------------------------+----------------------------+-------------+
   | RTP Marker             | VALUE("0")                 | 80%         |
   |                        +----------------------------+-------------+
   |                        | VALUE("1")                 | 20%         |
   +------------------------+----------------------------+-------------+
   | TS                     | IRREGULAR(5)               | 90%         |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(8)               | 9.9%        |
   |                        +----------------------------+-------------+
   |                        | IRREGULAR(32)              | 0.1%        |
   +------------------------+----------------------------+-------------+

   Notes:

   The RTP Marker and TS fields are given their own encoding method
   because they behave in a dependent manner.  A talkspurt requires
   some LSBs of the timestamp to be communicated so that the correct
   timestamp value can be inferred from the wall clock.  The RTP
   Marker is "1" at the beginning of each talkspurt, but is "0" when
   the timestamp LSBs are resent for robustness.
CSRC-ENCODING(i): CSRCVALUEi{""} +------------------------+----------------------------+-------------+ | Field Name(s) | Encoding Method | Probability | +------------------------+----------------------------+-------------+ | CSRC Item i | STATIC | 100-a-b-c% | | +----------------------------+-------------+ | | IRREGULAR(0) | a% | | +----------------------------+-------------+ | | READ(CSRCVALUEi,1) | b% | | +----------------------------+-------------+ | | WRITE(32,CSRCVALUEi,1) | c% | +------------------------+----------------------------+-------------+ Price et al. [PAGE 45] INTERNET-DRAFT EPIC February 23, 2001 Note: a = b = c = (17 - i) / 100 AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"} AH-SPI{"","","","",""} +------------------------+----------------------------+-------------+ | Field Name(s) | Encoding Method | Probability | +------------------------+----------------------------+-------------+ | AH Next Header | INFERRED | 100% | | AH Reserved | | | +------------------------+----------------------------+-------------+ | AH Length | STATIC | 99.9% | | +----------------------------+-------------+ | | READ(COMMONLENGTHS,5) | 0.09% | | +----------------------------+-------------+ | | WRITE(8,COMMONLENGTHS,5) | 0.01% | +------------------------+----------------------------+-------------+ | AH SPI | STATIC | 99.9% | | +----------------------------+-------------+ | | READ(AH-SPI,5) | 0.099% | | +----------------------------+-------------+ | | WRITE(32,AH-SPI,5) | 0.001% | +------------------------+----------------------------+-------------+ | AH Sequence Number | INFERRED | 99% | | +----------------------------+-------------+ | | LSB(8,-1) | 0.9% | | +----------------------------+-------------+ | | IRREGULAR(32) | 0.1% | +------------------------+----------------------------+-------------+ | AH Authentication Data | UNCOMPRESSED | 100% | +------------------------+----------------------------+-------------+ Notes: The initial values for the COMMONLENGHS lookup 
table are given in hexadecimal. They represent the lengths of the AH
Authentication Data field for common hashing functions such as MD5,
SHA-1 and so on.

If the AH SPI field is null (i.e. 0 bits long) then the AH header is
not present and hence all of the UNCOMPRESSED and INFERRED fields
are null. Otherwise:

The length of the AH Authentication Data field is known from the AH
Length field.

The AH Next Header field is inferred from the next header in the
extension chain.

The AH Reserved field is "0000000000000000".

The AH Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.

ESP-ENCODING:

PADDINGLENGTHS{0,0,0,0,0}
ESP-SPI{0,0,0,0,0}

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| ESP SPI                | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | READ(ESP-SPI,5)            | 0.099%      |
|                        +----------------------------+-------------+
|                        | WRITE(32,ESP-SPI,5)        | 0.001%      |
+------------------------+----------------------------+-------------+
| ESP Sequence Number    | INFERRED                   | 99%         |
|                        +----------------------------+-------------+
|                        | LSB(8,-1)                  | 0.9%        |
|                        +----------------------------+-------------+
|                        | IRREGULAR(32)              | 0.1%        |
+------------------------+----------------------------+-------------+
| ESP Padding            | INFERRED                   | 99.99%      |
|                        +----------------------------+-------------+
|                        | UNCOMPRESSED               | 0.01%       |
+------------------------+----------------------------+-------------+
| ESP Padding Length     | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | READ(PADDINGLENGTHS,5)     | 0.09%       |
|                        +----------------------------+-------------+
|                        | WRITE(8,PADDINGLENGTHS,5)  | 0.01%       |
+------------------------+----------------------------+-------------+
| ESP Next Header        | INFERRED                   | 100%        |
+------------------------+----------------------------+-------------+
| ESP Auth Data          | IRREGULAR(96)              | 100%        |
+------------------------+----------------------------+-------------+

Notes: If the ESP SPI field is null then the ESP header is not
present and hence all of the UNCOMPRESSED and INFERRED fields are
null. Otherwise:

The length of the ESP Padding field is known from the ESP Padding
Length field.

The ESP Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.

The ESP Padding field has octets that are 1, 2, 3, ..., k where k is
the ESP Padding Length.

The ESP Next Header field is inferred from the next header in the
extension chain.

GRE-ENCODING:

+------------------------+----------------------------+-------------+
| Field Name(s)          | Encoding Method            | Probability |
+------------------------+----------------------------+-------------+
| GRE Key Present        | INFERRED                   | 100%        |
| GRE Seq Num Present    |                            |             |
| GRE Flags              |                            |             |
| GRE Version            |                            |             |
| GRE Protocol           |                            |             |
| GRE Checksum           |                            |             |
+------------------------+----------------------------+-------------+
| GRE Checksum Present   | STATIC                     | 99.9%       |
| GRE Routing Present    +----------------------------+-------------+
| GRE Strict Source Rte  | IRREGULAR(1)               | 0.1%        |
+------------------------+----------------------------+-------------+
| GRE Recursion Control  | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | IRREGULAR(3)               | 0.1%        |
+------------------------+----------------------------+-------------+
| GRE Checksum Value     | UNCOMPRESSED               | 100%        |
+------------------------+----------------------------+-------------+
| GRE Offset             | STATIC                     | 99%         |
|                        +----------------------------+-------------+
|                        | IRREGULAR(0)               | 0.5%        |
|                        +----------------------------+-------------+
|                        | IRREGULAR(16)              | 0.5%        |
+------------------------+----------------------------+-------------+
| GRE Key                | STATIC                     | 99%         |
|                        +----------------------------+-------------+
|                        | IRREGULAR(0)               | 0.5%        |
|                        +----------------------------+-------------+
|                        | IRREGULAR(32)              | 0.5%        |
+------------------------+----------------------------+-------------+
| GRE Sequence Number    | INFERRED                   | 99%         |
|                        +----------------------------+-------------+
|                        | LSB(8,-1)                  | 0.9%        |
|                        +----------------------------+-------------+
|                        | IRREGULAR(32)              | 0.1%        |
+------------------------+----------------------------+-------------+
| GRE Routing            | STATIC                     | 99.9%       |
|                        +----------------------------+-------------+
|                        | UNCOMPRESSED               | 0.1%        |
+------------------------+----------------------------+-------------+

Notes: The GRE Checksum Value field is a Miscellaneous Information
field. If context(GRE Checksum) is zero at the compressor then the
GRE Checksum Value field is empty, otherwise it carries hdr(GRE
Checksum) in full.

If the GRE Recursion Control field is null then the GRE header is
not present and hence all of the INFERRED fields are null.
Otherwise:

The length of the GRE Checksum Value field is inferred at the
decompressor from context(GRE Checksum). If this is zero then the
GRE Checksum Value field is empty, otherwise it is 16 bits long.

The GRE Routing field contains its own built-in length indicator.

The GRE Flags field is "00000" and the GRE Version field is "000".

The GRE Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.

The GRE Protocol field is inferred from the next header in the
extension chain.

The GRE Key Present and GRE Sequence Number Present fields are "0"
if their respective fields are null, otherwise they are "1".

If the GRE Checksum Present and GRE Routing Present fields are both
"0" then the GRE Checksum is null, else it is inferred from the GRE
Checksum Value field. If this is empty then the GRE Checksum is
zero, otherwise it is copied in full from the GRE Checksum Value
field.
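The GRE Checksum inference rule above can be sketched in a few lines of code. This is a minimal illustration only; the function and argument names are hypothetical, since the draft specifies behaviour rather than code.

```python
# Decompressor-side inference of hdr(GRE Checksum), following the
# notes above. Names are illustrative, not taken from the draft.

def infer_gre_checksum(checksum_present, routing_present, checksum_value):
    """Reconstruct the GRE Checksum from the compressed header.

    checksum_value is the GRE Checksum Value field: None when the
    field is empty, otherwise the 16-bit checksum carried in full.
    """
    if checksum_present == 0 and routing_present == 0:
        return None  # GRE Checksum field is null (not present)
    if checksum_value is None:
        return b"\x00\x00"  # value field empty, so checksum is zero
    return checksum_value  # copied in full from the value field

# Example: checksum present, carried in full in the value field.
restored = infer_gre_checksum(1, 0, b"\xab\xcd")
```

Note that the decompressor never needs a length indicator for the Checksum Value field: its presence follows entirely from context(GRE Checksum), which both ends maintain.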
OUTER-IPv4-ENCODING:

Identical to IPv4-ENCODING but for fields in the outer IP header.
Note that the NBO field is replaced by the NBO2 field.

OUTER-AH-ENCODING:

Identical to AH-ENCODING but for fields in the outer Authentication
Header.

OUTER-ESP-ENCODING:

Identical to ESP-ENCODING but for fields in the outer ESP header.

OUTER-GRE-ENCODING:

Identical to GRE-ENCODING but for fields in the outer GRE header.

B.3. Changes required for a random IP ID

If the IP ID changes randomly then a new inputset can be used with
the following encoding:

+------------------------+----------------------------+-------------+
| IP-ID                  | IRREGULAR(16)              | 100%        |
+------------------------+----------------------------+-------------+

All other general and field-specific parameters are as above.

Note that the two inputsets define two different sets of compressed
header formats (one for sequential IP IDs, one for random IP IDs).
Since the change between a sequential and a random IP ID occurs only
rarely, there is no need to reserve space in the compressed headers
to indicate when to change between the two sets of compressed header
formats. Instead the change should be communicated using a STATIC or
DYNAMIC packet.

Appendix C. Proof of Optimal Efficiency

An important property of the compressed headers built by EPIC is
that they are optimally efficient: the average compressed header
size produced by EPIC is provably the minimum possible for the
chosen protocol stack. The precise statement of optimal efficiency
is given below:

"If every compressed header format specified by EPIC is used with
the claimed probability, and if the distribution of headers within
each format is flat, then the EPIC compressed headers are optimally
efficient for the chosen level of robustness and word alignment".

Note that the two caveats in the statement simply ensure that EPIC
is programmed correctly for the protocol stack it is compressing.
If the input to EPIC states that Field 1 will need LSB(4,0) encoding
20% of the time, then optimal efficiency is achieved only if this
really is the case. Similarly, each of the headers that uses a given
compressed header format must occur equally often; otherwise the
extra non-randomness could be exploited to compress the headers even
further.

The proof of optimal efficiency is straightforward in the bit-
aligned case. Ordinary Huffman encoding works on a stream of
characters and builds an optimal prefix code, compressing each
character in turn, as explained in [HUFF]. Since Hierarchical
Huffman gives semantically identical results, the proof of
optimality applies to it as well. But EPIC treats an entire header
as a single character in the Hierarchical Huffman tree, so EPIC
attains optimal compression of every header in the chosen protocol
stack.

In the general word-aligned case the compressed headers produced by
EPIC are optimally efficient for the chosen alignment (for example,
in the octet-aligned case the average compressed header size is
minimal given that all headers must be a whole number of octets).
The proof can be easily adapted from the bit-aligned case.
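The "header as a single character" argument can be illustrated with ordinary Huffman coding. The sketch below is not the Hierarchical Huffman algorithm of this draft; it simply treats each compressed header format as one symbol with a claimed probability (the format names and probabilities are invented for illustration) and builds a prefix code, so that the most probable format receives the shortest codeword.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} mapping.

    Each compressed header format plays the role of one symbol, so
    the codeword lengths are optimal for a symbol-per-header prefix
    code.
    """
    # Heap entries: (probability, tiebreak counter, {symbol: code}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their
        # codewords with "0" and "1" respectively.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# Hypothetical format probabilities, echoing the style of the tables
# above (the values are illustrative, not taken from the draft).
formats = {"STATIC": 0.909, "LSB": 0.07,
           "IRREGULAR": 0.02, "FULL": 0.001}
code = huffman_code(formats)

# Average codeword length in bits per header.
avg = sum(p * len(code[s]) for s, p in formats.items())
```

With these probabilities the dominant format costs a single bit, and no codeword is a prefix of another, so headers can be concatenated and still decoded unambiguously.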