Network Working Group R. Price Internet-Draft A. Surtees Expires: December 23, 2002 M. West Siemens/Roke Manor June 24, 2002 A Formal Notation for Header Compression draft-west-rohc-formal-notation-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http:// www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on December 23, 2002. Copyright Notice Copyright (C) The Internet Society (2002). All Rights Reserved. Abstract This draft defines a BNF-based notation for describing the behavior of protocol stacks with respect to compressing the protocol headers. The RObust Header Compression ROHC [1] scheme is designed to compress packet headers over error prone channels. It is built around an extensible core framework that can be tailored to compress new protocol stacks by adding additional ROHC profiles. The aim of the notation is to simplify the creation of new ROHC [1] profiles. Price, et al. Expires December 23, 2002 [Page 1] Internet-Draft A Formal Notation for Header Compression June 2002 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 5 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 3. Notation and Encoding Rules . . . . . . . . . . . . . . . 6 3.1 BNF input language for creating new ROHC profiles . . . . 6 3.2 The use of BNF . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Assumptions and Exclusions . . . . . . . . . . . . . . . . 8 4. Overview of the BNF input language . . . . . . . . . . . . 8 4.1 Generated data . . . . . . . . . . . . . . . . . . . . . . 11 5. Library of methods . . . . . . . . . . . . . . . . . . . . 11 5.1 Basic methods . . . . . . . . . . . . . . . . . . . . . . 12 5.1.1 STATIC . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5.1.2 IRREGULAR . . . . . . . . . . . . . . . . . . . . . . . . 12 5.1.3 IRREGULAR-PADDED . . . . . . . . . . . . . . . . . . . . . 13 5.1.4 VALUE . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.1.5 LSB . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2 Handling uncompressed data . . . . . . . . . . . . . . . . 14 5.2.1 UNCOMPRESSED . . . . . . . . . . . . . . . . . . . . . . . 14 5.3 Labelling methods . . . . . . . . . . . . . . . . . . . . 14 5.3.1 LABEL . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.3.2 NEXT_FIELD . . . . . . . . . . . . . . . . . . . . . . . . 16 5.3.3 Use of labelling . . . . . . . . . . . . . . . . . . . . . 16 5.3.3.1 MSN . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 5.3.3.2 Re-use of labels . . . . . . . . . . . . . . . . . . . . . 16 5.3.3.3 Compression of labelled fields . . . . . . . . . . . . . . 17 5.4 INFERRED encoding methods . . . . . . . . . . . . . . . . 17 5.4.1 INFERRED-TRANSLATE . . . . . . . . . . . . . . . . . . . . 17 5.4.2 INFERRED-SIZE . . . . . . . . . . . . . . . . . . . . . . 18 5.4.3 INFERRED-OFFSET . . . . . . . . . . . . . . . . . . . . . 18 5.4.4 INFERRED-OFFSET-VAL . . . . . . . . . . . . . . . . . . . 19 5.4.5 INFERRED-IP-CHECKSUM . . . . . . . . . . . . . . . . . . . 20 5.5 Other mappings . . . . . . . . . . . . . . . . . . . . . . 20 5.5.1 NBO . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.5.2 SCALE . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.6 OPTIONAL . . . . . . . . . . . . . . . . . . . . . . . . . 21 5.7 CONTEXT . . . . . . . . . . . . . . . . . . . . . . . . . 22 5.8 List encoding . . . . . . . . . . . . . . . . . . . . . . 23 5.8.1 LIST . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 5.8.2 LIST-N . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.8.3 LIST-NEXT . . . . . . . . . . . . . . . . . . . . . . . . 25 5.8.4 LIST-NEXT-N . . . . . . . . . . . . . . . . . . . . . . . 26 5.8.5 LIST-ORDER . . . . . . . . . . . . . . . . . . . . . . . . 27 5.9 Flag methods . . . . . . . . . . . . . . . . . . . . . . . 28 5.9.1 N flag . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.9.2 U flag . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.10 FORMAT . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.11 CRC . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Price, et al. Expires December 23, 2002 [Page 2] Internet-Draft A Formal Notation for Header Compression June 2002 6. Creating a new ROHC profile . . . . . . . . . . . . . . . 30 6.1 Profile identifier . . . . . . . . . . . . . . . . . . . . 31 6.2 Maximum number of header formats . . . . . . . . . . . . . 31 6.3 Control of header alignment . . . . . . . . . . . . . . . 31 7. Security considerations . . . . . . . . . . . . . . . . . 32 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 32 References . . . . . . . . . . . . . . . . . . . . . . . . 32 Authors' Addresses . . . . . . . . . . . . . . . . . . . . 33 A. The Input Language . . . . . . . . . . . . . . . . . . . . 34 B. Definition of Encoding Methods . . . . . . . . . . . . . . 36 B.1 Library of methods . . . . . . . . . . . . . . . . . . . . 40 B.1.1 STATIC . . . . . . . . . . . . . . . . . . . . . . . . . . 40 B.1.2 IRREGULAR . . . . . . . . . . . . . . . . . . . . . . . . 42 B.1.3 IRREGULAR-PADDED . . . . . . . . . . . . . . . . . . . . . 43 B.1.4 VALUE . . . . . . . . . . . . . . . . . . . . . . . . . . 44 B.1.5 LSB . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 B.1.6 UNCOMPRESSED . . . . . . . . . . . . . . . . . . . . . . . 46 B.1.7 INFERRED-TRANSLATE . . . . . . . . . . . . . . . . . . . . 46 B.1.8 INFERRED-SIZE . . . . . . . . . . . . . . . . . . . . . . 48 B.1.9 INFERRED-OFFSET . . . . . . . . . . . . . . . . . . . . . 49 B.1.10 INFERRED-OFFSET-VAL . . . . . . . . . . . . . . . . . . . 50 B.1.11 INFERRED-IP-CHECKSUM . . . . . . . . . . . . . . . . . . . 50 B.1.12 NBO . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 B.1.13 SCALE . . . . . . . . . . . . . . . . . . . . . . . . . . 52 B.1.14 CONTEXT . . . . . . . . . . . . . . . . . . . . . . . . . 54 Full Copyright Statement . . . . . . . . . . . . . . . . . 55 Price, et al. Expires December 23, 2002 [Page 3] Internet-Draft A Formal Notation for Header Compression June 2002 1. Introduction This document outlines a BNF-based language which can be used to describe the compression of protocol headers. It is intended to be used to simplify the creation of new compression profiles to fit within the ROHC [1] framework . The notation in itself is not sufficient to define the 'bits on the wire' of the compressed headers. To achieve this a set of 'encoding rules' need to be specified. The 'encoding rules' map the abstract description of the protocol encoding into a set of packet formats. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [4]. Profile A ROHC [1] profile is a description of how to compress a certain protocol stack over a certain type of link. Each profile includes one or more sets of compressed header formats and a state machine to control the compressor and the decompressor. Method A method is a procedure for compressing fields. Examples include STATIC encoding (field is the same as the context), and IRREGULAR encoding (field is sent in full). Set of compressed header formats A complete set of compressed header formats uses up all of the indicator bit patterns available at the start of the compressed header. A profile may have several sets of compressed header formats available, but only one set can be in use at a given time. Library of methods The library of methods contains a number of commonly used procedures that can be called to compress fields in the chosen protocol stack. More methods can be added to the library if they are needed. BNF (Backus Naur Form) BNF is a "metasyntax" commonly used to describe the syntax of Price, et al. Expires December 23, 2002 [Page 4] Internet-Draft A Formal Notation for Header Compression June 2002 protocols and languages. BNF input language The descriptive BNF-based input language defined in this document. The BNF description of the ROHC profile assigns one or more methods to each field in the chosen stack. Control data The term 'control data' refers to any data passed between the compressor and decompressor that is not part of the uncompressed header. An example of control data is a header checksum over the uncompressed header to ensure robustness against bit errors and dropped packets. Encoding rules Encoding rules describes a set of mapping rules that need to be defined in order to make use of the notation described in this document. The encoding rules specify how to generate the compressed header formats that match the BNF description. 3. Notation and Encoding Rules This chapter outlines the relationship between the notation and the encoding rules. 3.1 BNF input language for creating new ROHC profiles We specify a simple BNF-based input language for the fast creation of new compression schemes. The language is designed to be easy to use without requiring any knowledge of the encoding rules. The information required to create a new ROHC [1] profile is a description of how the chosen protocol stack behaves and a set of encoding rules. The encoding rules convert the input into one or more sets of compressed header formats that can be used by a ROHC [1] compressor and decompressor. As with all versions of BNF the input language has a rule-based structure, which makes it easy to build new profiles out of existing ones (e.g. when adding new layers to a protocol stack). Note that since the compressed header formats can be generated offline, the fact that profiles are specified using an input language does not affect the processing requirements of compression and decompression. Price, et al. Expires December 23, 2002 [Page 5] Internet-Draft A Formal Notation for Header Compression June 2002 3.2 The use of BNF The use of BNF to describe the syntax of information is well understood. For example, we can describe a 'language' in which well- formed sentences are lists of integers: = *() = [ "+" | "-" ] *() = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" In this example, the terminal symbols are used to match symbols in the input. By replacing these symbols with the names on the left- hand side of the rules, a well-formed input is reduced to the primary rule "". For the purposes of compressing and decompressing header streams, a similar notation is applied. (The input is the bit-stream representing the header to be compressed). In this case, however, the terminal symbols are not directly matched against the input. The terminals are a set of defined 'methods' that can be applied to a header field. A method 'matches' if it can be applied to a set of bits in the input. The methods that are applied can also have 'side effects', for example generating new bits that are added to the bit-stream, which need themselves to be processed. Since compression is applied to a stream of headers, the applicability of a method to the bits in the input stream does not simply consider the current set of bits to be processed, but also historical information about the field. When the BNF description of a header is applied to an input bit- stream, there are two possible outcomes. 1. the bit-stream reduces to the main rule, indicating successful compression. As a side-effect of this, the methods that have been applied will have produced compressed representations of the fields. 2. the bit-stream fails to reduce to the main rule, indicating that compression has failed. Where compression has been successful, a set of encoding rules needs Price, et al. Expires December 23, 2002 [Page 6] Internet-Draft A Formal Notation for Header Compression June 2002 to be applied. These map which of all possible sentences in the BNF was chosen and the compressed representations into a compressed packet. The following sections define the form of the BNF that is used to annotate the compression of a header stream and the base methods that are defined. 3.3 Assumptions and Exclusions The notation described in this document tries to make as few assumptions as possible about the encoding rules that might be used to map the protocol description into packet formats. However, it is inevitable that some assumptions are made. This section tries to capture these. o Some of the methods refer to 'compressed' and 'uncompressed' data. The structure of the compressed representation is expected to be: +-------------------+-------------------+--------------- | Compressed Header | Uncompressed Data | Payload... +-------------------+-------------------+--------------- where the 'uncompressed data' is a variable length part following the main compressed header. However, a set of encoding rules are perfectly entitled to ignore this distinction. In the case where there are restrictions on the compressed data part, for example that it be of known length, this distinction is useful. o Many methods take a 'probability' parameter that indicates the relative likelihood of the method being applied. This is based on the assumption that such information will be useful to the encoding rules when building the packet formats. More generally, we note that there is an implicit assumption that the notation should contain sufficient information to support a wide variety of encoding rules for turning the notation into packet formats. However, encoding rules are at liberty to ignore this information, if they wish. Where it is not clear whether or not a piece of information is useful, it is generally considered that it should be included, since it is easier to ignore unnecessary information than infer missing information. 4. Overview of the BNF input language The language is designed to be flexible but at the same time easy to use. The only requirement for efficient compression is an accurate description of how the relevant protocol stack behaves. Price, et al. Expires December 23, 2002 [Page 7] Internet-Draft A Formal Notation for Header Compression June 2002 As with all versions of BNF, the description of the protocol is built up using the following two constructs: New BNF rule A new method is created from existing ones by writing a new BNF rule Set of choices One or more BNF rules can be assigned to a given field by using the choice ("|") operator This overview of the notation also describes a base library of fundamental methods (STATIC compression, LSB compression etc.). The BNF description of how to compress a new protocol stack always resolves into a selection of these fundamental methods. The exact syntax of the BNF-based input language can itself be described using an existing version of BNF. A suitable variant is "Augmented BNF" (ABNF) as described in RFC-2234 [5]. For example, in ABNF the syntax for defining a new rules in terms of existing ones is as follows: = "=" 1*( ) = *( "|" ) Each instance of calls a BNF rule (or method) that converts a certain number of uncompressed bits into a certain number of compressed bits. Note also that is white space, used to delimit the rules. A complete description of the BNF-based input language can be found in Appendix A. An example of how to create a new BNF rule is given below. A BNF rule known as IPv6_Header is constructed from the basic methods available in the library. This new BNF rule is designed to compress an entire IPv6 header. Price, et al. Expires December 23, 2002 [Page 8] Internet-Draft A Formal Notation for Header Compression June 2002 IPv6_Header_co = Version Traffic_Class_co ECT_Flag_co CE_Flag Flow_Label_co Payload_Length Next_Header Hop_Limit_co Source_Address_co Destination_Address_co Version = VALUE(4,6,100%) Traffic_Class = STATIC(100%) ECT_Flag = STATIC(100%) CE_Flag = VALUE(1,0,99%) | VALUE(1,1,1%) Flow_Label = STATIC(100%) Payload_Length = INFERRED-SIZE(16,288) Next_Header = LABEL(8,next_header) Hop_Limit = STATIC(100%) Source_Address = STATIC(100%) Destination_Address = STATIC(100%) Each field in the IPv6 header is given a choice of possible methods. If a BNF rules or method is not implicitly used 100% of the time for that field (e.g. INFERRED-SIZE) then one of the parameters is the probability that it will be used to encode the field in question. This is important since it is assumed that the encoding rules will make use of these probabilities to ensure that more common methods are encoded more efficiently than rarely used encoding methods. Ideally the probability should equal the percentage of time for which the method is selected to compress the field. However, there is no requirement for probabilities to add up to exactly 100%. Note also that the BNF input language is designed to be both human- readable and machine-readable. If only one protocol stack needs to be compressed, the input language can simply be converted by hand directly to an implementation. However, since the input language provides a complete description of the protocol stack to be Price, et al. Expires December 23, 2002 [Page 9] Internet-Draft A Formal Notation for Header Compression June 2002 compressed, it is possible to compress headers using only the information contained in the BNF description and without any additional knowledge of the protocol stack. This means that it is possible to implement a protocol-independent compressor that can download a new ROHC [1] profile described in the BNF input language and, given a suitable set of encoding rules, immediately use it to compress headers. 4.1 Generated data There may be some data that is passed from the compressor to the decompressor but which is not present in the uncompressed header. This data communicates additional information that might be useful to the decompressor: for example a checksum over the uncompressed header to ensure correct decompression has occurred. This data is generated by certain methods. Further methods need to be applied to compress this generated data. 5. Library of methods The ROHC [1] standard contains a number of different methods (LSB encoding, scaled timestamp encoding, list-based compression etc.) for compressing header fields. We treat these methods as library functions to be called by the BNF input language when they are needed. The following library contains a wide range of basic methods. Moreover, new encoding methods can be added to the library as and when they are needed. Note that this chapter contains an informative description only. The normative description of some methods is given in the appendix. Note, however, that this does not describe how the output of each method is actually composed into a compressed header or transmitted. This is defined by whichever set of encoding rules are applied. The syntax of each method is given using ABNF as defined in RFC-2234 [5]. Note that each of the methods may have one or more parameters of the following type: Price, et al. Expires December 23, 2002 [Page 10] Internet-Draft A Formal Notation for Header Compression June 2002 A non-negative integer (specified as decimal, binary or hex). Binary values are prefixed by 0b and hex values are preceded by 0x. The label-name of a field (see LABEL method) An integer (positive or negative) A non-negative integer used to indicate the length of the field being compressed A probability expressed as a percentage with at most 2 decimal places The name of another BNF rule or method including all parameters The ABNF description of these parameters is found in Appendix A. 5.1 Basic methods This section describes the simplest set of methods that are typically used to handle numeric fields. 5.1.1 STATIC ABNF notation: "STATIC(" ")" The STATIC method can be used when the header field does not change relative to the context. If a field is STATIC then no information concerning the field need be transmitted in the compressed header. The only parameter for the STATIC method is a probability that indicates how often the method will be used. As with any other encoding, the probability should reflect as accurately as possible the chance that the field will be encoded as STATIC. 5.1.2 IRREGULAR ABNF notation: "IRREGULAR(" "," ")" The IRREGULAR method is used when the field cannot be compressed relative to the context, and hence must be transmitted in full in the compressed header. The IRREGULAR method has a length parameter to indicate the length of Price, et al. Expires December 23, 2002 [Page 11] Internet-Draft A Formal Notation for Header Compression June 2002 the field in bits and a probability that indicates how often it will be used. A modified version of IRREGULAR encoding is given below: 5.1.3 IRREGULAR-PADDED ABNF notation: "IRREGULAR-PADDED(" "," "," ")" The IRREGULAR-PADDED method compresses any field that is large in terms of number of bits but has a small absolute value (and hence the most significant bits are zero). The method extracts a certain number of LSBs (Least Significant Bits) of the field. The first parameter gives the overall length of the field, whilst the next parameter specifies the number of LSBs to be encoded in the compressed header. The remaining bits are all taken to be 0 by the decompressor. The probability gives an indication of how often IRREGULAR-PADDED will be used. The IRREGULAR-PADDED method is useful for compressing fields that take small integer values with a high probability. 5.1.4 VALUE ABNF notation: "VALUE(" "," "," ")" The VALUE method can be used to encode one particular value for a field. It is followed by parameters to indicate the length and integer value of the field as well as the probability of the field taking this value. 5.1.5 LSB ABNF notation: "LSB(" "," "," ")" The LSB method compresses the field by encoding only its LSBs (Least Significant Bits). The first parameter indicates the number of LSBs to encode in the compressed header. The second parameter is the offset of the LSBs: it describes whether the decompressor should interpret the LSBs as increasing or decreasing the field value contained in its context. Again the probability indicates how often LSB encoding will be used. Price, et al. Expires December 23, 2002 [Page 12] Internet-Draft A Formal Notation for Header Compression June 2002 To illustrate how the second parameter works, suppose that k LSBs are transmitted with offset p. The decompressor uses these LSBs to replace the k LSBs of the value of this field stored in the context (val), and then adds or subtracts multiples of 2^k so that the new field value lies between (val - p) and (val - p + 2^k - 1). In particular, if p = 0 then the field value can only stay the same or increase. If p = -1 then it can only increase, whereas if p = 2^k then it can only decrease. Recall that for robustness the compressor can store r values for each field in its context. If this is the case then enough LSBs are transmitted so that the decompressor can reconstruct the correct field value, no matter which of the r values it has stored in its context. This is equivalent to Window-based LSB encoding as described in ROHC [1]. 5.2 Handling uncompressed data This section defines methods that are used to handle data that is considered 'uncompressed'. See Section 3.3 for a discussion of this. 5.2.1 UNCOMPRESSED ABNF notation: "UNCOMPRESSED(" "," "," "," ")" The UNCOMPRESSED method is used when treating a field as uncompressed without alteration. All uncompressed fields are encoded 'as-is'. The UNCOMPRESSED method differs from the IRREGULAR method in that the size of the field is not fixed, but instead is specified by a control field. The first parameter identifies the name of the control field. The next three parameters specify a divisor, multiplier and offset for the control field. These parameters scale the value of the control field so that it specifies the exact size of the UNCOMPRESSED field in bits. If the parameters are d, m and p respectively then: size of UNCOMPRESSED field = floor(control field value / d) * m + p UNCOMPRESSED encoding is therefore used with the 'labelling' methods that are used to identify control fields, as explained below. 5.3 Labelling methods Price, et al. Expires December 23, 2002 [Page 13] Internet-Draft A Formal Notation for Header Compression June 2002 5.3.1 LABEL ABNF notation: "LABEL(" "," ")" The LABEL method allows a name to be associated with a number of bits in the uncompressed header that can then be used later. The two obvious uses of this method are: 1. to associate a name with a field so that it can be used to control another method (e.g. INFERRED-OFFSET). 2. to allow fields to be re-ordered in order to capture dependencies between fields. We note that the second case can be captured without the use of this construction, however labelling may be considered simpler to use than the alternative method of restructuring the BNF. For example, consider a case where there is a field whose behavior is dependent upon a flag earlier in the header: In the first example the standard BNF structure is used: eg_1 = case_1 | case_2 case_1 = VALUE(1,0,90%) fields_in_between dependent_field_flag_is_0 case_2 = VALUE(1,1,10%) fields_in_between dependent_field_flag_is_1 In the second example, field labelling is used: eg_2 = LABEL(1,flag) fields_in_between NEXT_FIELD(flag) case_1 | case_2 case_1 = VALUE(1,0,90%) dependent_field_flag_is_0 case_2 = VALUE(1,1,10%) dependent_field_flag_is_1 Price, et al. Expires December 23, 2002 [Page 14] Internet-Draft A Formal Notation for Header Compression June 2002 5.3.2 NEXT_FIELD ABNF notation: "NEXT_FIELD(" ")" The NEXT_FIELD method indicates that the next field to be compressed is not the next field in the input stream, but is a value that was previously identified with a "LABEL" encoding method. 5.3.3 Use of labelling 5.3.3.1 MSN Header compression in ROHC [1] includes the concept of a Master Sequence Number (or MSN). This is used as the primary monotonically increasing value that identifies a header. In this notation, we include two ways of using the MSN: 1. The MSN is constructed by the compressor as a monotonically increasing counter. 2. The MSN is derived directly from a field value in the header stream. Both of these uses are supported by the assumption of a "pre-defined" label, MSN. This value is initialised to the compressor's local MSN count prior to starting compression. The 'LABEL' method can be used to overwrite this with a new value from a header field. 5.3.3.2 Re-use of labels Label names can be re-used, however compression must fail if an attempt is made to re-define the value of a label prior to the current value being compressed. That is, the following is a legal re-use of a label name: eg_1 = LABEL(8,a_label) some_other_fields NEXT_FIELD(a_label) ; compress the labelled value IRREGULAR(8,100%) LABEL(16,a_label) ; re-use the label However, the following usage would be erroneous: eg_2 = LABEL(8,a_label) LABEL(16,a_label) ; attempt to re-define a label Price, et al. Expires December 23, 2002 [Page 15] Internet-Draft A Formal Notation for Header Compression June 2002 ; prior to compressing its value 5.3.3.3 Compression of labelled fields Every labelled field must be compressed in order to guarantee transparent decompression. Compression of labelled fields can be implicit or explicit. Some methods may take a labelled field as a parameter and encode the field as part of their processing. Others may leave the field to be encoded later. All methods that use labels as parameters state whether or not they encode the field value. Compression must fail if a labelled field has not been compressed at the end of processing the header. 5.4 INFERRED encoding methods The following versions of INFERRED encoding are available: 5.4.1 INFERRED-TRANSLATE ABNF notation: "INFERRED-TRANSLATE(" "," *("," "," ) ")" The INFERRED-TRANSLATE method translates a field value under a certain mapping. The first pair of parameters specifies the length of the field before and after the translation. This is followed by additional pairs of integers, representing the field value before and after it is translated. Note that the final field value at the compressor (or equivalently, the original field value at the decompressor) appears first in each pair. For example: INFERRED-TRANSLATE(8,16,41,0x86DD,4,0x0800) ; GRE Protocol The GRE Protocol field behaves in the same manner as the Next Header field in other extension headers, except that it indicates that the subsequent header is IPv6 or IPv4 using the values 0x86DD and 0x0800 instead of 41 and 4. The INFERRED-TRANSLATE encoding method can convert the standard values (as provided by LIST compression defined in Section 5.11) into the values required by the GRE Protocol field. Note that the translation happens "in-situ", so after applying this method, the translated field value should be compressed. Price, et al. Expires December 23, 2002 [Page 16] Internet-Draft A Formal Notation for Header Compression June 2002 5.4.2 INFERRED-SIZE ABNF notation: "INFERRED-SIZE(" "," ")" The INFERRED-SIZE method infers the value of a field from the size of the uncompressed packet. The first parameter specifies the uncompressed field length in bits, and the second parameter specifies the offset of the uncompressed packet size (i.e. the amount of packet which has already been compressed). If the INFERRED-SIZE field value is v, the offset is p and the total packet length after (but not including) the INFERRED- SIZE field is L then the following equation applies (assuming 8 bits in a byte): L = 8 * v + p Note that the use of INFERRED-SIZE makes an assumption about (or places a restriction upon) the encoding rules that will be used: It must be possible to calculate the size, as described above, when the INFERRED-SIZE method is encountered. 5.4.3 INFERRED-OFFSET ABNF notation: "INFERRED-OFFSET(" "," ")" The INFERRED-OFFSET method compresses a field that usually has a constant offset relative to a certain base field. The first parameter describes the length of the field to be compressed. The base field is identified by the name given in the second parameter. The value of this will have been set with the LABEL encoding method. The encoding subtracts the base field from the field to be compressed and replaces the field value by these "offset" bits in the uncompressed header. The offset bits are then compressed by the next encoding method in the input code. Note that where the width of the base field does not match the width of the field to be encoded, the base field is either extended by adding '0' MSBs, or truncated to the appropriate number of LSBs. The offset bits are always the same width as the original field. For example, a typical sequence number can be compressed as follows: Price, et al. Expires December 23, 2002 [Page 17] Internet-Draft A Formal Notation for Header Compression June 2002 INFERRED-OFFSET(32,MSN) ; Sequence Number STATIC(99%) | ; Sequence Number offset LSB(8,-1,0.7%) | LSB(16,-1,0.2%) | IRREGULAR(32,0.1%) In this case the offset field is expected to change rarely and only by small amounts, and hence it is compressed using mainly STATIC and LSB encodings. 5.4.4 INFERRED-OFFSET-VAL ABNF notation: "INFERRED-OFFSET-VAL(" "," ")" The INFERRED-OFFSET-VAL method compresses a field that usually has a constant offset relative to a base field, in the same way as INFERRED-OFFSET. However, it propagates the compressed value, rather than the base value. This is useful for compressing lists of incrementing values, for example. The first parameter describes the length of the field to be compressed. The base field is identified by the name given in the second parameter. The value of this will have been set with the LABEL encoding method. The encoding subtracts the base field from the field to be compressed and replaces the field value by these "offset" bits in the uncompressed header. The value of the label for "base" is updated to have the original value of the field to be compressed. The offset bits are then compressed by the next encoding method in the input code. Note that where the width of the base field does not match the width of the field to be encoded, the base field is extended by adding '0' MSBs, or truncated to the appropriate number of LSBs. The offset bits are always the same width as the original field. For example, a sequence of values such as 10, 20, 30 could be efficiently compressed as follows: Price, et al. Expires December 23, 2002 [Page 18] Internet-Draft A Formal Notation for Header Compression June 2002 LABEL(16,x) INFERRED-OFFSET-VAL(16,x) STATIC(90%) | IRREGULAR(16,10%) ; compress (2nd - 1st) entry INFERRED-OFFSET-VAL(16,x) STATIC(90%) | IRREGULAR(16,10%) ; compress (3nd - 2st) entry NEXT-FIELD(x) STATIC(90%) | IRREGULAR(16,100%) ; Compress 3rd entry 5.4.5 INFERRED-IP-CHECKSUM ABNF notation: "INFERRED-IP-CHECKSUM(" ")" This method is used for calculating the IP checksum. At the compressor it replaces the bits representing the IP checksum with "0" bits (these are a known distance from the beginning of this method). It then continues to compress using encoding_name. At the decompressor it decompresses encoding_name as usual. A 16-bit one's complement sum is then computed over the length of data which has been decompressed in this method and the result copied into the appropriate bits (at a fixed offset) in the header. 5.5 Other mappings This section introduces other methods that are used to map field values in the header. 5.5.1 NBO ABNF notation: "NBO(" ")" The NBO method may be used if there is a possibility that the following field will be in reverse byte order from the rest of the header, as is sometimes seen with the IP ID, for example. The method looks at the specified number of bits (typically 16 or 32) and, using a history, decides whether or not they are in network byte order. If they are it replaces the original bits with the value of the field (unchanged) and a 1-bit flag with the value "1". If the value is not in NBO it replaces the original bits with the byte-swapped value of the field and a 1-bit flag with the value "0". Price, et al. Expires December 23, 2002 [Page 19] Internet-Draft A Formal Notation for Header Compression June 2002 For example: NBO (16) ; Check for network byte order INFERRED-OFFSET(16,MSN) ; IP ID STATIC(80%) | ; IP ID offset LSB(5,-1,15%) | IRREGULAR(16,5%) STATIC(99%) | ; NBO flag IRREGULAR(1,1%) 5.5.2 SCALE ABNF notation: "SCALE(" ")" The SCALE method is used for fields which change by a regular (usually large) amount every packet. The number of bits specified are removed and a suitable scale factor chosen. The original value is replaced with three values, each of the same length. If the original field has value v and the chosen scale is s then these values are: s = the scale factor v mod s = the remainder when v is divided by s v / s = the scaled value of v when v is divided by s using integer arithmetic These 3 fields should be compressed in the order shown (i.e. scale factor; remainder; and scaled value). 5.6 OPTIONAL ABNF notation: "OPTIONAL(" ")" The OPTIONAL method is used to compress fields that are optionally present in the uncompressed header. OPTIONAL encoding requires a 1 bit indicator flag to specify whether or not the optional field is present in the uncompressed header. This flag is extracted from the well-known label 'optional'. The value of the flag is set explicitly by LABEL or implicitly by another Price, et al. Expires December 23, 2002 [Page 20] Internet-Draft A Formal Notation for Header Compression June 2002 method (such as LIST). For example: LABEL(1,optional) ; GRE Key flag OPTIONAL(KEY-ENCODING) ; GRE Key In this case the BNF rule KEY-ENCODING is called to compress the GRE Key field, but only if the Key Flag is set to 1. 5.7 CONTEXT ABNF notation: "CONTEXT(" "," ")" The CONTEXT method is used to store multiple copies of the same field in the context. This encoding method is useful when compressing fields that take a small number of values with high probability, but when these values are not known a-priori. CONTEXT encoding can also be applied to larger fields: even an entire TCP header. This can be very useful when multiple TCP flows are sent to the same IP address, as a single ROHC [1] context can be used to compress the packets in all of the TCP flows. The first parameter specifies the BNF rule that should be used to compress the field. The second parameter specifies how many copies of the field should be stored in the context. CONTEXT encoding applies the specified BNF rule to the uncompressed header, compressing relative to any of the copies of the field stored in its context. It then appends an "index" value to the uncompressed header to indicate to the decompressor which context value should be used for decompression. Consider the following example using the TCP Window field: CONTEXT(TCP-WINDOW,4) ; Window VALUE(2,0,89%) | ; Window context index VALUE(2,1,10%) | VALUE(2,2,0.5%) | VALUE(2,3,0.5%) At most 4 copies of the Window field can be stored in the context. The Window field can be compressed relative to any of these values: the value chosen by the compressor is transmitted to the decompressor using the "index". Price, et al. Expires December 23, 2002 [Page 21] Internet-Draft A Formal Notation for Header Compression June 2002 5.8 List encoding Various sets of fields (or sub-headers) take the form of lists. This section defines a set of methods that are designed to describe the compression of these lists. 5.8.1 LIST ABNF notation: "LIST(" "," "," "," "," "," *("," ) *("," ) ")" The LIST method compresses a list of items that do not necessarily occur in the same order for every uncompressed header. Example applications for the LIST method include TCP options and TCP SACK blocks. The size of the list is determined by a control field in exactly the same manner as for UNCOMPRESSED encoding. The first four integer parameters are defined as in UNCOMPRESSED. The fifth parameter gives the number of bits which should be read from the uncompressed_data to decide which of the BNF rules in the list to use to compress the next list item. These parameters are followed by a set of BNF rules that can be used to compress individual items in the list and a set of integers to identify which method to use. If the integer obtained using the fifth parameter matches the nth integer then use the nth BNF rule. This continues until data amounting to the size of the list has been compressed. Once the list size is reached, LIST encoding appends the order in which the encoding methods were applied to the uncompressed data and the presence of data for the methods. The order consists of a string of n * k-bit entries (where k is the least number of bits required to index each of the n entries). This is a list of the indices in the order in which they occurred. The presence data is a list a 1-bit flags, one per entry in the list which are set to "1" to indicate that the entry was used or "0" if not. These flags occur in the order in which the entries appear in the LIST encoding. It is expected that the OPTIONAL method (see Section 5.6) might be Price, et al. Expires December 23, 2002 [Page 22] Internet-Draft A Formal Notation for Header Compression June 2002 used within LIST. The well-defined label 'optional' is thus set implicitly within LIST encoding. However, the original value of the label is restored when LIST processing terminates. LIST(tcp_data_offset,1,32,0,8, OPTIONAL(TCP-SACK), OPTIONAL(TCP-TIMESTAMP), OPTIONAL(TCP-END), OPTIONAL(TCP-GENERIC), 5,8,0) ; TCP Options STATIC(99.9%) | ; TCP Options order IRREGULAR(8, 0.1%) STATIC(75%) | ; TCP Options presence IRREGULAR(4,25%) It is assumed that the BNF rules are compressed as if they had been applied in the order in which they appear in the LIST method. The order and presence control fields are used to encode this information. The LIST method implicitly encodes the value of the control field that is specified in the first parameter, so there is no need to explicitly compress this field. 5.8.2 LIST-N ABNF notation: "LIST(" "," "," "," "," "," "," *("," "," ) *("," ) ")" The LIST-N method is very similar to the LIST method, except that each BNF rule that appears within the parameter list is followed by a value. This value indicates the maximum number of times that the preceding BNF rule can be applied. This is a useful short-hand in the case where an element may occur a large number of times. (For processing purposes, a count of n is equivalent to having written out the previous encoding method n times). Although it may seem that the original LIST method is redundant, it Price, et al. Expires December 23, 2002 [Page 23] Internet-Draft A Formal Notation for Header Compression June 2002 is currently retained as it may be that these two methods may be treated differently by a set of encoding rules. This form of specification is chosen over a truly open list (i.e. one allowing an unbounded number of repeats) since it is clear that it should be possible to set an upper bound and that this information may be useful to the encoding rules when generating the compressed packet format. All other processing is identical with the LIST method. An example: LIST(4,1,32,0,8, OPTIONAL(TCP-SACK), 1, OPTIONAL(TCP-TIMESTAMP), 1, OPTIONAL(TCP-END), 1, OPTIONAL(TCP-PAD), 32, OPTIONAL(TCP-GENERIC), 3, 5,8,0,1) ; TCP Options 5.8.3 LIST-NEXT ABNF notation: "LIST-NEXT(" "," *("," ) *("," ) ")" LIST-NEXT encoding is similar to basic LIST encoding, except that the next list item to compress is known a-priori from a control field. IP extension headers can be compressed using LIST-NEXT. The first parameter specifies the name of the control field which is examined before each list item is compressed. This is followed by the set of BNF rules available to LIST-NEXT and a set of 0 or more integers. The nth BNF rule can only be called when the nth integer value is obtained from the control field. For example: Price, et al. Expires December 23, 2002 [Page 24] Internet-Draft A Formal Notation for Header Compression June 2002 LIST-NEXT(next_header, OPTIONAL(AH-ENCODING), OPTIONAL(ESP-ENCODING), OPTIONAL(GRE-ENCODING), OPTIONAL(GENERIC-ENCODING), 51,50,47) ; Header Chain STATIC(99.9%) | ; Header Chain order IRREGULAR(8,0.01%) STATIC(75%) | ; Header Chain presence IRREGULAR(4,25%) The IP extension header chain can have a number of specific BNF rules designed for one type of extension header (AH, ESP or GRE) as well as a "generic" BNF rule that can cope with arbitrary extension headers but at reduced compression efficiency. Just as with basic LIST encoding, LIST-NEXT also adds the order in which the BNF rules are applied and their presence or absence to the uncompressed header, so that the decompressor can reconstruct the list in the correct order. Prior to invoking the selected list entry, LIST-NEXT appends the value of the control field to the uncompressed data, so that this will be the first field to be processed. That is, it implicitly performs a 'NEXT_FIELD' operation. If the invoked BNF rule does not assign a new value to the label, then this signals the end of the list to LIST-NEXT. 5.8.4 LIST-NEXT-N ABNF notation: "LIST-NEXT-N(" "," "," *("," "," ) *("," ) ")" The LIST-NEXT-N method is similar to LIST-NEXT, except that each BNF rule is followed by a count of the number times that it can be invoked. LIST-NEXT-N has the same relationship to LIST-NEXT as LIST-N has to LIST. For example: Price, et al. Expires December 23, 2002 [Page 25] Internet-Draft A Formal Notation for Header Compression June 2002 LIST-NEXT(next_header, OPTIONAL(AH-ENCODING),1, OPTIONAL(ESP-ENCODING),1, OPTIONAL(GRE-ENCODING),1, OPTIONAL(GENERIC-ENCODING),3, 51,50,47) ; Header Chain 5.8.5 LIST-ORDER ABNF notation: "LIST-ORDER(" ")" The order information produced by the list methods is not packed in any way. This means that changes to the order will frequently require a significant amount of data to be sent. In many cases, a large number of the list items will be present and the order will not change often. Simply sending the LIST order data when it changes will work in such cases. However, when a list is sparse, and relatively few of the list entries are used, encoding the order information for the entire list is wasteful. This method indicates how more efficient encodings for the LIST order data can be constructed. It takes as a parameter the number of entries in the list and, hence, the number of items that will be present in the order data. It transforms this into a variable-length value which is treated as 'uncompressed' data, and replaces the order data with a single-bit flag that has the value "0" if no order data needed to be sent (i.e. it is unchanged from the data at the decompressor) and "1" if order data needed to be encoded. For example: Price, et al. Expires December 23, 2002 [Page 26] Internet-Draft A Formal Notation for Header Compression June 2002 LIST(tcp_data_offset,1,32,0,8, OPTIONAL(TCP-SACK), OPTIONAL(TCP-TIMESTAMP), OPTIONAL(TCP-END), OPTIONAL(TCP-GENERIC), 5,8,0) ; TCP Options LIST-ORDER(4) ; TCP Options order STATIC(75%) | IRREGULAR(1,25%) ; order present flag STATIC(75%) | ; TCP Options presence IRREGULAR(4,25%) The nature of the encoding of the order data is, of course, to be specified in the encoding rules. 5.9 Flag methods The flag methods are used to modify the behavior of another method or BNF rule. Each flag method has a single parameter, which is the name of a method or BNF rule. The flag method calls this method, but additionally modifies the input or output in some manner. Note that flag methods do not require the original method to be rewritten (as they only modify its input or output). 5.9.1 N flag ABNF notation: "N(" ")" The N flag runs the method specified by its parameter, with the exception that it does not update the context. This is useful when a field takes an unexpected value for one header and then reverts back to its original behavior in subsequent headers. An example of the N flag in use is given below: STATIC(99%) | ; Window N(LSB(11,2048,0.9%)) | IRREGULAR(16,0.1%) In the above example the N flag is applied to the TCP Window field. The field is compressed by transmitting only the last few LSBs, which are always interpreted at the decompressor as a decrease in the field Price, et al. Expires December 23, 2002 [Page 27] Internet-Draft A Formal Notation for Header Compression June 2002 value. However, because the context is not updated the field reverts back to its original value following the decrease. 5.9.2 U flag ABNF notation: "U(" ")" The U flag indicates the the bits generated by the method or BNF rule specified as its parameter should be treated as 'uncompressed' data rather than 'compressed' data. This is useful in cases where the methods under this encoding rule are OPTIONAL. Since we assume that the encoding is based on the selection of a well-formed sentence in the BNF, this may imply that optional encodings contribute data, even if the data that they are designed to compress is not present. Depending upon the restrictions of the encoding rules, this allows the data to be treated as uncompressed, which adds flexibility (for example, it may avoid the need for padding). However, exactly what this flag means (if anything) is a function of the encoding rules. 5.10 FORMAT ABNF notation: "FORMAT(" "," *() ")" The FORMAT method is used to create more than one set of compressed header formats. Each set of compressed header formats has its own set of bit patterns at the start of the compressed header. Thus a profile can have several sets of compressed header formats, but only one set can be in use at a given time. The FORMAT method is followed by a list of k BNF rules. Each BNF rule is given its own set of compressed header formats in the CO packets. Note however that all BNF rules are present in the IR(-DYN) packets, so an IR(-DYN) packet may be sent to change to a new set of compressed header formats. For example: FORMAT(SEQUENTIAL-IP-ID,RANDOM-IP-ID) ; IP ID Price, et al. Expires December 23, 2002 [Page 28] Internet-Draft A Formal Notation for Header Compression June 2002 Two sets of compressed header formats are provided: one for an IP ID that increases sequentially, and one for a randomly behaving IP ID. 5.11 CRC ABNF notation: "CRC(" "," ")" The CRC method generates a CRC checksum calculated across the entire uncompressed header. At the decompressor this CRC is used to validate that correct decompression has occurred. Note that it is possible for different header formats to have different amounts of CRC protection, so extra CRC bits can be allocated to protect important context-updating information. This is illustrated in the example below: CRC(3,99%) | ; Checksum Coverage CRC(7,1%) In general a profile can use any CRC length for which a CRC polynomial has been explicitly defined. The following CRC lengths are currently supported: 3-bit: C(x) = 1 + x + x^3 6-bit: C(x) = 1 + x + x^3 + x^4 + x^6 7-bit: C(x) = 1 + x + x^2 + x^3 + x^6 + x^7 8-bit: C(x) = 1 + x + x^2 + x^8 10-bit: C(x) = 1 + x + x^4 + x^5 + x^9 + x^10 12-bit: C(x) = 1 + x + x^2 + x^3 + x^11 + x^12 16-bit: C(x) = 1 + x^2 + x^15 + x^16 6. Creating a new ROHC profile This chapter describes how to generate new ROHC [1] profiles with this notation. It is important that the profiles are specified in an unambiguous manner so that any compressor and decompressor using the profiles will be able to interoperate. There may be a number of other parameters that are required to parameterise the generation of packet formats. The form and value of the parameters that are used must be specified as part of the encoding rules. The selection and correct parameterisation of the encoding rules must Price, et al. Expires December 23, 2002 [Page 29] Internet-Draft A Formal Notation for Header Compression June 2002 be performed in order to turn the notated description into a useful form. This specifies the 'bits on the wire' such that arbitrary compressors and decompressors can interoperate. Some parameters that are expected to be needed in order to fully specify a profile are briefly described below. 6.1 Profile identifier The profile_identifier is a 16-bit integer that is used when negotiating a common set of profiles between the compressor and decompressor. Official profile identifiers are assigned by IANA to ensure that two distinct profiles do not receive the same profile identifier. Note that the 8 MSBs of the profile identifier are used to specify the version of the profile (so that old profiles can be obsoleted by new profiles). 6.2 Maximum number of header formats If one considers the number of different well-formed 'sentences' that can be generated from a typical BNF description, it can be seen that this value will quickly become unmanageably large. The max_formats parameter controls the number of compressed header formats to be stored at the compressor and decompressor. If more compressed header formats are generated than can be stored then all but max_formats formats are discarded. How the limitation is applied is dependent upon the encoding rules. However, note that if more than max_formats header formats are generated for the IR packet then this means that there are headers which it may be impossible to compress using this profile (resulting in a switch to a different profile). In a similar manner the max_sets parameter controls the total number of sets of compressed header formats to be stored. Recall that a profile can have several sets of compressed header formats, but only one set may be in use at a given time. It is important to note that the maximum size specified by max_formats applies to each individual set of header formats, so the total overall number of formats that need to be stored is equal to max_formats * (max_sets + 2) including the 2 sets of formats for the IR and IR-DYN packets. 6.3 Control of header alignment The alignment of the compressed headers is controlled using the bit_alignment parameter. The packet formats generated by the encoding rules MUST be guaranteed to be an integer multiple of Price, et al. Expires December 23, 2002 [Page 30] Internet-Draft A Formal Notation for Header Compression June 2002 bit_alignment bits long. Additionally, the parameter npatterns can be used to reserve bit patterns in the compressed header. The parameter specifies the number of bit patterns in the first word (i.e. the first bit_alignment bits) of the compressed header that are available for use by the compressor. Consequently npatterns takes a value between 1 and 2^bit_alignment. For compatibility with ROHC [1], it is important for the compressor not to use the bit patterns 111XXXXX in the first octet of each compressed header because they are reserved by the ROHC framework. So to produce a set of header formats compatible with ROHC [1] the bit_alignment parameter MUST be set to 8 and npatterns MUST be set to 224. 7. Security considerations The notation describes compressed header formats for direct use in ROHC profiles. Insofar as an abstract notation has security considerations, this can be assumed to inherit those of ROHC [1]. The notation can also describe how to compress and decompress headers. As such they are interpreted or compiled by the compressor and decompressor. An error in the description may cause undefined behaviour. In a situation where these descriptions could be dynamically updated consideration MUST be given to authenticating and verifying the integrity of the profile. 8. Acknowledgements A number of important concepts and ideas have been borrowed from ROHC [1]. Updates to the LIST functions owe much to discussions with Qian and Hongbin. Thanks to Paul Ollis for field labelling; and to Rob Hancock and Stephen McCann for putting up with the author's arguments and making helpful suggestions, frequently against the tide! The authors would also like to thank Carsten Bormann, Christian Schmidt, Max Riegel, Lars-Erik Jonsson, Qian Zhang and Hongbin Lao for their comments and encouragement. We haven't always agreed, but the arguments have been fun! References [1] Bormann, C., Burmeister, C., Degermark, M., Fukushima, H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le, K., Liu, Price, et al. Expires December 23, 2002 [Page 31] Internet-Draft A Formal Notation for Header Compression June 2002 Z., Martensson, A., Miyazaki, A., Svanbro, K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001. [2] Jacobson, V., "Compressing TCP/IP headers for low-speed serial links", RFC 1144, February 1990. [3] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996. [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [5] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. Authors' Addresses Richard Price Siemens/Roke Manor Roke Manor Research Ltd. Romsey, Hants SO51 0ZN UK Phone: +44 (0)1794 833681 EMail: richard.price@roke.co.uk URI: http://www.roke.co.uk Abigail Surtees Siemens/Roke Manor Roke Manor Research Ltd. Romsey, Hants SO51 0ZN UK Phone: +44 (0)1794 833131 EMail: abigail.surtees@roke.co.uk URI: http://www.roke.co.uk Price, et al. Expires December 23, 2002 [Page 32] Internet-Draft A Formal Notation for Header Compression June 2002 Mark A. West Siemens/Roke Manor Roke Manor Research Ltd. Romsey, Hants SO51 0ZN UK Phone: +44 (0)1794 833311 EMail: mark.a.west@roke.co.uk URI: http://www.roke.co.uk Appendix A. The Input Language The following is an ABNF description of a ROHC [1] profile: = [ ] [ ] = 1*(%x09 | %x0A | %x0D | %x20) ; white space used as ; delimiters = "profile_identifier" = "max_formats" = "max_sets" = "bit_alignment" = "npatterns" = "CO packet" Price, et al. Expires December 23, 2002 [Page 33] Internet-Draft A Formal Notation for Header Compression June 2002 = "IR-DYN packet" = "IR packet" = 1*() = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" = "0x" *() = | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F" The following is an ABNF description of a new encoding method written using the input language (note that the previous ABNF rules still apply). Comments are contained between a ";" symbol and the end of the line, and are ignored in the input language. = "=" 1*( ) = *( "|" ) = ["(" *("," ) ")"] = *( | | "_" | "-" | "/" | ".") = "A" | "B" | "C" | ... | "X" | "Y" | "Z" | "a" | ... | "z" Price, et al. Expires December 23, 2002 [Page 34] Internet-Draft A Formal Notation for Header Compression June 2002 = | | | | = [] [] ["." []] "%" = | | = "0b" *() = "0" | "1" = = ["-"] Appendix B. Definition of Encoding Methods This appendix gives a pseudocode description of selected encoding methods. Additionally, the following general functions are used in the pseudocode description: Data handling: Value based functions: Price, et al. Expires December 23, 2002 [Page 35] Internet-Draft A Formal Notation for Header Compression June 2002 length (var) Returns the length in bits of var value (var) Returns the value of var str (len, val) Returns a variable with length len and containing the value val lsb (var, k) Returns the least significant k bits of a (width defined or not) value (var) in a width defined value msb (var, k) Returns the most significant k bits of a width defined value (var) in a width defined value concat (var_1, var_2) Returns a width defined value with most significant bits being var_1 and least being var_2 - ie concatenates the two strings Addition, subtraction and multiplication can be done on width defined values as long as the two values have the same length (i.e. this translates into modular arithmetic) but must be done carefully. Stack functions: For the purposes of describing how the encoding methods work, the header to be compressed is treated as a 'bit stack'. In the functions below an item has a length and value associated with it which can be found using the functions above. Price, et al. Expires December 23, 2002 [Page 36] Internet-Draft A Formal Notation for Header Compression June 2002 push (stack_name, n, var) This function adds n bits with value var to the top of the stack (stack_name). push (stack_name, item) This function adds n bits with value var to the top of the stack (stack_name) where n is the length of item and it has value var. pop (stack_name, n, item) Pop n bits of data off a stack (stack_name) and put into item top (stack_name, n) Returns the item made up of the top n bits on a stack (stack_name) stack-size (stack) Returns the size in bits of a stack Context based functions: NB -- leaving a value in the context empty is NOT the same as storing something which zero-bits wide. context (i, var) Put the value stored at the ith position in the context into var. If there is no value in the ith position then return empty Miscellaneous: Price, et al. Expires December 23, 2002 [Page 37] Internet-Draft A Formal Notation for Header Compression June 2002 lookup-crc-function (n) Finds the function for computing an n-bit crc over a specified amount of data and putting the value into a given width defined variable byte-swap (item) Returns another item the same as item but stored in the opposite byte ordering compute-16-checksum (data, length) Compute a 16-bit one's complement sum over data of length encode (item) The result of applying an encoding method is passed to the encode function restore (item) The result of decompressing a field is returned to build up the uncompressed header label_value (n) Returns the value of the label 'n' Other constructs: foreach item in [reverse] list : end loop provides for iteration over all the elements of a list [in reverse order] call function (...) indicates a reference to a function is being invoked choose (...) the choice of whatever is specified is implementation specific - it may affect efficiency but whatever choice is made will not affect interoperability, for example, choose (j < k) - pick an arbitrary value j which is less than k. Some global variables: Price, et al. Expires December 23, 2002 [Page 38] Internet-Draft A Formal Notation for Header Compression June 2002 uncompressed_data Bit based stack Compression - initially header and payload, finally empty Decompression - initially empty, finally original header and payload compressor_state The current state at the compressor (can be set to "IR", "IR-DYN" or "CO") current_set The set of compressed header formats currently in use crc The decompressed crc for checking against one calculated over full header (a width defined value) MSN The Master Sequence Number - a width defined value (its value increases by one for each packet received at the compressor or it is set to be a specific value using the SET-MSN method if the header contains a suitable field) B.1 Library of methods This section gives pseudocode for each of the methods in the library. Note that for each method two pieces of pseudocode are given: one describes the functionality for compressing with the encoding method and the other the functionality for decompressing. Note that all of the global variables required for these procedures are defined at the beginning of Appendix A. It is assumed that as soon as the 'return' command is encountered, the procedure stops. For COMPRESS functions which check through the r values stored in the context, the value of r is implementation-specific. Note that if any of the r context values is empty, any method attempting to compress relative to the context will automatically fail. B.1.1 STATIC Price, et al. Expires December 23, 2002 [Page 39] Internet-Draft A Formal Notation for Header Compression June 2002 STATIC (P%) COMPRESS (STATIC) var context_val, item var n, i context (1, context_val) n = length (context_val) # check that the value to be compressed matches each of the r # values stored in context for this encoding - if not then # STATIC can't be used to compress this encoding for i = 1 to r context (i, context_val) if (context_val <> top (uncompressed_data, n)) then return false endif end loop pop (uncompressed_data, n, item) return true end COMPRESS DECOMPRESS (STATIC) var context_val context (1, context_val) restore (uncompressed_data, context_val) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 40] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.2 IRREGULAR IRREGULAR(n,P%) COMPRESS (IRREGULAR) var item pop (uncompressed_data, n, item) encode (item) return true end COMPRESS DECOMPRESS (IRREGULAR) var item pop (compression_stack, n, item) restore (uncompressed_data, item) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 41] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.3 IRREGULAR-PADDED IRREGULAR-PADDED(n,k,P%) COMPRESS (IRREGULAR-PADDED) var item, temp if (value (top (uncompressed_data, n - k)) <> 0) then return false else pop (uncompressed_data, n, item) temp = lsb (item, k) encode (temp) return true endif end COMPRESS DECOMPRESS (IRREGULAR-PADDED) var item, temp var full full = 0 pop (compression_stack, k, temp) full = full + value (temp) item = str (n, full) restore (item) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 42] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.4 VALUE VALUE(n,v,P%) COMPRESS (VALUE) var value if (top (uncompressed_data, n) <> v) then return false else pop (uncompressed_data, n, value) return true endif end COMPRESS DECOMPRESS (VALUE) var item item = str (n, v) restore (item) end DECOMPRESS B.1.5 LSB LSB(k,p,P%) COMPRESS (LSB) var context_val, item, temp, new_item, lsb_val, p_item var n, i context (1, context_val) n = length (context_val) p_item = str (n, p) new_item = top (uncompressed_data, n) for i = 1 to r context (i, context_val) # check new_item in interval # [value (context_val)-p, value (context_val)-p+2^k] temp = (new_item - context_val + p_item) if (value (temp) >= 2^k) then return false Price, et al. Expires December 23, 2002 [Page 43] Internet-Draft A Formal Notation for Header Compression June 2002 endif end loop pop (uncompressed_data, n, item) lsb_val = lsb (item, k) encode (lsb_val) return true end COMPRESS DECOMPRESS (LSB) var context_val, interval_start, interval_end, recd, p_item var new_item, twok_item var n, new, start, end context (1, context_val) n = length (context_val) p_item = str (n, p) twok_item = str (n, 2^k) pop (compression_stack, k, recd) interval_start = context_val - p interval_end = interval_start + twok_item new_item = concat (msb (interval_start, (n-k)), recd) # check whether value (new_item) is in interval # [value (interval_start), value (interval_end)] # allowing for the interval to wrap over zero. If not then # recalculate new_item start = value (interval_start) end = value (interval_end) new = value (new_item) if (((start < end) and ((new < start) or (new > end))) or ((start > end) and ((new < start) or (new > end)))) then new_item = concat (msb (interval_end, (n-k)), recd) endif restore (new_item) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 44] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.6 UNCOMPRESSED UNCOMPRESSED(n,d,m,p) COMPRESS (UNCOMPRESSED) var scale_len var unc, len_item scale_len = floor((value(label_value (n)) / d) * m + p) if (stack-size (uncompressed_data) < scale_len) then return false else pop (uncompressed_data, scale_len, unc) encode (unc) return true endif end COMPRESS DECOMPRESS (UNCOMPRESSED) var scale_len var unc, len_item len_item = label_value (n) scale_len = floor( (value (len_item) / d) * m + p) restore (unc) # scale_len bits retrieved by encoding rule end DECOMPRESS B.1.7 INFERRED-TRANSLATE INFERRED-TRANSLATE(n,m,a(0),b(0),a(1),b(1)...,a(k-1),b(k-1)) COMPRESS (INFERRED-TRANSLATE) var item, trans_item var found, i found = 0 i = 0 while ((i < k) and (found = 0)) if (value (top (uncompressed_data, m)) = b(i)) pop (uncompressed_data, m, item) trans_item = str (n, a(i)) push (uncompressed_data, trans_item) Price, et al. Expires December 23, 2002 [Page 45] Internet-Draft A Formal Notation for Header Compression June 2002 found = 1 else i = i + 1 endif end while return found end COMPRESS DECOMPRESS (INFERRED-TRANSLATE) var trans_item var i, found found = 0 i = 0 pop (uncompressed_data, trans_item) while ((i < k) and (found = 0)) if (value (trans_item) = a(i)) push (uncompressed_data, m, b(i)) found = 1 else i = i + 1 endif end while end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 46] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.8 INFERRED-SIZE INFERRED-SIZE(n,p) COMPRESS (INFERRED-SIZE) var item var bits_in_byte # usually 8! # Check that the decompressor will be able to reconstruct this # value at this point in decompression if ((bits_in_byte * value (top (uncompressed_data, n))) + p) <> stack-size (uncompressed_data) then return false else pop (uncompressed_data, n, item) return true endif end COMPRESS DECOMPRESS (INFERRED-SIZE) var size var bits_in_byte # usually 8! # Reconstruct n-bit size restore (size) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 47] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.9 INFERRED-OFFSET INFERRED-OFFSET(n, b) COMPRESS (INFERRED-OFFSET) var item, base, offset pop (uncompressed_data, n, item) base = label_value (b) extend (base, n) offset = item - base push (uncompressed_data, offset) return true end COMPRESS DECOMPRESS (INFERRED-OFFSET) var item, base, offset base = label_value (b) pop (uncompressed_data, n, offset) item = offset + base push (uncompressed_data, item) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 48] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.10 INFERRED-OFFSET-VAL INFERRED-OFFSET-VAL(n, b) COMPRESS (INFERRED-OFFSET-VAL) var item, base, offset pop (uncompressed_data, n, item) base = label_value (b) extend (base, n) offset = item - base push (uncompressed_data, offset) label_value (b) = item return true end COMPRESS DECOMPRESS (INFERRED-OFFSET-VAL) var item, base, offset item = label_value (b) pop (uncompressed_data, n, offset) base = item - offset push (uncompressed_data, item) label_value (b) = base end DECOMPRESS B.1.11 INFERRED-IP-CHECKSUM INFERRED-IP-CHECKSUM(new_method) COMPRESS (INFERRED-IP-CHECKSUM) var compress_function var can_compress var item, ip_sum # zero the ip checksum bits in the header pop (uncompressed_data, 80, item) pop (uncompressed_data, 16, ip_sum) push (uncompressed_data, 16, 0) push (uncompressed_data, item) return call new_method Price, et al. Expires December 23, 2002 [Page 49] Internet-Draft A Formal Notation for Header Compression June 2002 end COMPRESS DECOMPRESS (INFERRED-IP-CHECKSUM) var decompress_function var item, temp var init_len, final_len, decomp_len, ip_sum init_len = stack-size (uncompressed_data) call new_method final_len = stack-size (uncompressed_data) decomp_len = final_len - init_len ip_sum = compute-16-checksum (uncompressed_data, decomp_len) pop (uncompressed_data, 80, item) pop (uncompressed_data, 16, temp) push (uncompressed_data, 16, ip_sum) push (uncompressed_data, item) end DECOMPRESS B.1.12 NBO NBO(n) COMPRESS (NBO) var original, swapped var nbo_flag pop (uncompressed_data, n, original) choose (nbo_flag of zero or one) # # according to whether original is in network byte order or not # (1 implies it is, 0 implies it isn't) # this decision can be made based on the context history # push (uncompressed_data, 1, nbo_flag) if (nbo_flag = 0) then swapped = byte-swap (original) push (uncompressed_data, swapped) else push (uncompressed_data, original) Price, et al. Expires December 23, 2002 [Page 50] Internet-Draft A Formal Notation for Header Compression June 2002 endif return true end COMPRESS DECOMPRESS (NBO) var original, decomped, nbo_flag pop (uncompressed_data, n, decomped) pop (uncompressed_data, 1, nbo_flag) if (value (nbo_flag) = 1 then original = decomped else original = byte-swap (decomped) endif push (uncompressed_data, original) end DECOMPRESS B.1.13 SCALE Price, et al. Expires December 23, 2002 [Page 51] Internet-Draft A Formal Notation for Header Compression June 2002 SCALE(n) COMPRESS (SCALE) var original var scale_f, scaled_val, remainder pop (uncompressed_data, n, original) choose (scale_f) # such that original is changing by a multiple # of scale_f each header. This decision can # be based on context history scaled_val = value (original) / scale_f remainder = value (original) mod scale_f push (uncompressed_data, n, scaled_val) push (uncompressed_data, n, remainder) push (uncompressed_data, n, scale_f) return true end COMPRESS DECOMPRESS (SCALE) var scaled_val, scale_f, remainder var original pop (uncompressed_data, n, scale_f) pop (uncompressed_data, n, remainder) pop (uncompressed_data, n, scaled_val) original = ((scaled_val * scale_f) + remainder) mod 2^n push (uncompressed_data, n, original) end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 52] Internet-Draft A Formal Notation for Header Compression June 2002 B.1.14 CONTEXT CONTEXT(new_method,k) COMPRESS (CONTEXT) var n, j, m var compress_function var can_compress n = ceiling (log2(k-1)) choose (j < k) # compressor choice can_compress = call new_method push (uncompressed_data, n, j) return can_compress end COMPRESS DECOMPRESS (CONTEXT) var n, j, m var decompress_function var index n = ceiling (log2(k-1)) pop (uncompressed_data, n, index) j = value (index) call new_method end DECOMPRESS Price, et al. Expires December 23, 2002 [Page 53] Internet-Draft A Formal Notation for Header Compression June 2002 Full Copyright Statement Copyright (C) The Internet Society (2002). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society. Price, et al. Expires December 23, 2002 [Page 54]