Robust Header Compression                                      R. Price
Internet-Draft                                               R. Finking
Expires: January 16, 2005                            Siemens/Roke Manor
                                                           G. Pelletier
                                                            Ericsson AB
                                                          July 18, 2004


       Formal Notation for Robust Header Compression (ROHC-FN)
                draft-ietf-rohc-formal-notation-03.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 16, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   This document defines ROHC-FN: a formal notation for specifying how
   to compress and decompress fields from an arbitrary protocol stack.
   ROHC-FN is intended to simplify the creation of new compression
   profiles to fit within the ROHC (RFC 3095 [4]) framework.

Price, et al.           Expires January 16, 2005                [Page 1]

Internet-Draft                  ROHC-FN                        July 2004

Table of Contents

   1. Introduction
   2. Terminology
   3. Overview of ROHC-FN
      3.1 Scope of ROHC-FN
      3.2 Fundamentals of ROHC-FN
      3.3 Example using IPv4
   4. Normative Definition of ROHC-FN
      4.1 Syntax
      4.2 Comments
          4.2.1 End of line comments
          4.2.2 Block comments
      4.3 Field Attributes
          4.3.1 uncomp_value
          4.3.2 comp_value
          4.3.3 context_value
          4.3.4 updated_context_value
   5. Encoding Methods
      5.1 Basic Encoding Methods
          5.1.1 value
          5.1.2 irregular
          5.1.3 static
          5.1.4 lsb
          5.1.5 index
          5.1.6 select
      5.2 Relative Field Encoding Methods
          5.2.1 same_as
          5.2.2 group
          5.2.3 expression
          5.2.4 derived_value
          5.2.5 alt
          5.2.6 nbo
          5.2.7 inferred_offset
          5.2.8 inferred_ip_v4_header_checksum
          5.2.9 uncompressible
      5.3 Control Field Encoding Methods
          5.3.1 choice
          5.3.2 discriminator
          5.3.3 reserved
          5.3.4 control_field
          5.3.5 crc
      5.4 Compound Encoding Methods
          5.4.1 single_format
          5.4.2 multiple_formats
          5.4.3 encode_list
          5.4.4 list_index
          5.4.5 generic_comp_list
      5.5 User-defined Encoding Methods
          5.5.1 Syntax
          5.5.2 Parameters
          5.5.3 Subfields
      5.6 Chain Items
   6. Security considerations
   7. Acknowledgements
   8. References
      Authors' Addresses
   A. Bit-level Worked Example
      A.1 Example Packet Format
      A.2 Initial Encoding
      A.3 Basic Compression
      A.4 Inter-packet compression
      A.5 Variable Length Discriminators
      A.6 Default encoding
      Intellectual Property and Copyright Statements

1. Introduction

   ROHC-FN is a formal notation designed to help with the definition of
   ROHC (RFC 3095 [4]) header compression profiles.
   ROHC-FN offers a library of encoding methods that are often used in
   ROHC profiles, so new profiles can be specified without the need to
   redefine this library from scratch.

   Informally, an encoding method is a function that maps between
   uncompressed data and compressed data. The simplest encoding methods
   only have one input and one output: the input is an uncompressed
   field and the output is the compressed version of the field. More
   complex encoding methods can compress multiple fields at the same
   time, e.g. "list" encoding from RFC 3095 [4], which is designed to
   compress an ordered list of fields.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

   o  Control field

      Control fields are transmitted from a ROHC compressor to a ROHC
      decompressor, but are not part of the uncompressed header itself.

   o  Encoding method

      Encoding methods are functions that can be applied to compress
      fields in a protocol header.

   o  Field

      ROHC-FN divides the protocol header to be compressed into a set of
      contiguous bit patterns known as fields.

   o  Library of encoding methods

      The library of encoding methods contains a number of commonly used
      encoding methods for compressing header fields.

   o  Profile

      A ROHC (RFC 3095 [4]) profile is a description of how to compress
      a certain protocol stack over a certain type of link. Each
      profile includes packet formats to compress the headers and a
      state machine to control the actions of each endpoint.

3. Overview of ROHC-FN

   This section gives an overview of ROHC-FN and explains how it can be
   used to specify how to compress header fields as part of a ROHC
   profile.

3.1 Scope of ROHC-FN

   This section describes the scope of the ROHC-FN. It explains how the
   formal notation relates to the ROHC framework and to specific ROHC
   profiles.

   The ROHC framework is common to all profiles: it defines the general
   principles for performing ROHC compression. It defines the concept
   of a profile, which makes ROHC a general platform for different
   compression schemes. It sets link layer requirements, and in
   particular negotiation requirements for all ROHC profiles. It
   defines a set of common functions such as Context Identifiers (CIDs),
   padding and segmentation. It also defines common packet formats (IR,
   IR-DYN, Feedback, Short-CID expander, etc.), and finally it defines a
   generic, profile independent, handling of feedback.

   A ROHC profile is a description of how to compress a certain protocol
   stack over a certain type of link. For example, ROHC profiles are
   available for RTP/UDP/IP and many other protocol stacks.

   Each ROHC profile can be further subdivided into the following two
   components:

   1. Packet formats, for compressing and decompressing headers; and

   2. State machine, for maintaining synchronization of the context

   The purpose of the packet formats is to define how to compress and
   decompress headers. The packet formats must define the compressed
   version of each uncompressed header (and vice versa).

   The packet formats will typically compress headers relative to a
   context of field values from previous headers in a flow. This
   improves the overall compression ratio, because this takes into
   account redundancies between successive headers.

   The purpose of the state machine is to ensure that the profile is
   robust against bit errors and dropped packets. The state machine
   manages the context, providing feedback and other mechanisms to
   ensure that the compressor and decompressor contexts are kept
   synchronized.

   The ROHC-FN is designed to help in the specification of the packet
   formats for use in ROHC profiles.
   It offers a library of encoding methods for compressing fields, and a
   mechanism for combining these encoding methods to create packet
   formats tailored to a specific protocol stack. The state machine for
   the profiles is beyond the scope of ROHC-FN, and it must be provided
   separately as part of a complete profile specification.

3.2 Fundamentals of ROHC-FN

   One important fundamental in ROHC-FN is the creation of bindings
   between a field and the encoding method. When writing the following
   statement,

      header_field ::= encoding_method

   the symbol "::=" means "is encoded as". It does not represent an
   assignment operation from the right-hand side to the left-hand side.
   Instead, it is a two-way operation in that it represents both the
   compression and the decompression operation in a single statement,
   where variables take on values through the process of two-way
   matching. Two-way matching is a binary operation that attempts to
   make the operands the same (similar to the unification process in
   logic). The operands represent one unspecified data object, and
   values can be matched from either operand.

   More specifically, this statement creates a reversible binding
   between the attributes of a field and the encoding method (including
   the parameters specified with the method). At the compressor, a
   packet format can be used if a set of bindings that is successful for
   all fields can be found. At the decompressor, the operation is
   reversed using the same bindings and the fields are filled according
   to the specified bindings.

   For example, the "static" encoding method creates a binding between
   the attribute corresponding to the uncompressed value of the field
   and the attribute corresponding to the value of the field in the
   context.

   o  For the compressor, this binding is successful when both values
      are the same for a packet format that sends no bits for that
      field. Otherwise, a packet format using another encoding method
      that is successful when the parameters are not equal is used (such
      as a method that would send the field uncompressed).

   o  For the decompressor, the same binding succeeds for a packet type
      that sends no bits for that field if a valid context entry
      containing the value of the uncompressed field exists. Otherwise,
      the binding will fail decompression for that packet type.

3.3 Example using IPv4

   Rather than immediately diving in with a formal definition of
   ROHC-FN, this section gives an overview of how the notation is used
   by means of an example. The example will develop the formal notation
   for an encoding method capable of compressing a single, well-known
   header: the IPv4 header.

   The first step is to specify the overall encoding method for the IPv4
   header. In this case we will use the "single_format" encoding method
   (defined in Section 5.4.1). This encoding method compresses a header
   by dividing it into fields, compressing each field in turn, and then
   creating a single packet containing the compressed version of each
   field. We define this by writing the following in ROHC-FN:

      ipv4_header ::= single_format,
      {

   The above expression defines that the IPv4 header is encoded by
   sending a single packet format (containing the compressed version of
   each field in the IPv4 header). The opening curly brace indicates
   that subsequent definitions are local to "ipv4_header". This scoping
   mechanism helps to clarify which fields belong to which headers: it
   is also useful when compressing complex protocol stacks with several
   headers and fields, often sharing the same names.

   The next step is to specify the fields contained in the uncompressed
   IPv4 header, which is accomplished using ROHC-FN as follows:

        uncompressed_data ::= version,          %  4 bits
                              header_length,    %  4 bits
                              tos,              %  6 bits
                              ecn,              %  2 bits
                              length,           % 16 bits
                              id,               % 16 bits
                              reserved,         %  1 bit
                              dont_frag,        %  1 bit
                              more_fragments,   %  1 bit
                              offset,           % 13 bits
                              ttl,              %  8 bits
                              protocol,         %  8 bits
                              checksum,         % 16 bits
                              src_addr,         % 32 bits
                              dest_addr,        % 32 bits

   After this, we specify the fields contained in the compressed header.
   Exactly what appears in this list of fields depends on the encoding
   methods used to encode the uncompressed fields - it may be possible
   to compress certain fields down to 0 bits, in which case they do not
   need to be sent in the compressed header at all.

        compressed_data   ::= src_addr,         % 32 bits
                              dest_addr,        % 32 bits
                              length,           % 16 bits
                              id,               % 16 bits
                              ttl,              %  8 bits
                              protocol,         %  8 bits
                              tos,              %  6 bits
                              ecn,              %  2 bits
                              dont_frag,        %  1 bit

   Note that the order of the fields in the compressed header is
   independent of the order of the fields in the uncompressed header.

   The next step is to specify the encoding methods for each field in
   the IPv4 header. These are taken from encoding methods in the
   ROHC-FN library. Since the intention here is to illustrate the use
   of the notation, rather than to describe the optimum method of
   compressing IPv4 headers, this example uses only three encoding
   methods.

   The "value" encoding method (defined in Section 5.1.1) can compress
   any field whose length and value are fixed. No compressed bits need
   to be sent because the field can be reconstructed using its known
   size and value.
   The "value" encoding method is used to compress five fields in the
   IPv4 header, as described below:

        version         ::= value (4, 4),
        header_length   ::= value (4, 5),
        reserved        ::= value (1, 0),
        more_fragments  ::= value (1, 0),
        offset          ::= value (13, 0),

   The first parameter indicates the length of the uncompressed field in
   bits, and the second parameter gives its integer value.

   The "irregular" encoding method (defined in Section 5.1.2) can be
   used to encode any field whose length is fixed. It is a general
   encoding method that can be used for fields to which no other
   encoding method applies. All of the bits in the uncompressed field
   need to be sent; hence this encoding does not achieve any
   compression.

        tos             ::= irregular (6),
        ecn             ::= irregular (2),
        length          ::= irregular (16),
        id              ::= irregular (16),
        dont_frag       ::= irregular (1),
        ttl             ::= irregular (8),
        protocol        ::= irregular (8),
        src_addr        ::= irregular (32),
        dest_addr       ::= irregular (32),

   Finally, the third encoding method is specific only to IPv4 headers:
   "inferred_ipv4_header_checksum" (defined in Section 5.2.8) is a
   specific encoding method for calculating the IP checksum from the
   rest of the header values. Like the "value" encoding method, no
   compressed bits need to be sent, since the field value can be
   reconstructed at the decompressor.

        checksum        ::= inferred_ipv4_header_checksum
      }

   At this point, the above example has defined the format of the
   compressed IPv4 header, and provided enough information to allow an
   implementation to construct the compressed header from an
   uncompressed header and vice versa.

4. Normative Definition of ROHC-FN

   This section gives the normative definition of ROHC-FN, including its
   syntax and any data structures that it requires.

4.1 Syntax

   To compress a field or a header using ROHC-FN, the following must be
   provided:

   1. A name for the field or header to be compressed; and

   2. An encoding method, together with any parameters and subfields
      that it needs.

   For example:

      field_name ::= encoding_method (param1, param2, ...),
      {
        sub_field_1 ::= foo1,
        sub_field_2 ::= foo2 (foo2_param),
        etc.
      }

   This describes how to map the field "field_name" from an uncompressed
   value to a compressed value, by encoding it using "encoding_method"
   with the specified parameters and subfields.

   The use of the braces "{" and "}" provides a scoping mechanism, in
   that "sub_field_1" and "sub_field_2" are actually contained within
   "field_name". It is also legal syntax to qualify the names of the
   fields explicitly, using a dot. Thus, without using the brace
   scoping mechanism, the previous example would look like this:

      field_name ::= encoding_method (param1, param2, ...),
      field_name.sub_field_1 ::= foo1,
      field_name.sub_field_2 ::= foo2 (foo2_param),
      etc.

   This construct can be nested so that complex relationships can be
   notated.

4.2 Comments

   Comments do not affect the formal meaning of what is notated, but can
   be used to improve readability. Their use is optional.

   Free English text can be inserted into a profile definition to
   explain why something has been done a particular way, to clarify the
   intended meaning of the notation, or to elaborate on some point. To
   this end, the two commenting styles described in the subsections
   below can be used.

   It should be noted that profiles will be read, by many readers, in
   terms of their intuitive English meaning. Such readers will not
   necessarily differentiate between the formal and commentary parts of
   a profile. It is essential therefore that any comments written are
   correct. Comments inserted should not be considered of lesser
   importance than the rest of the notation in a profile, and should be
   strictly consistent with it.

4.2.1 End of line comments

   The end of line comment style makes use of the "%" comment character.
   Any text between the "%" character and the end of the line has no
   formal meaning. For example:

   %-----------------------------------------------------------------
   % IR-REPLICATE packet formats
   %-----------------------------------------------------------------

   % The following fields are included in all of the IR-REPLICATE
   % packet formats:
   %
   replicate_common ::= discriminator,        %  8 bits
                        tcp.seq_number,       % 32 bits
                        tcp.flags.ecn,        %  2 bits

                               Figure 10

4.2.2 Block comments

   The block comment style makes use of the "/*" and "*/" delimiters.
   Any text between the "/*" and the "*/" has no formal meaning. For
   example:

   /******************************************************************
    * IR-REPLICATE packet formats
    *****************************************************************/

   /* The following fields are included in all of the IR-REPLICATE
    * packet formats:
    */
   replicate_common ::= discriminator,        /*  8 bits */
                        tcp.seq_number,       /* 32 bits */
                        tcp.flags.ecn,        /*  2 bits */

   The block comment style allows comments to be nested, unlike C, C++
   or Java.

4.3 Field Attributes

   Within ROHC-FN, the following categories of attribute are available
   for each field to be compressed:

   o  uncomp - uncompressed attributes of the field

   o  comp - compressed attributes of the field

   o  context - attributes of the field's context

   o  updated_context - attributes of the field's updated context

   For each category above, the field has an attribute called "value",
   and an attribute called "length". The length attribute indicates the
   length in bits of the associated value attribute. The "uncomp" and
   "comp" categories have an additional attribute, "hdr_start".

   The attribute "value" cannot be left undefined.
   If the attribute "value" has no useful value (such as for fields that
   do not appear in the relevant header), it is set to 'no value' and
   the corresponding length attribute is then zero.

   The attribute "hdr_start" contains the position in the relevant
   header (compressed or uncompressed) that the field starts at,
   specified in bits.

   The encoding methods are formally defined using these attributes.
   This set of attributes entirely characterizes the relationship
   between the uncompressed and compressed representation of a field.

   An identifier is used to refer to attributes. The identifier is a
   concatenation of the attribute category (e.g. "comp"), an underscore
   ("_"), and the attribute name (e.g. "length"). For example, the
   "comp_length" attribute indicates the length in bits of the
   compressed value of the field.

   The formal notation for referring to any of the attributes of a
   particular field is the attribute's identifier, followed by the field
   name in parentheses. For example:

      uncomp_value (tcp_ip.options.list_length)

   gives the uncompressed value of the field named between the
   parentheses.

   Each of the value attributes is explained in more detail below.

4.3.1 uncomp_value

   The attribute "uncomp_value" contains the uncompressed value of the
   field. This can either be the value of a field from the uncompressed
   header, or the uncompressed value of a control field.

   All fields have an uncomp_value attribute.

4.3.2 comp_value

   The attribute "comp_value" contains the compressed value of the
   field, i.e. the value of the field as it appears in the compressed
   header. This attribute is set to 'no value' for any field that does
   not appear in the compressed header.

4.3.3 context_value

   The attribute "context_value" contains information about the previous
   value of the field. Unless otherwise specified, the value of a
   field's context_value attribute will be set to 'no value' for the
   first packet in the stream.
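
   As a rough illustration of the attribute model described in Section
   4.3 (four categories, each with a "value" and a "length", 'no value'
   paired with a zero length, and an empty context for the first
   packet), the following Python sketch may help. It is not part of
   ROHC-FN: the names Attr, Field and attribute, the use of None to
   stand for 'no value', and the field name "rtp.sn" are all assumptions
   made for illustration, and the "hdr_start" attribute is left out for
   brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

NO_VALUE = None  # assumption: None stands in for the draft's 'no value'


@dataclass
class Attr:
    """One attribute category: a value plus its length in bits."""
    value: Optional[int] = NO_VALUE
    length: int = 0  # zero whenever value is 'no value'


@dataclass
class Field:
    """A field with the four attribute categories of Section 4.3."""
    name: str
    uncomp: Attr = field(default_factory=Attr)
    comp: Attr = field(default_factory=Attr)
    context: Attr = field(default_factory=Attr)
    updated_context: Attr = field(default_factory=Attr)


# Longest prefix first so "updated_context_value" is not mis-split.
CATEGORIES = ("updated_context", "uncomp", "comp", "context")


def attribute(fld: Field, identifier: str):
    """Resolve an identifier such as 'comp_length' on a field.

    The identifier is category + "_" + attribute name.
    """
    for category in CATEGORIES:
        prefix = category + "_"
        if identifier.startswith(prefix):
            return getattr(getattr(fld, category), identifier[len(prefix):])
    raise KeyError(identifier)


# First packet of a flow: the uncompressed value is known, but the
# context still holds 'no value' with a length of zero.
sn = Field("rtp.sn")
sn.uncomp = Attr(value=1000, length=16)
```

   With this sketch, attribute(sn, "uncomp_value") yields the
   uncompressed value, while attribute(sn, "context_value") yields the
   stand-in for 'no value' until a context has been established.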
   The purpose of this attribute is to allow inter-packet compression.

   The context_value attribute is key to efficient compression, since
   the behavior of one header is very often related to the behavior of
   previous headers in a flow. For example, the RTP Sequence Number
   (SN) field increases by 1 for each consecutive header in an RTP
   stream.

   ROHC profiles take into account the dependency between successive
   headers by storing and referencing a field's context_value attribute.
   However, whilst it is possible to notate this explicitly, most of the
   time a field's context_value is referenced implicitly by the encoding
   methods.

   In ROHC-FN, an encoding method can read a field's context_value
   attribute and, on completion, updates the field's context_value
   attribute with the uncomp_value attribute (or some other value if
   that is appropriate - see Section 4.3.4).

   All fields have a context_value attribute.

4.3.4 updated_context_value

   The attribute "updated_context_value" contains the value that the
   context attribute will take after the compression of the current
   header is complete. At the start of compression of the current
   header, the updated_context_value attribute is set to 'no value'.

   The state machine for a ROHC profile defines specific points at which
   the context is updated: at these points the updated_context_value
   attribute is copied into the context attribute.

   If a field's encoding method does not assign a defined value to the
   "updated_context_value" attribute and the attribute is left with no
   value, the default behaviour of copying the "uncomp_value" attribute
   is carried out instead.

   All fields have an updated_context_value attribute.

5. Encoding Methods

   Encoding methods are the basic building blocks of the formal
   notation. ROHC (RFC 3095 [4]) contains a number of different
   techniques for compressing header fields (LSB encoding, value
   encoding, list-based compression etc.).
   Most of these techniques are part of the ROHC-FN library so that they
   can be reused when creating new ROHC profiles.

5.1 Basic Encoding Methods

   This section describes the set of encoding methods that are
   self-contained (in that they do not refer to other fields).

5.1.1 value

   The "value" encoding method is used to encode header fields that
   always have a fixed length and value:

      field ::= value (length_param, value_param)

   where "length_param" binds with the "uncomp_length" attribute of the
   field, and where "value_param" binds with the "uncomp_value"
   attribute of the field.

   For example, the IPv6 header version number is a four-bit field that
   always has the value 6:

      version ::= value (4, 6)

   Since the value is fixed, it is omitted from the compressed header.

5.1.2 irregular

   The "irregular" encoding method is used to encode a field in the
   compressed packet with a bit pattern identical to the original field
   in the uncompressed packet:

      field ::= irregular (length_param)

   where "length_param" binds with the "uncomp_length" attribute of the
   field.

   For example, the checksum field of the TCP header is a sixteen-bit
   field that does not follow any pattern:

      tcp_checksum ::= irregular (16)

   Note that this encoding method is a special case of "uncompressible"
   encoding (see Section 5.2.9) where the length of the uncompressible
   field is fixed. If the field length is not constant, use
   "uncompressible" encoding instead.

5.1.3 static

   The "static" encoding method compresses a field whose length and
   value are the same as for the previous header in the flow:

      field ::= static

   where the field's "uncomp_value" attribute binds with the field's
   "context_value" attribute.
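
   The two directions of this binding, as described for "static" in
   Section 3.2, can be sketched in Python as follows. This is only an
   illustration under stated assumptions: the function names
   static_compress and static_decompress are hypothetical, None stands
   in for 'no value', and a failed binding is modelled by returning
   None at the compressor and raising an exception at the decompressor.

```python
NO_VALUE = None  # assumption: None stands in for the draft's 'no value'


def static_compress(uncomp_value, context_value):
    """Compressor side of the "static" binding.

    Returns the compressed bits as a string (empty, since no bits are
    sent) when uncomp_value matches context_value. Returns None when
    the binding fails, meaning a packet format that uses a different
    encoding method for this field has to be chosen instead.
    """
    if context_value is not NO_VALUE and uncomp_value == context_value:
        return ""
    return None


def static_decompress(context_value):
    """Decompressor side of the "static" binding.

    The field is rebuilt from the context alone; without a valid
    context entry the binding fails for this packet type.
    """
    if context_value is NO_VALUE:
        raise ValueError("no valid context entry for static field")
    return context_value
```

   For instance, a TCP source port of 80 with a context value of 80
   compresses to zero bits, whereas a context value of 8080 (or no
   context at all) makes the binding fail at the compressor.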
   Since the field value is the same as the previous field value, the
   entire field can be reconstructed from the context, so it is
   compressed to zero bits and does not appear in the compressed header.

   For example, the source port of the TCP header is a field whose value
   does not change from one packet to the other for a given flow:

      src_port ::= static

5.1.4 lsb

   The Least Significant Bit encoding method, "lsb", compresses a field
   whose value differs by a small amount from the value stored in the
   context:

      field ::= lsb (num_lsbs_param, delta_param)

   where "num_lsbs_param" is the number of least significant bits to
   use, and "delta_param" is the minimum expected change in the value of
   the field from one packet to the next (i.e. the interpretation
   interval offset). The parameter "num_lsbs_param" binds with the
   "comp_length" attribute, and the "uncomp_value" attribute binds with
   (context_value - delta_param + 2^uncomp_value - 1).

   The "lsb" encoding method can compress a field whose value lies
   between (context_value - delta_param) and (context_value -
   delta_param + 2^num_lsbs_param - 1) inclusive. In particular, if
   delta_param = 0 then the field value can only stay the same or
   increase relative to the previous header in the flow. If delta_param
   = -1 then it can only increase, whereas if delta_param =
   2^num_lsbs_param then it can only decrease.

   The compressed field takes up the specified number of bits in the
   compressed header (i.e. num_lsbs_param).

   For example, a sequence number used as a control field that can only
   increase:

      msn ::= lsb (2, 0)

   See the ROHC specification (RFC 3095 [4]) for additional details on
   LSB encoding, where the parameter "k" corresponds to the parameter
   "num_lsbs_param" and where interpretation interval offset "p"
   corresponds to the parameter "delta_param".

5.1.5 index

   The "index" encoding method compresses a field whose value is one of
   a list of possible values.
   It takes two parameters. The first is the length of the uncompressed
   field, in bits. The second is the list of possible values that the
   field can take:

      field ::= index (length_param, list_value_param)

   where "length_param" binds with the "uncomp_length" attribute of the
   field, and where "list_value_param" binds with the "uncomp_value"
   attribute of the field.

   The compressed packet contains the index of the value to be
   compressed. The leftmost item in the list has an index of 0, the
   next item an index of 1 and so on.

   For example, a header field containing flags taking one of three
   possible values:

      flags ::= index (8, 3, 5, 22)

   The compressed field has a length of log2 of the number of items in
   the list, rounded up to the nearest integer. So the above example
   would have a compressed length of 2 bits.

5.1.6 select

   The "select" encoding method differs from other methods in that it
   does not describe the compressed value of the field (i.e. does not
   bind the "comp_value" attribute with anything), but rather binds the
   "uncomp_value". This method can be used to assert that the field has
   a specified value, in order to choose a particular packet format from
   a list of possible formats:

      field ::= select(field_value_param)

   where the "field_value_param" binds with the field's "uncomp_value"
   attribute.

   The "select" encoding method does not provide a compressed value of
   the field; it is therefore necessary to use a second encoding method
   to specify how the field is encoded in the compressed message. The
   purpose of the "select" method is to be used in conjunction with
   other encoding methods that require a choice to be made from a number
   of alternative encodings (see Section 5.2.5 for example).

5.2 Relative Field Encoding Methods

   The encoding methods in this section can encode a field whose value
   can be inferred from the value of one (or many) other field(s).

   The '.' scoping notation is used to refer to fields outside the scope
   in which they have been defined. For example, to refer to a field
   named "field_1" outside the scope where a compressed header format
   named "test_single_format" would be defined,
   'test_single_format.field_1' is used. The same scoping mechanism can
   be used for subfields within fields.

5.2.1 same_as

   The "same_as" encoding method is used for fields that are always
   identical to another field:

      field ::= same_as (field_ref_param)

   where "field_ref_param" can be the value of any other field,
   including all its attributes.

   This method is particularly useful to encode fields that are needed
   by encoding techniques that need to refer to other fields. For
   example:

      count ::= inferred_offset (4),
      {
        base_field ::= same_as (test_offset.id),
        offset     ::= value (4, 3)
      }

   Since the "same_as" encoding method gets the entire value of the
   field from another field, it takes up zero bits in the compressed
   header.

5.2.2 group

   The "group" encoding method is used to group two or more
   non-contiguous uncompressed fields together, so that they can be
   treated as a single field for compression.

   This encoding method takes a single argument, which is the list of
   fields to be joined together. This argument is specified as a
   subfield, "field_list":

      field ::= group,
      {
        field_list ::= field_1.subfield_1,
                       field_2.subfield_1.foo,
                       field_1.subfield_2,
                       ...
      }

   The "group" encoding method does not define any bits in the
   compressed header directly, but, like the "same_as" method (above),
   it is intended for use in conjunction with the "control_field"
   encoding method (defined in Section 5.3.4 below).
For example:

   tcp.ecn_and_reserved ::= control_field,
   {
     base_field ::= group,
     {
       field_list ::= ip.ecn,
                      tcp.flags.ecn,
                      tcp.reserved
     },
     compressed_method ::= value(8, 0)
   }

5.2.3 expression

The "expression" encoding method defines the uncompressed value using a mathematical expression:

   field ::= expression(uncomp_length_param, <expression>)

where the "uncomp_length_param" binds with the field's "uncomp_length" attribute, and where <expression> is a mathematical expression. The value of <expression> binds with the field's "uncomp_value" attribute.

The expression can be made up of any of the following components:

Integers

   Integers can be expressed as decimal values, binary values (prefixed by 0b), or hex values (prefixed by 0x). Negative integers are prefixed by a "-" sign.

Operators

   The operators +, -, *, / and ^ are available, along with ( and ) for grouping. Note that k / v is undefined if k is not an integer multiple of v (i.e. if it does not evaluate to an integer). However, k // v is always defined. The precedence for each of the operators, along with parentheses, is given below (higher precedence first):

      (, )
      ^
      *, /
      +, -

x ^ y

   Evaluates to x raised to the power of y.

x // y

   Evaluates to the integer division of x by y.

floor (k, v)

   Returns k / v rounded down to the nearest integer (undefined for v == 0).

mod (k, v)

   Returns k - v * floor(k, v).

log2 (v)

   Returns the smallest integer k where v <= 2^k, i.e. it returns the smallest number of bits in which value v can be stored.

Expressions may refer to any of the attributes of each field (as described in Section 4.3 above).

If any of the attributes used in the expression is undefined, the value of the expression is undefined. Undefined expressions are illegal.

Here is a complete example of expression encoding:

   data_offset ::= expression(4,
                     (uncomp_value(tcp_ip.options.list_length) + 160) / 32)

Since the field value is described entirely in terms of the expression, it does not appear in the compressed header.

5.2.4 derived_value

The "derived_value" encoding method is similar to the "value" encoding method, except that the length and value of the field do not have to be constant. The field's length and value are specified as subfields, rather than as inline parameters:

   field ::= derived_value,
   {
     field_length ::= encoding_method_1,
     field_value  ::= encoding_method_2
   }

For example:

   tcp.seq_number ::= derived_value,
   {
     field_length ::= value (0, 8),
     field_value  ::= expression (uncomp (tcp.seq_number.residue) +
                        (uncomp (tcp.seq_number.scaled) *
                         uncomp (tcp.payload_size)))
   }

The following statement, which encodes a field using the "value" encoding method:

   field_1 ::= value (4, 11)

has identical meaning to:

   field_1 ::= derived_value,
   {
     field_length ::= value (0, 4),
     field_value  ::= value (0, 11),
   }

The number of bits that the "derived_value" encoding takes up in the compressed header depends on the encoding methods used for the length and the value. The above examples would both take up zero bits in the header since the length parameter for the "value" encoding is '0', and the "expression" encoding also takes up zero bits in the compressed header.

If both encoding methods used for length and value take up bits in the compressed header, the length MUST be encoded first, followed by the encoding of the field's value.

5.2.5 alt

The "alt" encoding method is used to encode a field in one of a number of ways, depending on the value of another field:

   field ::= alt(reference_field_param),
   {
     guard ::= <encoding_method>
     {
       alt_field ::= <encoding_method>
     },
     guard ::= <encoding_method>
     {
       alt_field ::= <encoding_method>
     },
     guard ::= ......
   }

where each of the "guard" subfields is a copy of the field specified by "reference_field_param".
An attempt is made to encode each guard field using the corresponding encoding method. If the encoding succeeds, "alt_field" is bound with "field" and then encoded using the given method.

The guard can be encoded using any encoding method. However, the norm is to use the "select" encoding method (see Section 5.1.6), which is designed for the purpose of selecting between several alternative encodings.

For example, the following encodes the length field using the "static" encoding if there is no extension header ("extension_header"), or using the "irregular" encoding if there is an extension header.

   length ::= alt(extension_header),
   {
     guard ::= select(0)
     {
       alt_field ::= static
     },
     guard ::= select(1)
     {
       alt_field ::= irregular(8)
     }
   }

The "alt" encoding method is typically used to create different sets of packet formats.

5.2.6 nbo

The Network Byte Order encoding method, "nbo", takes the value of a field from the uncompressed header and formats it in network byte order:

   field ::= nbo (uncomp_length_param),
   {
     nbo_flag  ::= static,
     nbo_value ::= irregular (uncomp_length_param)
   }

where the "uncomp_length_param" binds with the field's "uncomp_length" attribute.

The parameter "uncomp_length_param" specifies the length of the field in the uncompressed header. The two subfields, "nbo_flag" and "nbo_value", specify the value of the NBO flag and the NBO value of the field. If the NBO flag is set to '1' then the field is copied as is to "nbo_value"; otherwise "nbo_value" takes the byte-swapped field.

This method can only be applied to a field whose size is a multiple of two octets.

For example:

   ip_id ::= nbo (16),
   {
     nbo_flag  ::= static,
     nbo_value ::= irregular (16)
   }

5.2.7 inferred_offset

The "inferred_offset" encoding method compresses a field as an offset relative to a certain base value. The method has one parameter and two subfields:

   field ::= inferred_offset (uncomp_length_param),
   {
     base_field ::= same_as (other_field),
     offset     ::= static
   }

The parameter "uncomp_length_param" defines the length of the uncompressed field in bits. The "base_field" subfield specifies the base value, along with how to encode that value in the compressed header. The "offset" subfield specifies the offset from the base value, along with the encoding method for the offset. Subfields are used, rather than parameters, to allow for the specification of encoding methods.

The base value is typically specified as the value of another field, although any value can be specified.

   id ::= inferred_offset (16),
   {
     base_field ::= same_as (msn),
     offset     ::= static
   }

The statement above means that the "id" field is 16 bits long in the uncompressed header, and it has a static offset from the value of another field called "msn".

The exact number of bits that it takes to encode a field using "inferred_offset" depends on the encoding methods used for the "base_field" and "offset". The above example takes up zero bits in the compressed header, since both "same_as" and "static" occupy zero bits.

5.2.8 inferred_ip_v4_header_checksum

The "inferred_ip_v4_header_checksum" encoding method is specifically used to compress the IP checksum field:

   checksum ::= inferred_ip_v4_header_checksum

Since the checksum can be constructed solely from the other fields in the header, no bits are present in the compressed header for this encoding.

5.2.9 uncompressible

The "uncompressible" encoding method is a generalisation of "irregular" encoding (see Section 5.1.2), where the length of the field is derived from the value of another field instead of being constant.
The encoding method has a single subfield, "length", which defines the length of the uncompressible field:

   field ::= uncompressible
   {
     length ::= encoding method
   }

An example of how this method can be used in practice is as follows:

   test_uncompressible ::= single_format,
   {
     uncompressed_data ::= length,
                           data,
     compressed_data   ::= data,

     length ::= static,
     data   ::= uncompressible
     {
       length ::= expression(uncomp_value(test_uncompressible.length)
                             // 2 * 2 + 4)
     }
   }

5.3 Control Field Encoding Methods

This section provides encoding methods for handling control fields, i.e. fields that appear in the compressed header to control the compression in some way but do not appear in the uncompressed header.

5.3.1 choice

The "choice" encoding method assigns to a field a value chosen by the compressor. The encoding method has one parameter, which indicates the number of choices:

   field ::= choice(num_choice_param)

For example:

   rpa_flag ::= choice(2)

The compressor is allowed to choose any value from zero up to the parameter minus one. The rpa_flag field in the example may therefore be assigned any value from 0 to 1 inclusive.

The compressed field has a length of log2 of the number of choices, rounded up to the nearest integer. So the above example would have a compressed length of 1 bit.

5.3.2 discriminator

The "discriminator" method sets the given field to a literal bit string. It is intended to be used in conjunction with the "multiple_formats" encoding method (see Section 5.4.2), which allows for more than one method of compression for a given header. The "discriminator" method allows a unique bit pattern to be specified, in binary, which identifies the particular compression format that has been used. The discriminator bit-string is specified between two single quote marks.
For example:

   discriminator ::= '011'

5.3.3 reserved

The "reserved" method sets the given field to a literal bit string. It is intended to be used to insert padding or any other fixed value. The "reserved" method allows any bit pattern to be specified, in binary. The reserved bit-string is specified between two single quote marks. For example:

   reserved ::= '000'

5.3.4 control_field

The "control_field" encoding method is used for fields that are sent in the compressed header, but that do not appear in the uncompressed header at all. It has two subfields: "base_field" and "compressed_method". The "base_field" is the field on which the control field is based. The "compressed_method" specifies the method to use to encode the given field. For example:

   order_data ::= control_field,
   {
     base_field        ::= same_as (test_list.list_of_fields.order),
     compressed_method ::= irregular (1)
   }

The exact encoding of a control field, and the number of bits it takes up, are determined by the encoding method used by "compressed_method".

5.3.5 crc

The "crc" encoding method provides a CRC calculated over a block of data. The block of data is represented using either the "uncomp_value" or "comp_value" attribute of a field. The "crc" method takes a number of parameters:

   o  the number of bits for the CRC (num_bits);
   o  the bit-pattern for the polynomial (bit_pattern);
   o  the initial value for the CRC register (initial_value); and
   o  the block of data (block_data).

I.e.:

   field ::= crc (num_bits, bit_pattern, initial_value, block_data)

The CRC is calculated in LSB order.

The following CRC polynomials are defined in RFC 3095 [4], in Sections 5.9.1 and 5.9.2:

   8-bit   C(x) = x^0 + x^1 + x^2 + x^8                bit_pattern = 0xe0
   7-bit   C(x) = x^0 + x^1 + x^2 + x^3 + x^6 + x^7    bit_pattern = 0x79
   3-bit   C(x) = x^0 + x^1 + x^3                      bit_pattern = 0x06

For example:

   crc_field ::= crc (3, 0x6, 0x3)   % 3 bits
                                     % C(x) = x^0 + x^1 + x^3

5.4 Compound Encoding Methods

The encoding methods described above are designed to encode single fields within headers; the compound encoding methods allow the individual fields to be built up into larger structures. The encodings described in this section are used to contain a list of fields and their corresponding encoding methods.

5.4.1 single_format

The "single_format" encoding method specifies a single fixed encoding for a given kind of protocol header. This is the simplest structured encoding method. For example:

   test_single_format ::= single_format,
   {
     uncompressed_data ::= field_1,   % 4 bits
                           field_2,   % 4 bits

     compressed_data   ::= field_2,   % 0 bits
                           field_1,   % 4 bits

     field_1 ::= irregular (4),
     field_2 ::= value (4, 9)
   }

This encoding specifies the order of the fields in the uncompressed header, followed by the order of the fields in the compressed header, followed by a list of encoding methods for each field.

The compressed data will appear in the order specified by the field order list "compressed_data", with each individual field being encoded in the manner given for that field. Consequently, the length of the compressed data will be the total of the lengths of all the individual fields. The above example would encode "field_2" first (zero bits long), followed by "field_1" (four bits long), giving a total length of four bits.

Note that the order of the fields specified in "compressed_data" does not have to match the order they appear in "uncompressed_data". Fields of zero bits length may be omitted from the field order list, since their position in the list is not significant.

Note also that the arrangement of fields specified in the uncompressed header is arbitrary. Any arrangement of fields that correctly describes the content of the uncompressed header may be selected - this need not be the same as the one described in the specifications for the protocol header being compressed.

For example, there may be a protocol whose header contains a 16-bit sequence number, but whose sessions tend to be short-lived. This would mean that the high bits of the sequence number are almost always constant. The list of uncompressed fields could reflect this by splitting the original uncompressed field into two fields, one field to represent the always-zero part of the sequence number, and a second field to represent the significant part.

So, without changing the meaning, the above could be written as follows:

   test_single_format ::= single_format,
   {
     uncompressed_data ::= field_1,   % 4 bits
                           field_2,   % 4 bits

     compressed_data   ::= field_1,   % 4 bits

     field_1 ::= irregular (4),
     field_2 ::= value (4, 9)
   }

5.4.2 multiple_formats

The "multiple_formats" encoding method specifies multiple encodings for a given header. This allows different compression methods to be used depending on what is the most efficient way of compressing a particular header.

For example, a field may have a fixed value most of the time, but the fixed value may occasionally change. Using the method "single_format", this field would have to be encoded using "irregular" (defined in Section 5.1.2 above), even though the value only changes rarely. However, using the "multiple_formats" encoding, we can provide two alternatives: one encoding for when the value remains fixed and another for when the value changes.

This encoding method is notated in a similar way to the "single_format" encoding method (defined in Section 5.4.1); there are however a number of differences. This is the topic of the following sub-sections.

5.4.2.1 Naming Convention

The field names used by the "multiple_formats" encoding method differ from those used by the "single_format" method. This is because while there is still only a single definition of the uncompressed packet format, there are now several alternative compressed packet formats. The field "uncompressed_data" thus becomes "uncompressed_format", and the field "compressed_data" is split into several fields. These fields must be defined using names beginning with "format_", and each name must be unique within its scope.

Each of the format fields has a separate set of field encodings associated with it. In particular, each compressed packet format must include a discriminator that uniquely identifies that particular format. See also "discriminator" encoding in Section 5.3.2 above.

5.4.2.2 Structure for Multiple Formats

The structure used by the "multiple_formats" encoding method also differs from the "single_format" method. The field encodings appear as subfields in each compressed packet format. This is necessary to make it explicit which encoding methods are to be used for which compressed packet format. For example:

   format_0 ::= discriminator,   % 1 bit
                field_1,         % 0 bits
   {
     discriminator ::= '0',
     field_1       ::= static
   }

The discriminator must always appear first in the field order list, since the decompressor needs to know what packet format it is dealing with before it can do anything else with the rest of the packet.

5.4.2.3 Default Encoding Methods - default_methods

With the "multiple_formats" method, default encoding methods can be specified for each field. The default encoding methods specify the encoding method to use for a field if a given format does not give an encoding method for that field. This is helpful to keep the definition of the packet formats concise, as the same encoding method need not be spelt out for every compressed format.

There is no need to specify a field order list for the default encoding methods, since the field order is specified individually for each format, so "..." can be given instead.
For example:

   default_methods ::= ... ,
   {
     field_1 ::= value (4,1),
     field_2 ::= value (4,2)
   }

The normal case will be for all default encodings to be compressed to zero bits, in which case they are irrelevant to the compressed field order. However, if any default encodings are used which compress to greater than zero bits, their position in the field order list must be specified explicitly for each compressed packet format.

5.4.2.4 Example of Multiple Formats

Putting this all together, here is a complete example of multiple formats:

   test_multiple_formats ::= multiple_formats,
   {
     format_0 ::= discriminator,
                  field_1,
     {
       discriminator ::= '0',
       field_1       ::= static
     },

     format_1 ::= discriminator,
                  field_1,
     {
       discriminator ::= '11',
       field_1       ::= irregular (4)
     },

     uncompressed_format ::= field_1,
                             field_2,

     default_methods ::= ... ,
     {
       field_2 ::= value (4,2)
     }
   }

5.4.3 encode_list

The "encode_list" encoding method is used to perform a certain encoding on each element in a list of elements. This encoding method takes three parameters: the first is an encoding method, the second is the uncompressed list of items, and the third is the number of items to apply the encoding method to.

For example:

   encode_list (irregular(32), csrc_list, 3)

will create a list that will contain three fields of 32 bits each.

5.4.4 list_index

The "list_index" encoding method is used to create an index item entry (also called an "xi") in a list of item indexes ("xi_list"). The "xi_list" is used with generic compressed lists (see Section 5.4.5).

Each index item entry "xi" can be either 4 or 8 bits wide, and the most significant bit of the "xi" indicates the presence of an item in the list of items. The remaining bits in the "xi" (either 3 or 7 bits) indicate an index into a table of items that is stored in the context.
For example:

   list_index(large_xi) ::= alt (large_xi)
   {
     guard ::= value (0, 0)
     {
       presence ::= choice (2),
       index    ::= choice (8)
     },
     guard ::= value (0, 1)
     {
       presence ::= choice (2),
       index    ::= choice (128)
     }
   }

5.4.5 generic_comp_list

The generic compressed list encoding method, "generic_comp_list", is used to compress a list of items. The method has two parameters. The first parameter, "encoding_methods", is the encoding method that will be applied to each item in the list. The second parameter, "uncomp_list", is the uncompressed list of items.

The first octet in the generic compressed list describes the format of the list, and contains the item count ("large_xi") as well as the size of the list of item indexes ("xi_count"). It is followed by the list of item indexes ("xi_list") and the list of compressed items ("item_list").

The list of item indexes ("xi_list") describes what items from the table are present in the uncompressed list, as well as the order of the items. It also indicates which of these items are present in the list of compressed items ("item_list"). If there is an odd number of item indexes ("xi_count"), the list of indexes must be padded to provide octet-alignment.

The number of items in the list of compressed items is indicated by the number of presence bits that have been set in "xi_list".

Note that the generic compressed list is bitwise identical to the one used for list compression in RFC 3095 [4], except that the generation field is not used and therefore the "GP" bit (RFC 3095 [4], Section 5.8.6.1) has been reserved and set to zero.

The mapping between uncompressed items and indexes in the table is done by the compressor. The compressor must also choose 1) what items must be present in the list of compressed items ("item_list"), and 2) what items need only be present in the list of item indexes ("xi_list").
Whether the compressor uses the small or large format for the number of item indexes ("large_xi") should be based on the largest value used by any item index (see the "list_index" encoding method in Section 5.4.4). All indexes must use the same format. In particular, if one item index needs to use the large format, then all indexes must use that format.

For example:

   generic_comp_list (encoding_methods, uncomp_list) ::=
     reserved_1,
     large_xi,
     reserved_2,
     xi_count,
     xi_list,
     item_list,
   {
     reserved_1 ::= reserved (0, 2),
     large_xi   ::= choice (2),         % called PS in RFC 3095
     reserved_2 ::= reserved(0, 1),
     xi_count   ::= choice(16),         % called CC in RFC 3095
     xi_list    ::= encode_list (list_index (large_xi),
                                 uncomp_list,
                                 xi_count),
     xi_padding ::= alt (expression(mod(xi_count, 2)))
     {
       guard ::= value (0)
       {
         field_value ::= value (0, 0)
       },
       guard ::= value (1)
       {
         field_value ::= reserved (0, 4)
       }
     },
     item_list  ::= encode_list (encoding_methods,
                                 uncomp_list,
                                 xi_count)
   }

Note that "large_xi" corresponds to "PS" and "xi_count" corresponds to "CC" in RFC 3095 [4].

5.5 User-defined Encoding Methods

All the encoding methods defined above are predefined; however, it is possible to define new encoding methods in terms of other encoding methods.

5.5.1 Syntax

A user-defined encoding method is defined using very similar syntax to that used to encode a field. The only difference is that instead of a field name, the name of the user-defined encoding method is given, enclosed by two "at" signs (i.e. "@"):

   @user_defined@ ::= encoding_method

A user-defined encoding method, thus defined, can then be used in the same way as any other encoding method:

   field ::= user_defined

For example, for some profiles where particular fields are often absent from packet headers, it may be useful to define a "null" encoding method, which is shorthand for a field of zero width:

   @null@ ::= value(0, 0),
        :
        :
   eg_field ::= null,

5.5.2 Parameters

Unbound parameters in the encoding method used become parameters to the user-defined encoding method. In the example below, the first parameter of the "value" encoding method is fixed at 4, so it does not appear in the parameter list of the user-defined encoding method, "four_bit_field". However, the second parameter, "four_bit_value_param", becomes a parameter of the user-defined encoding method, since its value is not bound at the point of definition of the user-defined encoding method.

   @four_bit_field@ ::= value(4, four_bit_value_param),
        :
        :
   eg_field ::= four_bit_field(7),

5.5.3 Subfields

Any encoding method may be used to define the user-defined encoding method, including those that make use of subfields. These behave in the same way as parameters, in that if they are left unbound, they become subfields of the user-defined encoding method:

   @user_defined@ ::= encoding_method(1, user_defined_param),
   {
     field_1 ::= static,
     field_2 ::= value (16, user_defined_param),
     field_3 ::= user_defined_field,
   }
        :
        :
   field ::= user_defined (4),
   {
     field_3 ::= 2
   }

Note that out of the three fields above, only "field_3" is visible when using the user-defined encoding method, since the other two fields are bound to encodings at the point of definition.
Note also that the encoding of "field_2" is dependent on the unbound parameter "user_defined_param", the value of which is supplied as a parameter to the user-defined encoding method.

5.6 Chain Items

   <# This seems a good place to include a description of how #>
   <# chain items are defined using the formal notation. TBW  #>

6. Security considerations

This draft describes a formal notation similar to ABNF (RFC 2234 [3]), and hence is not believed to raise any security issues.

7. Acknowledgements

A number of important concepts and ideas have been borrowed from ROHC (RFC 3095 [4]).

Thanks to Mark West and Kristofer Sandlund for their cooperation and feedback from notating the TCP profile. Thanks to Rob Hancock and Stephen McCann for putting up with the authors' arguments and making helpful suggestions, frequently against the tide!

The authors would also like to thank Carsten Bormann, Christian Schmidt, Qian Zhang, Hongbin Liao, Max Riegel and Lars-Erik Jonsson for their comments and encouragement. We haven't always agreed, but the arguments have been fun!

8. References

   [1]  Bradner, S., "The Internet Standards Process -- Revision 3",
        BCP 9, RFC 2026, October 1996.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [3]  Crocker, D. and P. Overall, "Augmented BNF for Syntax
        Specifications: ABNF", RFC 2234, November 1997.

   [4]  Bormann, C., Burmeister, C., Degermark, M., Fukushima, H.,
        Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le, K.,
        Liu, Z., Martensson, A., Miyazaki, A., Svanbro, K., Wiebke, T.,
        Yoshimura, T. and H. Zheng, "RObust Header Compression (ROHC):
        Framework and four profiles: RTP, UDP, ESP, and uncompressed",
        RFC 3095, July 2001.

Authors' Addresses

   Richard Price
   Siemens/Roke Manor
   Roke Manor Research Ltd.
   Romsey, Hants SO51 0ZN
   UK

   Phone: +44 (0)1794 833681
   EMail: richard.price@roke.co.uk
   URI:   http://www.roke.co.uk

   Robert Finking
   Siemens/Roke Manor
   Roke Manor Research Ltd.
   Romsey, Hants SO51 0ZN
   UK

   Phone: +44 (0)1794 833189
   EMail: robert.finking@roke.co.uk
   URI:   http://www.roke.co.uk

   Ghyslain Pelletier
   Ericsson AB
   Box 920
   Lulea SE-971 28
   Sweden

   Phone: +46 (0) 8 404 29 43
   EMail: ghyslain.pelletier@ericsson.com

Appendix A. Bit-level Worked Example

This section gives a worked example at the bit level, showing how a simple profile describes the compression of real data from an imaginary packet format. The example used has been kept fairly simple, whilst still aiming to illustrate some of the intricacies that arise in use of the notation. In particular, fields have been kept short to make it possible to read the binary representation of the headers by eye, without too much difficulty.

A.1 Example Packet Format

Our imaginary header is just 16 bits long, and consists of the following fields:

   1.  version number - 2 bits
   2.  type - 2 bits
   3.  flow id - 4 bits
   4.  sequence number - 4 bits
   5.  flag bits - 4 bits

So for example 0101000100010000 indicates a packet with a version number of one, a type of one, a flow id of one, a sequence number of one, and all flag bits set to zero.

A.2 Initial Encoding

An initial definition based solely on the above information is:

   eg_header ::= single_format,
   {
     uncompressed_data ::= version_no,    % 2 bits
                           type,          % 2 bits
                           flow_id,       % 4 bits
                           sequence_no,   % 4 bits
                           flag_bits,     % 4 bits

     compressed_data   ::= version_no,    % 2 bits
                           type,          % 2 bits
                           flow_id,       % 4 bits
                           sequence_no,   % 4 bits
                           flag_bits,     % 4 bits

     version_no  ::= irregular(2),
     type        ::= irregular(2),
     flow_id     ::= irregular(4),
     sequence_no ::= irregular(4),
     flag_bits   ::= irregular(4)
   }

This defines the packet nicely, but doesn't actually offer any compression. If we use it to encode the above header, we get:

   Uncompressed header: 0101000100010000
   Compressed header:   0101000100010000

This is because we have stated that all fields are irregular - i.e. we don't know anything about their behaviour.

A.3 Basic Compression

In order to achieve any compression we need to notate our knowledge about the header, and its behaviour in a flow. For example, we may know the following facts about the header:

   1.  version number - indicates which version of the protocol this
       is, always one for this version of the protocol.
   2.  type - may take any value.
   3.  flow id - may take any value.
   4.  sequence number - may take any value.
   5.  flag bits - contains three flags, a, b and c, each of which may
       be set or clear, and a reserved flag bit, which is always clear
       (i.e. zero).

We could notate this knowledge as follows:

   eg_header ::= single_format,
   {
     uncompressed_data ::= version_no,      % 2 bits
                           type,            % 2 bits
                           flow_id,         % 4 bits
                           sequence_no,     % 4 bits
                           abc_flag_bits,   % 3 bits
                           reserved_flag,   % 1 bit

     compressed_data   ::= version_no,      % 0 bits
                           type,            % 2 bits
                           flow_id,         % 4 bits
                           sequence_no,     % 4 bits
                           abc_flag_bits,   % 3 bits
                           reserved_flag,   % 0 bits

     version_no    ::= value(2,1),
     type          ::= irregular(2),
     flow_id       ::= irregular(4),
     sequence_no   ::= irregular(4),
     abc_flag_bits ::= irregular(3),
     reserved_flag ::= value(1,0)
   }

Using this simple scheme, we have successfully encoded the fact that one of the fields has a permanently fixed value of one, and therefore contains no useful information. We have also encoded the fact that the final flag bit is always zero, which again contains no useful information. Both of these facts have been notated using the "value" encoding method (see Section 5.1.1).

Note that we could just as well have omitted the "0 bits" fields from the definition of "compressed_data" if we so wished, since the only purpose of that list is to indicate the order in the compressed header - zero-bit fields don't actually appear and so can be omitted.

Using this new encoding on the above header, we get:

   Uncompressed header: 0101000100010000
   Compressed header:   0100010001000

This reduces the amount of data we need to transmit from 16 bits to 13 bits, i.e. by roughly 20%. However, this encoding fails to take any advantage of relationships between values of a field in one packet and its value in subsequent packets. For example, every packet in the following sequence is compressed by the same amount despite the similarities between them:

   Uncompressed header: 0101000100010000
   Compressed header:   0100010001000

   Uncompressed header: 0101000100100000
   Compressed header:   0100010010000

   Uncompressed header: 0111000100110000
   Compressed header:   1100010011000

A.4 Inter-packet compression

The profile we have defined so far has not compressed the sequence number or flow ID fields at all, since they can take any value. However, the value of these fields in one header has a very simple relationship to their value in previous headers:

   o  the sequence number increases by one each time;
   o  the flow_id stays the same: it always has the same value that it
      did in the previous header in the flow;
   o  the abc_flag_bits stay the same most of the time: they usually
      have the same value that they did in the previous header in the
      flow.

An obvious way of notating this is as follows:

   % This obvious encoding will not work (correct encoding below)
   eg_header ::= single_format,
   {
     uncompressed_data ::= version_no,      % 2 bits
                           type,            % 2 bits
                           flow_id,         % 4 bits
                           sequence_no,     % 4 bits
                           abc_flag_bits,   % 3 bits
                           reserved_flag,   % 1 bit

     compressed_data   ::= version_no,      % 0 bits
                           type,            % 2 bits
                           flow_id,         % 0 bits
                           sequence_no,     % 0 bits
                           abc_flag_bits,   % 3 bits
                           reserved_flag,   % 0 bits

     version_no    ::= value(2,1),
     type          ::= irregular(2),
     flow_id       ::= static,
     sequence_no   ::= lsb(0,-1),
     abc_flag_bits ::= irregular(3),
     reserved_flag ::= value(1,0)
   }

This dependency on previous packets is notated using the "static" and "lsb" encoding methods (see Section 5.1.3 and Section 5.1.4 respectively). However, there are a few problems with the above notation.

Firstly, and most importantly, the flow_id field is notated as "static", which means that it doesn't change from packet to packet. However, the notation does not indicate how to communicate the value of the field initially.
   It's all very well saying "it's the same value as last time", but
   there must have been a first time, where we define what that value
   is, so that it can be referred back to.  The above notation provides
   no way of communicating that.  The same applies to the sequence
   number: there needs to be a way of communicating its initial value.

   Secondly, the sequence number field is communicated very efficiently
   in zero bits, but it is not at all robust against packet loss.  If a
   packet is lost then the decompressor has no way to fill in the
   missing sequence number.

   Finally, although the flag bits are usually the same as in the
   previous header in the flow, the profile doesn't make any use of this
   fact; since they are sometimes not the same as those in the previous
   header, it is not safe to say that they are always the same, so
   static encoding can't be used all the time.

   We solve all three of these problems below, robustness first, since
   it is simplest.

   When communicating sequence numbers, a very important consideration
   for the notator is how robust the compressed protocol needs to be
   against packet loss.  This will vary a lot from protocol to protocol.
   For example, RTP has a high setup cost, so the compressed stream
   needs to be robust against fairly high packet loss.  Things are
   different with TCP, where robustness to the loss of just a few
   packets is sufficient.  For the example protocol we'll assume short,
   low-overhead flows and say we need to be robust to the loss of just
   one packet, which we can achieve with a single bit of LSB encoding
   (see Section 5.1.4).
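
   The robustness argument above can be checked with a small simulation.
   The following is a minimal sketch in Python, not part of ROHC-FN.  It
   assumes that lsb(k,p) selects the unique value in the interpretation
   interval [ref - p, ref - p + 2^k - 1] used for LSB encoding in
   RFC 3095; Section 5.1.4 remains the authoritative definition.  The
   function names and the 4-bit field width are illustrative choices.

```python
def lsb_encode(value: int, k: int) -> int:
    """Return the k least significant bits of value (k may be 0)."""
    return value & ((1 << k) - 1)


def lsb_decode(bits: int, k: int, ref: int, p: int, width: int = 4) -> int:
    """Return the value in [ref - p, ref - p + 2**k - 1] (modulo
    2**width) whose k least significant bits equal `bits`.
    The interval is an assumption taken from RFC 3095 LSB encoding."""
    modulus = 1 << width
    for step in range(1 << k):
        candidate = (ref - p + step) % modulus
        if lsb_encode(candidate, k) == bits:
            return candidate
    raise ValueError("no value in the interval matches the received bits")


def decompress_stream(seq_numbers, lost, k, p=-1):
    """Hypothetical helper: the first sequence number is sent in full,
    later ones as k LSBs.  `lost` holds the sequence numbers of packets
    dropped in transit.  Returns what the decompressor reconstructs."""
    ref = seq_numbers[0]
    decoded = [ref]
    for seq in seq_numbers[1:]:
        bits = lsb_encode(seq, k)          # compressor side
        if seq in lost:
            continue                       # never reaches the decompressor
        ref = lsb_decode(bits, k, ref, p)  # decompressor updates its context
        decoded.append(ref)
    return decoded


stream = [1, 2, 3, 4, 5]
print(decompress_stream(stream, {3}, k=0))     # zero bits, one loss
print(decompress_stream(stream, {3}, k=1))     # one bit, one loss
print(decompress_stream(stream, {3, 4}, k=1))  # one bit, two losses
```

   Under these assumptions, lsb(0,-1) desynchronises after a single lost
   packet, while lsb(1,-1) recovers from one loss but not from two
   consecutive losses.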

   To communicate initial values for the sequence number and flow ID
   fields, and to take advantage of the fact that the flag bits are
   usually the same as in the previous header, we need to depart from
   the single packet format encoding we are currently using (see
   Section 5.4.1) and instead use multiple packet formats (see
   Section 5.4.2):

   eg_header ::= multiple_formats,
   {
     uncompressed_data ::= version_no,     %  2 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
                           reserved_flag,  %  1 bit

     format_0          ::= discriminator,  %  1 bit
                           version_no,     %  0 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
                           reserved_flag,  %  0 bits
     {
       discriminator   ::= '0',
       version_no      ::= value(2,1),
       type            ::= irregular(2),
       flow_id         ::= irregular(4),
       sequence_no     ::= irregular(4),
       abc_flag_bits   ::= irregular(3),
       reserved_flag   ::= value(1,0)
     },

     format_1          ::= discriminator,  %  1 bit
                           version_no,     %  0 bits
                           type,           %  2 bits
                           flow_id,        %  0 bits
                           sequence_no,    %  1 bit
                           abc_flag_bits,  %  0 bits
                           reserved_flag,  %  0 bits
     {
       discriminator   ::= '1',
       version_no      ::= value(2,1),
       type            ::= irregular(2),
       flow_id         ::= static,
       sequence_no     ::= lsb(1,-1),
       abc_flag_bits   ::= static,
       reserved_flag   ::= value(1,0)
     }
   }

                               Figure 71

   Note that we have had to add a discriminator field, so that the
   decompressor can tell which packet format has been used by the
   compressor.  The format with a static flow ID, static flag bits and
   an LSB-encoded sequence number is now 4 bits long (1 discriminator
   bit, 2 type bits and 1 sequence number bit): less than a third of the
   size of the single packet format, and a quarter of the size of the
   uncompressed header.  However, the original packet format (with an
   irregular flow ID and sequence number) has also grown by one bit, due
   to the addition of the discriminator.

   An important consideration when creating multiple packet formats is
   whether the extra format occurs frequently enough that the average
   compressed header length is shorter as a result.
   For example, if the sequence number in the example protocol in fact
   counted up in steps of three, not one, then the LSB encoding could
   never be used; all we would have achieved is to lengthen the
   irregular packet format by one bit.

   Using the above notation, we now get:

     Uncompressed header: 0101000100010000
     Compressed header:   00100010001000

     Uncompressed header: 0101000100100000
     Compressed header:   1010 ; 00100010010000

     Uncompressed header: 0111000100110000
     Compressed header:   1110 ; 01100010011000

   The first header in the stream is compressed the same way as before,
   except that it now has the extra 1-bit discriminator at the start
   (0).  When a second header arrives, with the same flow ID as the
   first and its sequence number one higher, it can now be compressed in
   two possible ways: either using format_1 or, in the same way as
   previously, using format_0.

   Note that we show all possible encodings of a packet as defined by a
   given profile, separated by semi-colons.  Either of the above
   encodings for the packet could be produced by a valid implementation,
   although of course a good implementation would always aim to make the
   compressed size as small as possible, and an optimum implementation
   would pick the encoding which led to the best compression of the
   packet stream (which is not necessarily the smallest encoding for a
   particular packet).

A.5  Variable Length Discriminators

   Suppose we do some analysis on flows of our example protocol and
   discover that whilst it is usual for successive packets to have the
   same flags, on the occasions when they don't, the packet is almost
   always a "flags set" packet, in which all three of the abc flags are
   set.  To encode the flow more efficiently, a packet format needs to
   be written to reflect this.

   This now gives a total of three packet formats, which means we need
   three discriminators to differentiate between them.
   The obvious solution here is to increase the number of bits in the
   discriminator from one to two and use, for example, discriminators
   00, 01 and 10.  However, we can do slightly better than this.

   Any set of discriminators in which no discriminator is a prefix of
   another will suffice, so we can use 00, 01 and 1.  If the
   discriminator starts with 1, that's the whole thing.  If it starts
   with 0, the decompressor knows it has to check one more bit to
   determine the packet kind.  Note that it would be erroneous to use
   e.g. 0, 01 and 10 as discriminators, since after reading an initial
   0 the decompressor would have no way of knowing if the next bit was a
   second bit of discriminator or the first bit of the next field in
   the packet stream.  0, 10 and 11, however, would be OK, as the first
   bit again indicates whether or not there are further discriminator
   bits to follow.

   This gives us the following:

   eg_header ::= multiple_formats,
   {
     uncompressed_data ::= version_no,     %  2 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
                           reserved_flag,  %  1 bit

     format_0          ::= discriminator,  %  2 bits
                           version_no,     %  0 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
                           reserved_flag,  %  0 bits
     {
       discriminator   ::= '00',
       version_no      ::= value(2,1),
       type            ::= irregular(2),
       flow_id         ::= irregular(4),
       sequence_no     ::= irregular(4),
       abc_flag_bits   ::= irregular(3),
       reserved_flag   ::= value(1,0)
     },

     format_1          ::= discriminator,  %  2 bits
                           version_no,     %  0 bits
                           type,           %  2 bits
                           flow_id,        %  0 bits
                           sequence_no,    %  1 bit
                           abc_flag_bits,  %  0 bits
                           reserved_flag,  %  0 bits
     {
       discriminator   ::= '01',
       version_no      ::= value(2,1),
       type            ::= irregular(2),
       flow_id         ::= static,
       sequence_no     ::= lsb(1,-1),
       abc_flag_bits   ::= value(3,7),
       reserved_flag   ::= value(1,0)
     },

     format_2          ::= discriminator,  %  1 bit
                           version_no,     %  0 bits
                           type,           %  2 bits
                           flow_id,        %  0 bits
                           sequence_no,    %  1 bit
                           abc_flag_bits,  %  0 bits
                           reserved_flag,  %  0 bits
     {
       discriminator   ::= '1',
       version_no      ::= value(2,1),
       type            ::= irregular(2),
       flow_id         ::= static,
       sequence_no     ::= lsb(1,-1),
       abc_flag_bits   ::= static,
       reserved_flag   ::= value(1,0)
     }
   }

   Here is some example output:

     Uncompressed header: 0101000100010000
     Compressed header:   000100010001000

     Uncompressed header: 0101000100100000
     Compressed header:   1010 ; 000100010010000

     Uncompressed header: 0111000100110000
     Compressed header:   1110 ; 001100010011000

     Uncompressed header: 0111000101001110
     Compressed header:   01110 ; 001100010100111

   Here we have a very similar sequence to last time, except that there
   is now an extra message on the end which has the flag bits set.  The
   encoding for the first message in the stream is now one bit larger.
   The shortest encoding for the next two messages is the same as
   before, since that packet format has not grown, thanks to the use of
   variable length discriminators.  Finally, the packet that comes
   through with all the flag bits set can be encoded in just five bits,
   only one bit more than the most common packet format.

A.6  Default encoding

   There is some redundancy in the notation used to define the profile,
   in that the same encoding method is used for the same fields several
   times in different formats, but the field is redefined explicitly
   each time.  If the encoding for any of these fields changed in the
   future (e.g. if the reserved flag became permanently set to 1 instead
   of 0), then every packet format would have to be modified to reflect
   this change.
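
   To make the redundancy concrete, the following is a minimal sketch
   in Python, not part of ROHC-FN.  It transcribes the field encodings
   of the three formats from A.5 as plain strings and counts how often
   each (field, method) pair is repeated.  The dictionary layout and the
   function name repeated_definitions are hypothetical and chosen only
   for illustration.

```python
from collections import Counter

# Field encodings of the three A.5 packet formats, copied as strings.
FORMATS = {
    "format_0": {
        "discriminator": "'00'",
        "version_no": "value(2,1)",
        "type": "irregular(2)",
        "flow_id": "irregular(4)",
        "sequence_no": "irregular(4)",
        "abc_flag_bits": "irregular(3)",
        "reserved_flag": "value(1,0)",
    },
    "format_1": {
        "discriminator": "'01'",
        "version_no": "value(2,1)",
        "type": "irregular(2)",
        "flow_id": "static",
        "sequence_no": "lsb(1,-1)",
        "abc_flag_bits": "value(3,7)",
        "reserved_flag": "value(1,0)",
    },
    "format_2": {
        "discriminator": "'1'",
        "version_no": "value(2,1)",
        "type": "irregular(2)",
        "flow_id": "static",
        "sequence_no": "lsb(1,-1)",
        "abc_flag_bits": "static",
        "reserved_flag": "value(1,0)",
    },
}


def repeated_definitions(formats):
    """Map each (field, method) pair that occurs in more than one
    format to the number of formats that spell it out."""
    counts = Counter(
        (field, method)
        for methods in formats.values()
        for field, method in methods.items()
    )
    return {pair: n for pair, n in counts.items() if n > 1}


for (field, method), n in sorted(repeated_definitions(FORMATS).items()):
    print(f"{field} ::= {method} is written out in {n} formats")
```

   Every pair reported by this sketch is a definition that would have to
   be edited in several places if the corresponding encoding changed.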

   This problem can be avoided by specifying a default encoding for
   these fields, which also leads to a more concisely notated profile:

   eg_header ::= multiple_formats,
   {
     uncompressed_data ::= version_no,     %  2 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
                           reserved_flag,  %  1 bit

     format_0          ::= discriminator,  %  2 bits
                           type,           %  2 bits
                           flow_id,        %  4 bits
                           sequence_no,    %  4 bits
                           abc_flag_bits,  %  3 bits
     {
       discriminator   ::= '00',
       type            ::= irregular(2),
       flow_id         ::= irregular(4),
       sequence_no     ::= irregular(4),
       abc_flag_bits   ::= irregular(3)
     },

     format_1          ::= discriminator,  %  2 bits
                           type,           %  2 bits
                           sequence_no,    %  1 bit
                           abc_flag_bits,  %  0 bits
     {
       discriminator   ::= '01',
       type            ::= irregular(2),
       sequence_no     ::= lsb(1,-1),
       abc_flag_bits   ::= value(3,7)
     },

     format_2          ::= discriminator,  %  1 bit
                           type,           %  2 bits
                           sequence_no,    %  1 bit
                           abc_flag_bits,  %  0 bits
     {
       discriminator   ::= '1',
       type            ::= irregular(2),
       sequence_no     ::= lsb(1,-1),
       abc_flag_bits   ::= static
     },

     default_methods   ::= ... ,
     {
       version_no      ::= value(2,1),
       flow_id         ::= static,
       reserved_flag   ::= value(1,0)
     }
   }

   The above profile behaves in exactly the same way as the one notated
   previously, since it has the same meaning.  A method given inside a
   format takes precedence over the default: format_0 still encodes
   flow_id as irregular(4), while format_1 and format_2 fall back to the
   default of static.  Note that the profile has also been made more
   concise by not specifying zero-length fields in the field order list
   of the compressed formats.

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.
Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director. Full Copyright Statement Copyright (C) The Internet Society (2004). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees. 
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgment Funding for the RFC Editor function is currently provided by the Internet Society.