Network Working Group Richard Price, Siemens/Roke Manor INTERNET-DRAFT Abigail Surtees, Siemens/Roke Manor Expires: April 2003 Mark A West, Siemens/Roke Manor October 28, 2002 Protocol-Enabled BNF-Based LanguagE (PEBBLE) Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This document is a submission of the IETF ROHC WG. Comments should be directed to its mailing list, rohc@ietf.org. Abstract This draft defines PEBBLE: a formal notation for describing the behavior of protocols or protocol stacks. The notation is designed to provide information useful in "protocol-aware" compression algorithms, for example the header compression profiles compatible with the Robust Header Compression (ROHC) framework. Price et al. [Page 1] INTERNET-DRAFT PEBBLE October 28, 2002 Table of contents 1. Introduction..................................................2 2. Terminology...................................................2 3. Overview of PEBBLE............................................3 4. Normative definition of PEBBLE................................7 5. Improving readability.........................................15 6. Pre-defined rules and integer operators.......................17 7. Library of commonly used rules................................22 8. Security considerations.......................................34 9. Acknowledgements..............................................34 10. Authors' addresses............................................34 11. References....................................................35 Appendix A. ABNF description of PEBBLE............................35 1. Introduction This document defines PEBBLE: a version of the well-known Backus-Naur Form (BNF) notation which can be used to describe the behavior of protocols or protocol stacks. It is intended to simplify the creation of new compression profiles to fit within the ROHC [RFC-3095] framework. The notation in itself does not define the "bits on the wire" of the compressed data. To achieve this an encoding algorithm must be chosen that can map the uncompressed packets onto a set of compressed packets and vice versa. The specification of such an encoder is beyond the scope of this draft: each ROHC profile created using PEBBLE can select its own based on the requirements of the profile. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC-2119]. BNF (Backus Naur Form) BNF is a notation used to describe the syntax of many well-known protocols. Control field A control field is a field described by PEBBLE but which is not part of the protocol header itself. For example, PEBBLE may define a checksum field over the header to ensure robustness against bit errors and dropped packets. Price et al. [Page 2] INTERNET-DRAFT PEBBLE October 28, 2002 Encoding algorithm An encoding algorithm defines a reversible mapping between a set of uncompressed packets and a set of compressed packets. Protocol-aware encoding algorithms can make use of the information provided in the PEBBLE description of a protocol. Field The PEBBLE description of a protocol divides the protocol into a set of contiguous bit patterns known as fields. Label Each field can be assigned a "label name" which allows it to be referenced by different PEBBLE rules. Library of rules The library of rules contains a number of commonly used rules for describing the behavior of a protocol. More rules can be added to the library as and when they are needed. Pre-defined rules Four pre-defined rules are provided as standard by PEBBLE; these can be combined to create new rules with more complex behavior. Profile A ROHC [RFC-3095] profile is a description of how to compress a certain protocol stack over a certain type of link. Each profile includes an encoding algorithm to compress the headers and a state machine to control the actions of each endpoint. Rule Rules are the basis for the PEBBLE description of a protocol. Each rule defines the behavior of one or more fields in the protocol header. Rules can be taken from the set of pre-defined rules or can be defined in terms of existing rules. 3. Overview of PEBBLE This chapter gives an overview of the PEBBLE notation, including its relation to existing versions of BNF such as "Augmented BNF" (ABNF) [RFC-2234]. Price et al. [Page 3] INTERNET-DRAFT PEBBLE October 28, 2002 3.1. What is BNF? BNF (Backus Naur Form) is a notation used to describe the syntax of many well-known protocols, for example SIP [RFC-3261] and RTSP [RFC- 2960]. All versions of BNF are based around the concept of a "BNF rule". Each BNF rule describes the syntax of a portion of the protocol, and can be defined in terms of existing BNF rules which makes it easy to build new descriptions out of existing ones (e.g. when adding new layers to a protocol stack). The primary or "top- level" BNF rule describes the syntax of the entire protocol stack in question. When the BNF rule which describes the syntax of a certain protocol stack is applied to a packet that may or may not contain a header from the protocol stack, there are two possible outcomes: 1. The packet successfully matches the BNF rule, indicating that it contains a valid header from the corresponding protocol stack. 2. The packet fails to match the BNF rule, indicating that it does not contain a header from the corresponding protocol stack. So each BNF rule can be considered to be an operator, whose input is a packet and whose output is a Boolean value specifying whether the packet contains a header from the chosen protocol stack. In order to create new BNF rules from existing rules, it is necessary to provide at least one pre-defined BNF rule as part of the notation. Different variants of BNF offer different sets of pre-defined BNF rules from which new rules can be constructed. For example, the variant of BNF known as "Augmented BNF" [RFC-2234] includes the following pre-defined rules: rule_1 rule_2 ... rule_n Concatenation of n BNF rules rule_1 | rule_2 | ... | rule_n Choice between n BNF rules "%d" m-n Character with ASCII code between m and n inclusive [rule] Optional BNF rule x*y rule List of between x and y occurrences of a BNF rule Note that each of the pre-defined BNF rules have operands that can affect the outcome of the rule. For example, "x*y rule" matches n consecutive instances of an existing BNF rule where n lies between the integers x and y inclusive. So if the BNF rule "foo" matches the bit patterns "00" or "11" then the BNF rule 1*2(foo) will match the bit patterns "00", "11", "0000", "0011", "1100" or "1111". Price et al. [Page 4] INTERNET-DRAFT PEBBLE October 28, 2002 New BNF rules can be created by assigning a name for the rule, followed by the symbol "::=" and an expression which evaluates to the new rule and is written in terms of existing rules. For example: bar ::= 1*2(foo) The following BNF description illustrates this process in more detail. Using the pre-defined BNF rules from ABNF [RFC-2234], a new BNF rule "list" is created which describes a list of arbitrary positive or negative integers: list ::= 1*(number) number ::= ["+" | "-"] 1*(digit) digit ::= %d48-57 3.2. What is PEBBLE? PEBBLE is a version of BNF designed to be useful when creating "protocol-aware" compression algorithms. Knowledge of the protocol to be compressed typically improves the compression ratio and reduces the processing and memory overhead compared to the generic "protocol- unaware" case. Note that PEBBLE describes the behavior of the uncompressed headers only, so descriptions can be written without requiring any knowledge of the algorithm that will generate the actual compressed data. In particular the notation is designed to be both human-readable and machine-readable. If only one protocol stack needs to be compressed, a suitable encoder can be designed by hand based on the PEBBLE description of the header. Conversely, it is possible to implement a protocol-independent encoder that can take an arbitrary PEBBLE description and automatically generate a set of compressed headers for the relevant protocol. The PEBBLE notation is a version of BNF with four pre-defined rules: FIELD, DISTRIBUTION, MAP and CONCATENATION (see Chapter 6 for a detailed description of these rules). PEBBLE also includes the following unique features not found in simpler versions of BNF: 1. As well as describing the set of valid headers for the chosen protocol, PEBBLE also outputs some additional information that may be useful to encoders. 2. New rules can be created with operands that modify the outcome of the rule. Each of these features is discussed in more detail in the following sections. Price et al. [Page 5] INTERNET-DRAFT PEBBLE October 28, 2002 3.3. Useful information for encoders The rules defined by PEBBLE behave in the same manner as ordinary BNF rules: the input is a packet (and optionally some additional operands that affect the outcome of the rule), and the output is a Boolean value specifying whether the packet contains a valid header from the chosen protocol stack. However the rules provided by PEBBLE also output some additional information that may be useful to encoders. As an example, for each header that matches a certain PEBBLE rule, the rule also outputs the probability that the header will occur in a typical data flow. This extra information allows an encoding algorithm to encode more common headers using fewer bits than headers expected to occur rarely. The following diagram shows the inputs and outputs for a PEBBLE rule: +-----------------------+ +-----------------------+ | packet to be tested | PEBBLE rule | valid/invalid header? | +-----------------------+ --------------> +-----------------------+ | operands for the rule | | useful information | +-----------------------+ +-----------------------+ Figure 1: Inputs and outputs for a PEBBLE rule More generally, the aim of the PEBBLE notation is to provide sufficient information to support a wide variety of encoders. Particular encoders are of course at liberty to ignore this information if they wish. If it is not clear whether or not a piece of information will be useful, it is generally considered that it should be included since it is easier to ignore unnecessary information than to infer missing information. 3.4. Creating new rules in PEBBLE All versions of BNF allow new rules to be defined in terms of existing rules. However, most variants of BNF only allow simple rules with no operands to be defined in terms of existing rules; BNF rules with one or more operands must be pre-defined by hand. Since the PEBBLE notation provides a large library of rules for describing new protocols, it is useful to extend the mechanism for creating new rules so that it can handle rules with operands. As a result the description of rules in the library is greatly simplified. For example, suppose that a rule RANGE(#n,#v,#k) has already been defined which describes a single n-bit field that can take values from v to v+k-1 inclusive. The RANGE rule can be used to create a new rule called VALUE with two integer operands n and v: Price et al. [Page 6] INTERNET-DRAFT PEBBLE October 28, 2002 VALUE(#n,#v) ::= RANGE(n,v,1) The VALUE rule describes an n-bit field that always takes the value v. In turn, this rule is useful for describing fields such as the IP Version: IPv4_Version ::= VALUE(4,4) The mechanism for defining new rules in terms of existing rules forms the basis of PEBBLE, and is discussed in detail in the following chapter. 4. Normative definition of PEBBLE This chapter defines the PEBBLE notation for describing the behavior of protocols or protocol stacks. 4.1. Inputs and outputs of a rule As explained in Section 3.1, a rule can be considered to be an operator which takes a packet as its input and returns a Boolean value specifying whether or not the packet contains a valid header. In PEBBLE each rule can also take one or more additional operands that modify its behavior, and can return extra information that may be of use to certain encoders. As well as the operands declared explicitly when the rule is created, there are three "implicit operands" common to all rules. Implicit operands do not need to be declared explicitly because they are always required by a rule. The first implicit operand is the packet to be tested for a valid header; the other two implicit operands (the context and the list of labels) are described in Section 4.4 and Section 4.5 respectively. As well as the Boolean value indicating whether the packet contains a valid header or not, each rule provides three outputs that may be useful to encoders. The first is the probability that the header will occur in a typical flow, which is discussed in greater detail in Section 4.3. The other two outputs (an updated context and an updated list of labels) are described in Section 4.4 and Section 4.5 respectively. Note that if the packet does not contain a valid header then the remaining three outputs are not provided. In summary, a complete list of the inputs and outputs for each rule is given below: Inputs: Packet to be tested for a valid header Operands explicitly declared by the rule Length and value for fields in the context List of labels and their values Price et al. [Page 7] INTERNET-DRAFT PEBBLE October 28, 2002 Outputs: Boolean value stating if the packet contains a valid header Probability value that header will occur in a typical flow Updated length and value for fields in the context Updated list of labels and their values 4.2. Creating new PEBBLE rules In PEBBLE the behavior of a protocol is defined by a single "primary" rule, which may itself be defined in terms of simpler rules. For example: IPv6_Header ::= Version Traffic_Class ECT_Flag CE_Flag Flow_Label Payload_Length Next_Header Hop_Limit Source_Address Destination_Address The general format for creating a new rule in terms of existing rules is as follows: new_rule whitespace "::=" whitespace expression The term "new_rule" describes the name of the new rule together with the names of any operands needed by the rule. The term "expression" refers to an expression which evaluates to the desired behavior of the new rule. The expression can include one or more existing rules and integer operators, as well as any of the operands declared for the new rule. For example, using the RANGE rule discussed previously it is possible to create a new IRREGULAR rule that describes a single n-bit field capable of taking all values from 0 to 2^n-1 inclusive: IRREGULAR(#n) ::= RANGE(n,0,2^n) The syntax of each of these terms is described in detail in the following sections. 4.2.1. Syntax of rule names and operands When creating a new rule in PEBBLE it is necessary to give a name for the new rule, and to declare any operands that are required by the rule. Price et al. [Page 8] INTERNET-DRAFT PEBBLE October 28, 2002 The name of the new rule can be any combination of letters, digits or the symbol "_", provided that it doesn't coincide with the name of an existing rule or integer operator. Names are case-sensitive. The name of the new rule is followed by a list of the operands expected by the rule (enclosed in parentheses and separated by commas). A name is assigned to each operand so that it can be used in the expression which defines the new rule (see Section 4.2.2 for more on this). Operand names are also a combination of letters, digits and the symbol "_". Operand names within the definition of a rule should be unique, and should not overlap with the names of any existing rules or integer operators. However, the operand names from one rule can be reused when defining a different rule. Three types of operand are available: rules, text names and (signed) integers. Text name operands are preceded by a "$" whilst integer operands are preceded by a "#". If no symbol precedes the operand then it is assumed to be a rule operand. For example: RANGE(#n,#v,#k) IRREGULAR(#n) INFERRED_OFFSET(#n,$base) LOCAL($label,rule) It is also possible to declare a list of operands by appending a pair of square braces to the operand name. If the square braces enclose the name of a previously declared integer operand k then the list of operands must contain entries from 0 to k-1 inclusive. For example, RULE(#n,#list[n]) ::= ... creates a new rule which requires a single integer operand n followed by n integer operands list[0] to list[n-1]. 4.2.2. Syntax of expressions The behavior of a new rule is defined by giving an expression in terms of one or more existing rules and integer operators. For example, using the IRREGULAR rule defined above it is possible to create the more complex UNCOMPRESSIBLE rule as follows: UNCOMPRESSIBLE($label,#d,#m,#p) ::= IRREGULAR(floor(label,d)*m+p) If the overall expression includes one or more existing rules then values must be given for any operands needed by the rules. These values can in turn be defined using expressions. For example, when defining the UNCOMPRESSIBLE rule in terms of the IRREGULAR rule the expression "floor(label,d)*m+p" gives the value of the operand n needed by the IRREGULAR rule. Expressions can include any of the operands that have been declared for the new rule. If the operand is a list then it must be followed Price et al. [Page 9] INTERNET-DRAFT PEBBLE October 28, 2002 by an integer expression (enclosed in square braces) that specifies which value from the list is to be used. For example, the following BRANCH rule applies one of n existing rules depending on the value contained in the operand "indicator": BRANCH(#indicator,#n,rule[n]) ::= rule[indicator] The set of integer operators available to PEBBLE is described in Section 6.1, and the set of four pre-defined rules is described in Section 6.2. For simplicity, PEBBLE does not currently provide any operators which generate new text names from existing ones. The only way to specify a text name is to write the name explicitly or to use a text name operand. 4.2.3. Generating lists of expressions In some cases it is useful to specify a list of one or more values in a single step. This can be accomplished by specifying an expression in terms of one or more indices, which are then declared together with their range. The list of values is generated by allowing each of the indices to take every value in the specified range. The expression for generating a list of values is enclosed in square braces, and the indices are declared using the keywords "for" and "in" as illustrated below: [2*j for j in 0..5] [rule[j] for j in 0..4] [k-j for j in 0..3, k in 0..3] The lower bound for the range is inclusive whereas the upper bound is exclusive, so the expression "for j in 0..5" declares that j takes the values 0, 1, 2, 3 and 4. Suppose that the n indices are j[0],...,j[n-1]. Note that the range of the ith index j[i] can be defined in terms of any of the indices j[0] to j[i-1]. For example: [k for j in 0..4, k in 0..j] The expression to be evaluated can include any of the n indices, so it can be considered to be a function of j[0],...,j[n-1]. Denote the output of this function by list_item[j[0],...,j[n-1]]. The function is then evaluated by replacing each index j[i] with all of the integers within the range of the index. The final output of the list expression is equivalent to writing a list of these values separated by commas. Price et al. [Page 10] INTERNET-DRAFT PEBBLE October 28, 2002 To define the order in which each item appears in the list, it is specified that list_item[j[0],...,j[n-1]] precedes list_item[k[0],...,k[n-1]] whenever there is an integer i such that j[0] = k[0], j[1] = k[1] up to j[i-1] = k[i-1], and j[i] < k[i]. If any of the indices is given a range containing 0 values then the resulting list must be empty. Some example lists of expressions are given below, using both the square brace form and an expanded form that evaluates to give the same result: Square brace form: Equivalent expanded form: [2*j for j in 0..5] 0,2,4,6,8 [rule[j] for j in 0..4] rule[0],rule[1],rule[2],rule[3] [k-j for j in 0..3, k in 0..3] 0,1,2,-1,0,1,-2,-1,0 [k for j in 0..4, k in 0..j] 0,0,1,0,1,2 4.3. Probability distribution and robustness One of the pieces of information provided by PEBBLE is a probability distribution for the headers in the chosen protocol. For a given protocol header, the PEBBLE rule describing the protocol outputs the probability that the header will occur in a typical flow. This information can be useful to encoders, as more common headers can be encoded using fewer bits than headers expected to occur rarely. Note that the choice of how to partition a set of headers into flows must be made when writing the description of the protocol, and the resulting description will reflect this choice. For example, if flow classification is expected to be performed using IP addresses and ports then these fields will take a fixed value with probability 100% in the PEBBLE description of the protocol. The probability distribution should take into account not only the behavior of the headers themselves but also the characteristics of the link over which they are transmitted. For example, if the link can suffer from dropped or misordered packets then the behavior of certain header fields may change as a result. In particular a sequence number field that increases monotonically over a reliable link may occasionally jump forwards in the presence of packet drops. In general a wide range of possible error conditions may be introduced by the link: dropped, misordered or duplicated packets, isolated bit errors, fades etc. From the perspective of PEBBLE however, all error conditions can be classed as either "expected errors" or "unexpected errors". An expected error occurs regularly in the header flow, and hence is taken into account in the PEBBLE description of the protocol. As a Price et al. [Page 11] INTERNET-DRAFT PEBBLE October 28, 2002 result, an encoding algorithm based on the protocol description will be robust against the error (i.e. headers subject to the error can still be successfully decoded). ROHC profiles typically class packet drops or misordering up to a certain limit as expected errors. An unexpected error only occurs for a negligible proportion of headers in the flow, and so does not need to be taken into account when describing the behavior of the protocol. As a result, an encoding algorithm based on the protocol description may fail to decode any headers subject to an unexpected error. ROHC profiles typically class bit errors and excessive packet drops or misordering as unexpected errors. 4.4. Context values The headers generated by a typical protocol stack will change over time, so the behavior of one header often depends on the behavior of previous headers output by the stack. For example, the RTP Sequence Number field increases by 1 for each consecutive header in an RTP stream. PEBBLE takes into account the dependency between successive headers by storing a "context" of field values from each header. These values can then be used when describing the next header in the flow. The context is accessed using the integer operators context_length(#n) and context_value(#n), which return the length and value of the nth field in the context. Rules can also update the context by defining updated_length(#n) and updated_value(#n), which specify the updated length and value of the nth field in the context. These updated lists of values are used as the context for the next header in the flow. Updated values for the context are defined as part of the FIELD rule (see Chapter 6 for further details). Note that as discussed in Section 4.3, when a header flow contains an unexpected error then it is acceptable for an encoder to fail to reconstruct the relevant header. However, it is not acceptable for the decoding of subsequent valid headers to be adversely affected. In particular, it is necessary to identify the headers that fail to successfully decompress due to an unexpected error so that corruption of the context can be prevented. As part of the description of a header, PEBBLE can define the length of a checksum that should be provided over the original uncompressed header in order to detect the occurrence of unexpected errors and prevent them from mistakenly being used to update the context. This situation is illustrated in Figure 2: Price et al. [Page 12] INTERNET-DRAFT PEBBLE October 28, 2002 Checksum failure +--------------+ +--------------+ ================ +--------------+ -| Valid header |-| Valid header |-|Invalid header|-| Valid header |- +--------------+ +--------------+ ================ +--------------+ | | | | | | >------+ +------>------+ +--------------->--------------+ +------> context context Figure 2: Preventing accidental corruption of the context The process for creating the checksum is described in greater detail in Section 4.6. 4.5. Labels When describing the behavior of a protocol stack it is often the case that two fields in separate parts of the header have related behavior. For example, the Protocol field in an IP header is related to the type of header that follows. A standard version of BNF can capture this dependency by creating a new BNF rule to describe each possible case. As an example, suppose that a 1-bit flag field is followed by some intermediate fields that behave independently from the flag, and then by some fields whose behavior changes depending on whether the flag field takes value 0 or value 1: header ::= case_0 | case_1 case_0 ::= flag_is_0 intermediate_fields dependent_behavior_0 case_1 ::= flag_is_1 intermediate_fields dependent_behavior_1 The drawback of this technique is that a new BNF rule must be created to capture each dependent behavior, which can result in extremely complex BNF descriptions when the number of dependencies is high. An alternative technique for capturing the above field dependency would be to label the flag field with a unique name, and then reference this name when defining the behavior of the dependent fields. For example: header ::= flag_field ; store field value as "flag" intermediate_fields dependent_fields ; define as function of "flag" Price et al. [Page 13] INTERNET-DRAFT PEBBLE October 28, 2002 PEBBLE allows field values to be stored for later reference by means of a label. A label name can be any combination of letters, digits or the symbol "_", provided that it doesn't coincide with the name of a rule, integer operator, or any of the operand names used when defining new rules. Each label can contain an integer value or can be undefined. When a rule is invoked it is passed a list of the labels which are currently defined together with their integer values. This list is treated as an implicit operand for the rule: it can alter the behavior of the rule, but since the list of labels is always supplied it doesn't need to be included when the operands for the rule are declared. Each rule also returns a list of labels together with their integer values. The input and output lists may be different (in particular the output list may include integer values for labels that were undefined in the input list, or vice versa). Label names can be used in integer expressions, in which case they are replaced by the input label value for the rule using the integer expression as one of its operands. 4.6. Control fields A control field is a field whose value is created by an encoder and not taken from the uncompressed protocol header itself. New control fields are defined in PEBBLE by means of three well-known labels: choice, msn and checksum. If these labels occur in an integer expression but their values are undefined, the encoder chooses a value for the label as follows: The "choice" label is assigned a value chosen by the particular implementation of an encoder. Different implementations may choose different values for the control field, allowing a certain amount of implementation flexibility when programming an encoder that makes use of the PEBBLE notation. For example, an implementation that wishes to obtain the highest possible compression efficiency may choose to try several different values for the control field (seeking to maximize the overall compression ratio for the header flow), whereas an implementation that wishes to reduce its processing requirements may use a simpler mechanism for choosing the value of the control field. The "msn" label is assigned the Master Sequence Number (MSN). The MSN is a counter which returns the position of the header in the flow: if the kth header in the flow is currently being described then the msn label is assigned the value k-1. The "checksum" label is assigned a checksum over the header to prevent unexpected errors from corrupting the context. A sufficiently long checksum should be provided to ensure the probability that an unexpected error will not be spotted is negligible. The algorithm for Price et al. [Page 14] INTERNET-DRAFT PEBBLE October 28, 2002 calculating the checksum is chosen by the encoder and hence is beyond the scope of PEBBLE. Note that once the three labels have been assigned values they behave in exactly the same manner as other labels. In particular the encoder cannot change the value for the label until a rule sets the label to be undefined. 5. Improving readability The following chapter defines some extensions to the PEBBLE notation that are designed to offer more convenient alternatives for existing constructs. Whilst the extensions do not affect the expressive power of PEBBLE, they are considered useful for improving its overall readability. 5.1. Shorthand form of operators The standard form of the rules and integer operators can be cumbersome (particularly for simple operators), so it is possible to define operators using a special shorthand form. In the shorthand form of an operator the operands are given without parentheses or commas, but are instead separated by a well-known symbol such as "+", "-", "<", "|", etc. For example: CONCATENATION(foo,bar) can be rewritten as foo whitespace bar Using operators in shorthand form can lead to ambiguity because the operands are not necessarily enclosed in parentheses. For example the expression a+b*c could be interpreted as a+(b*c) or as (a+b)*c. To resolve this ambiguity it is necessary to define a precedence order for operators in shorthand form. The operator with the highest precedence takes the shortest possible expressions as its operands. The operator with the second-highest precedence takes the shortest possible expressions given that the first criterion holds, and so on. Since the operator "*" has a higher precedence than "+", the expression a+b*c must be interpreted as a+(b*c). Given two operators with the same precedence, the leftmost operator takes the shorter expressions as its operands. So 2^3^4 is interpreted as (2^3)^4. The operators that can be expressed in shorthand form are defined in Chapter 6 together with their precedence. 5.2. Lists of undefined size For notational convenience it is possible to specify a list of operands by appending an empty pair of square braces to the operand Price et al. [Page 15] INTERNET-DRAFT PEBBLE October 28, 2002 name. In this case the number of entries in the list is not specified. Instead, when the rule is encountered in an expression the number of entries in the list is derived by counting the total number of operands that follow the rule. For example, the operands for the CONCATENATION rule are declared as follows: CONCATENATION(rule[]) If the term CONCATENATION(foo,bar) is encountered in an expression then the list rule[] must have two entries, with rule[0] = foo and rule[1] = bar. If more than one list of operands is created with an unspecified size then the size of each list must be equal. For example: INFERRED_TRANSLATE(#n,#a[],#b[]) If the term INFERRED_TRANSLATE(1,2,3,4,5) is encountered in an expression then each of the lists a[] and b[] must contain two items, where a[0] = 2, a[1] = 3, b[0] = 4 and b[1] = 5. If INFERRED_TRANSLATE has an even number of operands then the rule must fail. If a new rule declares a list of undefined size then the well-known operand "end" is provided to access the length of the list in expressions. For example: PICK(rule[],#p[]) ::= CHOICE(field_value) LOCAL(field_value,rule[field_value]) DISTRIBUTION([p[j] for j in 0..end]) In order to avoid forward referencing, all lists of operands with undefined size must be declared after the single operands and the lists of operands with known size. So defining a rule INFERRED_TRANSLATE(#a[],#b[],#n) would be incorrect. 5.3. Local definitions To improve the readability of the notation, the definition of a new rule can be given in terms of one or more "local definitions". For example: UNCOMPRESSIBLE($label,#d,#m,#p) ::= IRREGULAR(field_length) where field_length is (floor(label,d)*m+p) The keyword "where" indicates that a list of local definitions follows the description of a new rule. Each local definition is given a name, which can be any combination of letters, digits or the symbol "_" provided that it doesn't coincide with the name of a rule, integer operator, label, or any of the operands or local definitions Price et al. [Page 16] INTERNET-DRAFT PEBBLE October 28, 2002 already specified for the rule. Note however that the names of local definitions from one rule can be reused when making local definitions for a different rule. The name of each local definition is followed by the keyword "is" and then by an arbitrary expression. The expression can include any of the operands for the new rule and any labels defined for the rule. Note that if the expression contains labels then its value may change depending on where it occurs in the description of the new rule. In particular, if a local definition appears several times when creating a new rule then its value should be recalculated each time it occurs. 6. Pre-defined rules and integer operators The following chapter defines a number of integer operators and rules that can be used when creating new PEBBLE rules as per Chapter 4. 6.1. Integer operators This section provides a set of integer operators for use when creating new rules. The following operators are currently available: Integers Integers can be expressed as decimal values, binary values prefixed by 0b, or hex values prefixed by 0x. Negative integers are prefixed by a "-" sign. Label names Label names can be used in integer expressions, where they are replaced by the input label value for the rule which called the integer expression. NULL The keyword "NULL" can be used to specify an undefined integer value. This is useful for deleting the value contained within a label. Parentheses Parentheses can be used to enclose expressions, which can be useful in conjunction with operators in shorthand form. Arithmetic The operators +, -, *, / and ^ can be used. Note that k/v is undefined if it does not evaluate to an integer. Percentages A value can be expressed as a percentage with up to 2 decimal places (i.e. abc.xy%). This is interpreted as an integer from 0 to 10000 inclusive (v% is interpreted as the integer v*100, so in particular 0% is interpreted as 0 and 100% is interpreted as 10000). Price et al. [Page 17] INTERNET-DRAFT PEBBLE October 28, 2002 Inequalities Inequalities (<, =<, ==, !=, >=, >) return 1 if they are true and 0 if they are false. floor(#k,#v) Returns the largest integer not larger than k/v (undefined when v = 0). mod(#k,#v) Returns k-v*floor(k,v). log2(#v) Returns the smallest integer k where v =< 2^k. min(#k[]) Returns j such that k[j] is minimal (taking the smallest value of j in the event of a tie). checksum16(#n) Returns the integer value of an IP checksum over the last n bits in the packet. switch(#j,#k[]) Returns k[j]. byte_swap(#n,#k) Returns the byte-swapped value of k when treated as an n-bit field (undefined if n is not a multiple of 8). permutation(#n,#k,#x) Checks if the integer x is a concatenation of the integers 0 to n-1 (each interpreted as a bit pattern of length k) in any order. Returns 1 if this is the case and 0 otherwise. context_length(#n) Returns the length of the nth field in the context. context_value(#n) Returns the value of the nth field in the context. Note that if any of the operands are undefined, the value returned by the operator is also undefined. The precedence of each operator in shorthand form is given below (higher precedence first): Integers, percentages ^ *, / +, - <, =<, ==, !=, >=, > 6.2. Pre-defined rules The following sections describe the four pre-defined rules provided by PEBBLE. Price et al. [Page 18] INTERNET-DRAFT PEBBLE October 28, 2002 6.2.1. FIELD The FIELD rule indicates the existence of a new field to be described by the PEBBLE notation. Note that the actual behaviour of the field is defined separately (for example by using the DISTRIBUTION rule). The syntax of the FIELD rule is given below: FIELD(#n) The operand n specifies the number of bits in the field. The FIELD rule uses the well-known label "remaining_data" to determine the position of the field in the packet. The remaining_data label specifies the start of the field as an bit offset relative to the end of the packet (so if remaining_data is 16 then the rule FIELD(16) refers to the last 16 bits in the packet), as illustrated in Figure 3. The integer value of the field can then be determined, taking MSBs to precede LSBs. If n or remaining_data are undefined, or if remaining_data < n, then the FIELD rule must fail. <-------------- remaining_data ---------------> +---------------------+-------------+-------------------------------+ | | n-bit field | | +---------------------+-------------+-------------------------------+ <----------------------- header and payload ------------------------> Figure 3: The location of the n-bit field described by FIELD(n) If n = 0 then the integer value of the field is always taken to be 0. If the FIELD rule successfully locates a field, in the list of labels output by the FIELD rule the value of remaining_data is set to n less than its input value. Also, if the well-known label "field_value" is undefined in the input labels then for the output labels it is set equal to the integer value of the field. If field_value already has an integer value however, the value is left unchanged in the output list of labels and the value of remaining_data is not modified. The FIELD rule may also provide an updated value for one context entry: the location of the context entry to be updated is given by the well-known label "index". The FIELD rule first checks the well- known label "no_update" to determine whether an updated context value must be provided. If no_update = 1 then the values for updated_length(index) and updated_value(index) are set equal to context_length(index) and context_value(index) respectively. Otherwise they are set equal to the length and value of the field. The output value for index is set to 1 higher than its input value. All other labels are left unchanged by the FIELD rule, and the probability value output by the FIELD rule is always 100%. Price et al. [Page 19] INTERNET-DRAFT PEBBLE October 28, 2002 6.2.2. DISTRIBUTION The DISTRIBUTION rule gives the probability distribution of the well- known label "field_value". The syntax of the DISTRIBUTION rule is given below: DISTRIBUTION(#p[]) The rule has a list of k operands p[0],...,p[k-1] specifying the probability distribution of the label. If probability[i] is the chance that the label field_value will take value i then: probability[i] = p[i] for i from 0 to k-1 -------------------- p[0]+p[1]+...+p[k-1] The probability that the label will take a value outside the range 0 to k-1 is zero. The DISTRIBUTION rule is successful provided that all of the k operands are positive integers and that the input value for the well- known label field_value lies between 0 and k-1 inclusive. If this is the case then the value of the label field_value and the probability distribution can be passed to an encoder that requires this information. The output value for the label field_value is undefined; all other labels are left unchanged by the rule. The DISTRIBUTION rule does not make use of the context and does not define any updated context entries. 6.2.3. MAP The MAP rule updates the value contained in a label. The syntax of the rule is given below: $label = #old_value ? #new_value Note that the syntax is in shorthand form, using the separators "=" and "?". The MAP rule checks whether the input value for the specified label is equal to old_value. If this is the case then the output value for the label is set equal to new_value. If it is not the case then the MAP rule fails. All other labels are left unchanged by the rule. The expressions for old_value and new_value can contain any label names except for the label that is being updated by the MAP rule. The MAP rule does not define any updated context entries, and the probability value output by the MAP rule is always 100%. Price et al. [Page 20] INTERNET-DRAFT PEBBLE October 28, 2002 6.2.4. CONCATENATION The CONCATENATION rule matches a header which is the concatenation of the headers matched by n existing rules. The syntax of the CONCATENATION rule is given below: CONCATENATION(rule[]) The CONCATENATION rule also has the following shorthand form: rule[0] whitespace ... whitespace rule[n-1] The shorthand version of the CONCATENATION rule has the lowest possible precedence (lower than any integer operators). The operands for the CONCATENATION rule consist of n existing rules denoted rule[0],...,rule[n-1]. The CONCATENATION rule applies each of the existing rules to the input packet, and is considered to match only if all of the n existing rules match. When the CONCATENATION rule is invoked it is passed an input list of labels and their corresponding integer values. This list is passed unmodified to rule[0], which returns a (possibly updated) list of labels to the CONCATENATION rule. The output list from rule[i-1] is then used as the input list for rule[i] for all i from 1 to n-1 inclusive. Finally the output list from rule[n-1] is used unmodified as the output list for the CONCATENATION rule. This situation is illustrated in Figure 4: | ^ Labels | | Labels v | +-------------------------------------+ | CONCATENATION(rule_1,rule_2,rule_3) | +-------------------------------------+ | +-------------+ +-------------+ ^ Labels | | | | | | Labels v | v | v | +--------+ +--------+ +--------+ | rule_1 | | rule_2 | | rule_3 | +--------+ +--------+ +--------+ Figure 4: Passing labels between rules Note that this avoids the need for an implementation to store more than one copy of each label. Any updated context entries defined by the n existing rules are passed unmodified as the output of the CONCATENATION rule (unless two existing rules define contradictory values for the same context entry, in which case the CONCATENATION rule fails). Note that the Price et al. [Page 21] INTERNET-DRAFT PEBBLE October 28, 2002 updated entries for the context do not overwrite the original context entries while the current header is being parsed: instead they are buffered and only used to replace the original context entries for the next header in the flow. The probability value output by the CONCATENATION rule is the product of the probabilities output by each of the n existing rules (i.e. the probabilities are assumed to be independent). 7. Library of commonly used rules The ROHC [RFC-3095] standard contains a number of different techniques for compressing header fields (LSB encoding, scaled timestamp encoding, list-based compression etc.). Each of these techniques can be added to a library of rules describing several common field behaviors. The rules contained in the library can be concatenated to give a description of a complete protocol stack. For example, a rule describing the behavior of an IPv6 header can be constructed from the rules available in the library as shown below: IPv6_Header ::= Version Traffic_Class ECT_Flag CE_Flag Flow_Label Payload_Length Next_Header Hop_Limit Source_Address Destination_Address Version ::= VALUE(4,6) Traffic_Class ::= STATIC ECT_Flag ::= STATIC CE_Flag ::= VALUE(1,0) 99% | VALUE(1,1) 1% Flow_Label ::= STATIC Payload_Length ::= INFERRED_SIZE(16,288) Next_Header ::= LABEL(8,next_header) Hop_Limit ::= STATIC Source_Address ::= STATIC Destination_Address ::= STATIC Price et al. [Page 22] INTERNET-DRAFT PEBBLE October 28, 2002 The rules in the library are specified in terms of the four pre- defined rules, so new rules can be added to the library as and when they are needed. 7.1. Numeric rules This section defines the simplest set of rules that are typically used to handle numeric fields. 7.1.1. RANGE The RANGE rule describes an n-bit field that takes a known range of values. The definition of the rule is given below: RANGE(#n,#v,#k) ::= FIELD(n) tmp = NULL ? field_value field_value = tmp ? mod(tmp,k) tmp = mod(mod(field_value-v,k)+v,2^n) ? NULL DISTRIBUTION([1 for j in 0..k]) The RANGE rule matches whenever the next n bits in the header (i.e. the n bits following the well-known label "remaining_data") lie between v and v+k-1 inclusive. The RANGE rule also assigns a probability distribution to the field, specifying that each of the k possible field values occurs with equal probability. The remaining rules in this section are all defined in terms of the RANGE rule. 7.1.2. VALUE The VALUE rule describes an n-bit field that takes a known value v. The definition of the rule is given below: VALUE(#n,#v) ::= RANGE(n,v,1) 7.1.3. IRREGULAR The IRREGULAR rule describes an n-bit field that behaves randomly (i.e. each of the 2^n possible bit patterns occurs with equal probability). The definition of the rule is given below: IRREGULAR(#n) ::= RANGE(n,0,2^n) 7.1.4. IRREGULAR_PADDED The IRREGULAR_PADDED rule describes an n-bit field with k least significant bits that behave randomly and with n-k most significant bits that are always zero. The definition of the rule is given below: Price et al. [Page 23] INTERNET-DRAFT PEBBLE October 28, 2002 IRREGULAR_PADDED(#n,#k) ::= RANGE(n,0,2^k) 7.1.5. STATIC The STATIC rule describes a field with the same length and value as the corresponding field stored in the context. The definition of the rule is given below: STATIC ::= RANGE(context_length(index),context_value(index),1) Note that the rule is defined in terms of the integer operators context_length(#n) and context_value(#n), which respectively return the length and value of the nth field stored in the context. 7.1.6. LSB The LSB rule describes a field whose value differs by a small amount from the field value stored in the context. The definition of the rule is given below: LSB(#k,#p) ::= RANGE(context_length(index),context_value(index)- p,2^k) The LSB rule can be applied when the field value lies between (context_value-p) and (context_value-p+2^k-1) inclusive. In particular, if p = 0 then the field value can only stay the same or increase relative to the previous header in the flow. If p = -1 then it can only increase, whereas if p = 2^k then it can only decrease. 7.2. Labeling rules This section lists some general rules that can be used to define new labels in terms of existing labels. 7.2.1. LABEL The LABEL rule assigns a label name to an n-bit field. The definition of the rule is given below: LABEL(#n,$label) ::= FIELD(n) label = NULL ? field_value field_value = label ? NULL The LABEL rule can be used in conjunction with the NEXT_FIELD rule specified below to change the order in which fields are defined in a PEBBLE description. This is particularly important for capturing dependencies between fields that occur in different parts of the header. Price et al. [Page 24] INTERNET-DRAFT PEBBLE October 28, 2002 7.2.2. NEXT_FIELD The NEXT_FIELD rule indicates that the next field to be defined is not the next field in the header, but instead is a value contained in a label. The definition of the rule is given below: NEXT_FIELD($label) ::= field_value = NULL ? label label = field_value ? NULL The NEXT_FIELD rule copies the value contained in the named label to the well-known label field_value. When a FIELD rule is next encountered it will use this value rather than taking a field value from the header. 7.2.3. LOCAL The LOCAL rule temporarily deletes the value of a label. The rule takes as its operands the name of a label and an existing rule: the named label is set to "undefined" and then the existing rule is invoked. The original label value is restored in the output of the LOCAL rule. The definition of the rule is given below: LOCAL($label,rule) ::= CALL(label,label,rule) Note that the definition of the LOCAL rule is given in terms of another rule CALL which is defined as follows: CALL(#k,$label,rule) ::= label = k ? NULL rule label = NULL ? k The LOCAL rule can be useful when defining a new rule in terms of an existing rule, both of which use the same name for a label. It prevents the case whereby two values are assigned to the same label at the same time (which results in the rule failing to match any header). 7.3. INFERRED rules The INFERRED rules describe fields whose values can be inferred from other fields in the header. The following INFERRED rules are available: 7.3.1. INFERRED_TRANSLATE The INFERRED_TRANSLATE rule translates a field value under a certain mapping. The definition of the rule is given below: Price et al. [Page 25] INTERNET-DRAFT PEBBLE October 28, 2002 INFERRED_TRANSLATE(#n,#a[],#b[]) ::= FIELD(n) tmp = NULL ? field_value field_value = tmp ? b[min([tmp!=a[j] for j in 0..end])] tmp = a[min([field_value!=b[j] for j in 0..end])] ? NULL The first operand specifies the length of the field before the translation. This is followed by additional lists of integers, respectively giving the value of the field before and after it is translated. For example: GRE_protocol ::= INFERRED_TRANSLATE(16,0x86dd,0x0800,41,4) The GRE Protocol field behaves in the same manner as the Next Header field in other extension headers, except that it indicates that the subsequent header is IPv6 or IPv4 using the values 0x86dd and 0x0800 instead of 41 and 4. The INFERRED_TRANSLATE rule can translate the values required by the GRE Protocol field into the standard values, whose behavior can then be described by another rule such as LIST. 7.3.2. INFERRED_SIZE The INFERRED_SIZE rule infers the value of a field from the total amount of remaining data. The definition of the rule is given below: INFERRED_SIZE(#n,#p) ::= VALUE(n,(remaining_data-n-p)/8) The first operand specifies the length of the field in bits, and the second operand specifies an offset that will be subtracted from the total data length when deriving the value of the INFERRED_SIZE field. 7.3.3. INFERRED_OFFSET The INFERRED_OFFSET rule describes a field that takes a known offset relative to a certain base value. The definition of the rule is given below: INFERRED_OFFSET(#n,$base) ::= FIELD(n) tmp = NULL ? field_value field_value = tmp ? mod(field_value-base,2^n) tmp = mod(field_value+base,2^n) ? NULL The first operand defines the length of the field. The second operand gives the name of a label where the base value can be found. The value of this label is defined by another rule (e.g. the LABEL rule). The INFERRED_OFFSET rule subtracts the base value from the field value. The resulting offset is then placed in the well-known label field_value so that it can be described by the next rule. For example, a typical sequence number can be described as follows: Price et al. [Page 26] INTERNET-DRAFT PEBBLE October 28, 2002 sequence_number ::= INFERRED_OFFSET(32,msn) STATIC 99% | LSB(8,-1) 0.7% | LSB(16,-1) 0.2% | IRREGULAR(32) 0.1% In this case the offset value is expected to change rarely and only by small amounts, and hence it is defined using the STATIC and LSB rules. 7.3.4. INFERRED_SEQUENCE The INFERRED_SEQUENCE rule describes a field that takes a known offset relative to a base value, in the same way as INFERRED_OFFSET. However, INFERRED_SEQUENCE additionally defines a new base value equal to the value of the field. This is useful for describing sequences of incrementing values. The definition of the rule is given below: INFERRED_SEQUENCE(#n,$base) ::= FIELD(n) tmp = NULL ? field_value field_value = tmp ? mod(tmp-base,2^n) base = mod(tmp-field_value,2^n) ? tmp tmp = base ? NULL The first operand defines the length of the field. The second operand gives the name of a label where the base value can be found. The value of this label is defined by another rule (e.g. the LABEL rule). The INFERRED_SEQUENCE rule subtracts the base value from the field value. The resulting offset is then placed in the well-known label field_value so that it can be described by the next rule. Additionally, the base label is updated to contain the original value of the field being described. For example, a sequence of values such as 10, 20, 30 can be described as follows: incrementing_sequence ::= LABEL(16,base_label) INFERRED_SEQUENCE(16,base_label) STATIC 90% | IRREGULAR(16) 10% ; describes (2nd - 1st) entry INFERRED_SEQUENCE(16,base_label) STATIC 90% | IRREGULAR(16) 10% ; describes (3rd - 2nd) entry NEXT_FIELD(base_label) STATIC 90% | IRREGULAR(16) 10% ; describes 3rd entry Price et al. [Page 27] INTERNET-DRAFT PEBBLE October 28, 2002 7.3.5. INFERRED_IP_CHECKSUM The INFERRED_IP_CHECKSUM rule is used to describe the IP checksum field. The definition of the rule is given below: INFERRED_IP_CHECKSUM ::= tmp = NULL ? remaining_data remaining_data = tmp ? tmp+80 FIELD(16) remaining_data = tmp+96 ? tmp field_value = checksum16(tmp) ? NULL tmp = remaining_data ? NULL The rule matches whenever the 2-byte field at an offset of 10 bytes from the current position in the packet contains a 16-bit one's complement checksum over the remaining data. 7.4. Control field rules This section provides a selection of rules for defining new control fields. 7.4.1. CHOICE The CHOICE rule creates a label containing an implementation-specific value. The definition of the rule is given below: CHOICE($label) ::= label = NULL ? choice choice = label ? NULL 7.4.2. PICK The PICK rule allows an implementation to pick one rule from a list of existing rules. The definition of the rule is given below: PICK(rule[],#p[]) ::= CHOICE(field_value) LOCAL(field_value,rule[field_value]) DISTRIBUTION([p[j] for j in 0..end]) The operands for the PICK rule include a list of existing rules, followed by a list of integers specifying the relative probability with which each rule will be applied in a typical header flow. The PICK rule also has the following shorthand form using the symbol "|": rule[0] whitespace p[0] | ... | rule[n-1] whitespace p[n-1] The precedence for this shorthand form is higher than that for the CONCATENATION rule, but lower than the precedence of any integer operators. Price et al. [Page 28] INTERNET-DRAFT PEBBLE October 28, 2002 7.4.3. FORMAT The FORMAT rule describes a field which displays the same behavior as the corresponding field in the previous header. The definition of the rule is given below: FORMAT(rule[]) ::= CHOICE(field_value) LOCAL(field_value,rule[field_value]) STATIC The FORMAT rule is followed by a list of existing rules, each describing a different behavior for the field. The rule that is chosen is the same rule as used for the preceding header. For example: IP_ID ::= FORMAT(SEQUENTIAL_IP_ID,RANDOM_IP_ID) Two behaviors for the IP-ID are provided: one for an IP-ID that increases sequentially, and one for a randomly behaving IP-ID. 7.4.4. NBO The NBO rule describes a field that may be in reverse byte order from the rest of the data, as is sometimes seen with the IP-ID. The definition of the rule is given below: NBO(#n) ::= LABEL(n,base_field) CHOICE(nbo_flag) nbo_field = NULL ? switch(nbo_flag, byte_swap(n,base_field),base_field) base_field = switch(nbo_flag, byte_swap(n,nbo_field),nbo_field) ? NULL The rule examines the n-bit field (where n is typically 16 or 32) and generates a label nbo_flag indicating whether or not the field is in network byte order. Determining whether a field is in network byte order is a complex task (typically requiring the encoder to examine a history of field values) so the value of the nbo_flag label is left as a choice for the implementation. If the field is in network_byte_order then the nbo_flag is set to 1 and a label nbo_field is created containing the original field value. If the field is not in network byte order then the nbo_flag is set to 0 and the label nbo_field is given the byte-swapped value of the original field. For example: IP_ID ::= NBO(16) ; Checks byte order NEXT_FIELD(nbo_flag) ; Describes nbo_flag STATIC 99% | IRREGULAR(1) 1% Price et al. [Page 29] INTERNET-DRAFT PEBBLE October 28, 2002 NEXT_FIELD(nbo_field) ; Describes nbo_field INFERRED_OFFSET(16,msn) STATIC 80% | LSB(5,-1) 15% | IRREGULAR(16) 5% 7.4.5. SCALE The SCALE rule describes a field which changes by a regular (usually large) amount in consecutive headers. The definition of the rule is given below: SCALE(#n) ::= LABEL(n,base_field) CHOICE(scale) scaled_field = NULL ? floor(base_field,scale) remainder_field = NULL ? mod(base_field,scale) base_field = scale*scaled_field+remainder_field ? NULL The number of bits specified are examined and a suitable scale factor is chosen (typically using a history of field values). The original field value is then mapped to two values: the scaled value obtained by dividing the field by the chosen scale factor, and the remainder value from this division. These two values plus the scale factor itself are sufficient to reconstruct the original field value. 7.4.6. CHECKSUM The CHECKSUM rule describes a checksum calculated across the header. The size of the checksum can be altered depending on the characteristics of the link over which the protocol is to be transmitted. The definition of the rule is given below: CHECKSUM(#n) ::= NEXT_FIELD(checksum) IRREGULAR(n) Note that it is possible to offer different checksum lengths, so extra checksum bits can be allocated to protect important headers. This is illustrated in the example below: checksum_field ::= CHECKSUM(3) 99% | CHECKSUM(7) 1% 7.5. List rules The behavior of many protocol fields and headers can be described by a single rule built up from lists of simpler rules. This section defines a set of rules that can be used to describe the behavior of these lists. Price et al. [Page 30] INTERNET-DRAFT PEBBLE October 28, 2002 7.5.1. LIST The LIST rule describes a list of items that do not necessarily occur in the same order for every header. Example applications for the LIST rule include TCP options and TCP SACK blocks. The definition of the rule is given below: LIST($label,#d,#m,#p,rule[]) ::= start = NULL ? remaining_data order = NULL ? 0 presence = NULL ? 0 CONCATENATION([LIST_ITEM for j in 0..end]) start = switch(permutation(end,k,order), -1,remaining_data+floor(label,d)*m+p) ? NULL where k is log2(end) LIST_ITEM is CHOICE(next_rule) CHOICE(optional) rule[next_rule] tmp = NULL ? order order = tmp ? tmp*2^k+next_rule next_rule = mod(order,2^k) ? NULL tmp = floor(order,2^k) ? NULL tmp = NULL ? presence presence = tmp ? tmp*2+optional optional = mod(presence,2) ? NULL tmp = floor(presence,2) ? NULL The first operand gives the name of a label, whilst the next three operands specify a divisor, multiplier and offset which scale the value of the label so that it specifies the exact size of the list in bits. These operands are followed by a set of n existing rules that can be used to describe individual items in the list. Once the total list size is reached, the LIST rule outputs well-known labels "order" and "presence". The order consists of a string of n * k-bit entries (where k is the least number of bits required to index each of the n entries). The order list contains the indices in the order in which they occurred. The presence data is a list a 1-bit flags, one per entry in the list, which are set to "1" to indicate that the list item is present or "0" if not. These flags occur in the order in which the entries appear in the LIST rule. It is expected that the OPTIONAL rule (see Section 7.6.1) might be used within LIST. The well-known label "optional" is thus set Price et al. [Page 31] INTERNET-DRAFT PEBBLE October 28, 2002 automatically within the LIST rule, so that it can be used in conjunction with the OPTIONAL rule as illustrated below: TCP_Options ::= LIST(tcp_data_offset,1,32,0, OPTIONAL(TCP_SACK), OPTIONAL(TCP_TIMESTAMP), OPTIONAL(TCP_END), OPTIONAL(TCP_GENERIC)) STATIC 99.9% | ; TCP Options order IRREGULAR(8) 0.1% STATIC 75% | ; TCP Options presence IRREGULAR(4) 25% The LIST rule implicitly describes the behavior of the label that is specified in the first operand, so there is no need to explicitly describe this label using a NEXT_FIELD rule. 7.5.2. LIST_N The LIST_N rule is similar to the LIST rule, with the addition that each of the rules comprising the list can be applied multiple times. The definition of the rule is given below: LIST_N($label,#d,#m,#p,rule[],#number[]) ::= LIST(label,d,m,p,[rule[j] for j in 0..end, k in 0..number[j]]) LIST_N is a useful shorthand in the case that a rule in the list may occur a large number of times. For parsing purposes, a count of number[k] is equivalent to having written out rule[k] a total of number[k] times. This form of specification is chosen over a truly open list (i.e. one allowing an unbounded number of repeats) since it is clear that it should always be possible to set an upper bound, and moreover this information may be useful to certain encoders. For example: TCP_Options ::= LIST_N(tcp_data_offset,1,32,0, OPTIONAL(TCP_SACK), OPTIONAL(TCP_TIMESTAMP), OPTIONAL(TCP_END), OPTIONAL(TCP_PAD), OPTIONAL(TCP_GENERIC), 1,1,1,32,3) 7.6. Miscellaneous rules This section introduces some miscellaneous rules that can be used to describe fields in a protocol header. Price et al. [Page 32] INTERNET-DRAFT PEBBLE October 28, 2002 7.6.1. OPTIONAL The OPTIONAL rule is used to describe fields that are optionally present in the header. The definition of the rule is given below: OPTIONAL(rule) ::= CONCATENATION([rule for j in 0..optional-1]) The OPTIONAL rule uses the well-known label "optional" to specify whether or not the optional field is present in the header. The value of the label can be set by another rule such as LABEL or LIST. For example: GRE_Key_Flag ::= LABEL(1,optional) GRE_Key ::= OPTIONAL(Key_Present) In this case the Key_Present rule is called to describe the GRE_Key field, but only if the GRE_Key Flag field is set to 1. 7.6.2. UNCOMPRESSIBLE The UNCOMPRESSIBLE rule is a special case of the IRREGULAR rule, where the length of the field can be derived from the value contained in a label. The definition of the rule is given below: UNCOMPRESSIBLE($label,#d,#m,#p) ::= IRREGULAR(floor(label,d)*m+p) The first operand gives the name of the label, whilst the next three operands specify a divisor, multiplier and offset which scale the value of the label so that it specifies the exact size of the UNCOMPRESSIBLE field in bits. The UNCOMPRESSIBLE rule is typically used in conjunction with the LABEL rule. 7.6.3. SKIP The SKIP rule skips past a certain field in the protocol header without defining its behavior. The definition of the rule is given below: SKIP(#n) ::= tmp = NULL ? remaining_data remaining_data = tmp ? tmp+n tmp = remaining_data-n ? NULL 7.6.4. NO_UPDATE The NO_UPDATE rule behaves as the rule specified by its operand, with the exception that it does not update the context. This is useful when a field exhibits an unusual behavior for one header and then reverts back to its original behavior in subsequent headers. Price et al. [Page 33] INTERNET-DRAFT PEBBLE October 28, 2002 The definition of the rule is given below: NO_UPDATE(rule) ::= no_update = NULL ? 1 rule no_update = 1 ? NULL An example of the NO_UPDATE rule in use is given below: TCP_Window ::= STATIC 99% | NO_UPDATE(LSB(11,2048)) 0.9% | IRREGULAR(16) 0.1% In the above example the NO_UPDATE rule is applied to the TCP Window field. The field can behave as defined by the LSB rule, but when this behavior occurs it does not update the context. 8. Security considerations This draft describes an abstract notation similar to ABNF [RFC-2234], and hence is not believed to raise any security issues. 9. Acknowledgements A number of important concepts and ideas have been borrowed from ROHC [RFC-3095]. Updates to the LIST rules owe much to discussions with Qian Zhang and Hongbin Liao. Thanks to Paul Ollis for field labeling; and to Rob Hancock and Stephen McCann for putting up with the authors' arguments and making helpful suggestions, frequently against the tide! The authors would also like to thank Carsten Bormann, Christian Schmidt, Max Riegel and Lars-Erik Jonsson for their comments and encouragement. We haven't always agreed, but the arguments have been fun! 10. Authors' addresses Richard Price Tel: +44 1794 833681 Email: richard.price@roke.co.uk Abigail Surtees Tel: +44 1794 833131 Email: abigail.surtees@roke.co.uk Mark A West Tel: +44 1794 833311 Email: mark.a.west@roke.co.uk Price et al. [Page 34] INTERNET-DRAFT PEBBLE October 28, 2002 Roke Manor Research Ltd Romsey, Hants, SO51 0ZN United Kingdom http://www.roke.co.uk 11. References [RFC-2026] "The Internet Standards Process - Revision 3", Scott Bradner, RFC 2026, Internet Engineering Task Force, October 1996 [RFC-2119] "Key words for use in RFCs to Indicate Requirement Levels", Scott Bradner, RFC 2119, Internet Engineering Task Force, March 1997 [RFC-2234] "Augmented BNF for Syntax Specifications: ABNF", D. Crocker and P. Overell, RFC 2234, Internet Engineering Task Force, November 1997 [RFC-2960] "Stream Control Transmission Protocol", Stewart et al, RFC 2960, Internet Engineering Task Force, October 2000 [RFC-3095] "RObust Header Compression (ROHC)", Carsten Bormann et al, RFC3095, Internet Engineering Task Force, July 2001 [RFC-3261] "SIP: Session Initiation Protocol", J. Rosenberg et al, RFC 3261, Internet Engineering Task Force, June 2002 Appendix A. ABNF description of PEBBLE definition ::= new_rule ws "::=" ws expression [ws "where" 1*(ws local_definition)] new_rule ::= name [ws] ["(" operand *("," [ws] operand) ")"] name ::= *(letter | digit | "_") letter ::= %x41-5a | %x61-7a digit ::= %x30-39 operand ::= [type] name ["[" [name] "]"] type ::= "#" | "$" expression ::= operator | reference | repetition operator ::= longhand_form | shorthand_form Price et al. [Page 35] INTERNET-DRAFT PEBBLE October 28, 2002 longhand_form ::= name [ws] ["(" expression *("," [ws] expression) ")"] shorthand_form ::= [expression] *(reserved expression) [reserved] reserved ::= 1*(%x00-ff) reference ::= name ["[" expression "]"] repetition ::= "[" expression ws "for" ws index *("," [ws] index "]" index ::= name ws "in" ws expression ".." expression local_definition ::= name ws "is" ws expression Price et al. [Page 36]