Internet Draft                                                         
 Document: <draft-ietf-psamp-sample-tech-09.txt>               T. Zseby 
 Expires: November 2007                                Fraunhofer FOKUS 
                                                              M. Molina 
                                                                  DANTE 
                                                            N. Duffield 
                                                     AT&T Labs-Research 
                                                           S. Niccolini 
                                                        NEC Europe Ltd. 
                                                             F. Raspall 
                                                               EPSC-UPC 
                                                                       
                                                               May 2007 
                                                                       
  
    Sampling and Filtering Techniques for IP Packet Selection 
  
 Status of this Memo 
        
    By submitting this Internet-Draft, each author represents that any 
    applicable patent or other IPR claims of which he or she is aware 
    have been or will be disclosed, and any of which he or she becomes 
    aware will be disclosed, in accordance with Section 6 of BCP 79. 
     
    Internet-Drafts are working documents of the Internet Engineering 
    Task Force (IETF), its areas, and its working groups.  Note that 
    other groups may also distribute working documents as Internet-
    Drafts. 
        
    Internet-Drafts are draft documents valid for a maximum of six 
    months and may be updated, replaced, or obsoleted by other 
    documents at any time.  It is inappropriate to use Internet-Drafts 
    as reference material or to cite them other than as "work in 
    progress." 
     
    The list of current Internet-Drafts can be accessed at 
    http://www.ietf.org/ietf/1id-abstracts.txt. 
        
    The list of Internet-Draft Shadow Directories can be accessed at 
    http://www.ietf.org/shadow.html. 
        
    This Internet-Draft will expire on November, 2007. 
        
 Copyright Notice 
        
    Copyright (C) The IETF Trust (2007). 
        
     
 Abstract 
     
    This document describes Sampling and Filtering techniques for IP 
    packet selection. It provides a categorization of schemes and 
    defines what parameters are needed to describe the most common 
    selection schemes. Furthermore it shows how techniques can be 
    combined to build more elaborate packet Selectors. The document 


 Zseby, Molina, Duffield, Niccolini, Raspall               [Page 1] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    provides the basis for the definition of information models for 
    configuring selection techniques in Metering Processes and for 
    reporting the technique in use to a Collector. 
  
 Conventions used in this document 
   
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in 
    this document are to be interpreted as described in RFC 2119 
    [RFC2119]. 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 2] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


 Table of Contents 
  
    1.   Introduction................................................4 
    2.   PSAMP Documents Overview....................................4 
    3.   Terminology.................................................5 
    3.1     Observation Points, Packet Streams and Packet Content....5 
    3.2     Selection Process........................................5 
    3.3     Reporting................................................7 
    3.4     Metering Process.........................................7 
    3.5     Exporting Process........................................7 
    3.6     PSAMP Device.............................................8 
    3.7     Collector................................................8 
    3.8     Selection Methods........................................8 
    4.   Categorization of Packet Selection Techniques..............10 
    5.   Sampling...................................................11 
    5.1     Systematic Sampling.....................................12 
    5.2     Random Sampling.........................................13 
    5.2.1   n-out-of-N Sampling.....................................13 
    5.2.2   Probabilistic Sampling..................................13 
    5.2.2.1 Uniform Probabilistic Sampling..........................13 
    5.2.2.2 Non-Uniform Probabilistic Sampling......................14 
    5.2.2.3 Non-Uniform Flow State Dependent Sampling...............14 
    5.2.2.4 Configuration of non-uniform probabilistic and flow-state 
             Sampling...............................................14 
    6.   Filtering..................................................15 
    6.1     Property Match Filtering................................15 
    6.2     Hash-based Filtering....................................16 
    6.2.1   Application Examples for Hash-based Selection...........17 
    6.2.1.1 Approximation of Random Sampling........................17 
    6.2.1.2 Trajectory Sampling and Consistent Packet Selection.....18 
    6.2.2   Security Considerations for Hash Functions..............18 
    6.2.2.1 Vulnerabilities of Hash-based selection without knowledge 
             of selection outcomes..................................19 
    6.2.2.2 Vulnerabilities of Hash-based selection using knowledge of 
             selection outcomes.....................................20 
    6.2.2.3 Vulnerabilities to Replay Attacks.......................21 
    6.2.3   Choice of Hash-Function.................................21 
    6.2.3.1 Properties of some hash functions.......................21 
    6.2.3.2 Hash Functions for Packet Selection.....................22 
    6.2.3.3 Hash Functions Suitable for Packet Digesting............23 
    7.   Parameters for the Description of Selection Techniques.....23 
    7.1     Description of Sampling Techniques......................24 
    7.2     Description of Filtering Techniques.....................24 
    8.   Composite Techniques.......................................26 
    8.1     Cascaded Filtering->Sampling or Sampling->Filtering.....26 
    8.2     Stratified Sampling.....................................27 
    9.   Security Considerations....................................27 
    10.  Acknowledgements...........................................28 
    11.  IANA Considerations........................................28 
    12.  Normative References.......................................28 
    13.  Informative References.....................................28 
    14.  Authors' Addresses.........................................30 
    15.  Intellectual Property Statement............................31 
    16.  Copyright Statement........................................32 
    17.  Appendix: Hash Functions...................................32 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 3] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    17.1    IP Shift-XOR (IPSX) Hash Function.......................32 
    17.2    BOB Hash Function.......................................33 
  
 1. Introduction 
     
    There are two main drivers for the growth in measurement 
    infrastructures and their underlying technology. First, network 
    data rates are increasing, with a concomitant growth in measurement 
    data. Secondly, the growth is compounded by the demand by 
    measurement-based applications for increasingly fine grained 
    traffic measurements. Devices such as routers, which perform the 
    measurements, require increasingly sophisticated and resource 
    intensive measurement capabilities, including the capture of packet 
    headers or even parts of the payload, and classification for flow 
    analysis. All these factors can lead to an overwhelming amount of 
    measurement data, resulting in high demands on resources for 
    measurement, storage, transport and post processing. 
     
    The sustained capture of network traffic at line rate can be 
    performed by specialized measurement hardware. However, the cost of 
    the hardware and the measurement infrastructure required to 
    accommodate the measurements preclude this as a ubiquitous 
    approach. Instead some form of data reduction at the point of 
    measurement is necessary. This can be achieved by an intelligent 
    packet selection through Sampling, Filtering, or aggregation. The 
    motivation for Sampling is to select a representative subset of 
    packets that allow accurate estimates of properties of the 
    unsampled traffic to be formed. The motivation for Filtering is to 
    remove all packets that are not of interest. Aggregation combines 
    data and allows compact pre-defined views of the traffic. Examples 
    of applications that benefit from packet selection are given in 
    [PSAMP-FW]. Aggregation techniques are out of scope of this 
    document. 
     
 2. PSAMP Documents Overview 
     
    [PSAMP-FW]:   "A Framework for Packet Selection and Reporting" 
                   describes the PSAMP framework for network elements 
                   to select subsets of packets by statistical and 
                   other methods, and to export a stream of reports on 
                   the selected packets to a Collector. 
     
    [PSAMP-TECH]: "Sampling and Filtering Techniques for IP Packet 
                   Selection" (this document) describes the set of 
                   packet selection techniques supported by PSAMP. 
     
    [PSAMP-MIB]:  "Definitions of Managed Objects for Packet Sampling" 
                   describes the PSAMP Management Information Base. 
     
    [PSAMP-PROTO]: "Packet Sampling (PSAMP) Protocol Specifications" 
                   specifies the export of packet information from a 
                   PSAMP Exporting Process to a PSAMP Colleting 
                   Process. 
     
    [PSAMP-INFO]: "Information Model for Packet Sampling Exports" 
                   defines an information and data model for PSAMP. 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 4] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


 3. Terminology 
     
    The PSAMP terminology defined here is fully consistent with all 
    terms listed in [PSAMP-FW] but includes additional terms required 
    for the description of packet selection methods. An architecture 
    overview and possible configurations of PSAMP elements can be found 
    in [PSAMP-FW]. PSAMP terminology also aims at consistency with 
    terms used in [RFC3917]. The relationship between PSAMP and IPFIX 
    terms is described in [PSAMP-FW]. 
  
 3.1 Observation Points, Packet Streams and Packet Content  
     
    * Observation Point 
     
       An Observation Point is a location in the network where packets 
       can be observed. Examples include: 
        
         (i)  a line to which a probe is attached; 
          
         (ii) a shared medium, such as an Ethernet-based LAN; 
          
         (iii) a single port of a router, or set of interfaces 
               (physical or logical) of a router; 
          
         (iv) an embedded measurement subsystem within an interface. 
          
       Note that one Observation Point may be a superset of several 
       other Observation Points.  For example one Observation Point can 
       be an entire line card.  This would be the superset of the 
       individual Observation Points at the line card's interfaces. 
     
    * Observed Packet Stream 
     
       The Observed Packet Stream is the set of all packets observed at 
       the Observation Point. 
  
    * Packet Stream 
     
       A packet stream denotes a set of packets that flows past some 
       specified point within the metering process. An example of a 
       Packet Stream is the output of the selection process. 
       Note that packets selected from a stream, e.g. by Sampling, do 
       not necessarily possess a property by which they can be 
       distinguished from packets that have not been selected. For this 
       reason the term "stream" is favored over "flow", which is 
       defined as set of packets with common properties [RFC3917].  
     
    * Packet Content 
     
       The packet content denotes the union of the packet header  
       (which includes link layer, network layer and other 
       encapsulation headers) and the packet payload.  
        
 3.2 Selection Process 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 5] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    * Selection Process  
     
       A Selection Process takes the Observed Packet Stream as its 
       input and selects a subset of that stream as its output.  
        
    * Selection State 
     
       A Selection Process may maintain state information for use by 
       the Selection Process. At a given time, the Selection State may 
       depend on packets observed at and before that time, and other 
       variables. Examples include:  
        
         (i)  sequence numbers of packets at the input of Selectors;  
          
         (ii) a timestamp of observation of the packet at the 
              Observation Point; 
          
         (iii) iterators for pseudorandom number generators;  
          
         (iv) hash values calculated during selection;  
          
         (v)  indicators of whether the packet was selected by a given 
              Selector;  
          
       Selection Processes may change portions of the Selection State 
       as a result of processing a packet. Selection state for a packet 
       is to reflect the state after processing the packet.  
     
    * Selector 
     
       A Selector defines the action of a Selection Process on a single 
       packet of its input. If selected, the packet becomes an element 
       of the output Packet Stream. 
        
       The Selector can make use of the following information in 
       determining whether a packet is selected:  
        
         (i)  the packet's content; 
          
         (ii) information derived from the packet's treatment at the 
              Observation Point; 
          
         (iii) any selection state that may be maintained by the 
              Selection Process. 
          
    * Composite Selector 
     
       A Composite Selector is an ordered composition of Selectors, in 
       which the output Packet Stream issuing from one Selector forms 
       the input Packet Stream to the succeeding Selector. 
     
    * Primitive Selector 
     
       A Selector is primitive if it is not a Composite Selector. 
        
    * Selection Sequence 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 6] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


       From all the packets observed at an Observation Point, only a 
       few packets are selected by one or more Selectors.  The 
       Selection Sequence is a unique value per Observation Domain 
       describing the Observation Point and the Selector IDs through 
       which the packets are selected. 
        
 3.3 Reporting 
  
    * Packet Reports 
     
       Packet Reports comprise a configurable subset of a packet's 
       input to the Selection Process, including the packet's content, 
       information relating to its treatment (for example, the output 
       interface), and its associated selection state (for example, a 
       hash of the packet's content) 
        
    * Report Interpretation: 
     
       Report Interpretation comprises subsidiary information, relating 
       to one or more packets, that is used for interpretation of their 
       packet reports. Examples include configuration parameters of the 
       Selection Process.  
     
    * Report Stream:  
     
       The Report Stream is the output of a Metering Process, 
       comprising two distinguished types of information: Packet 
       Reports, and Report Interpretation. 
  
 3.4 Metering Process 
     
       A Metering Process selects packets from the Observed Packet 
       Stream using a Selection Process, and produces as output a 
       Report Stream concerning the selected packets. The PSAMP 
       Metering Process can be viewed as analogous to the IPFIX 
       metering process [IPFIX-PROTO], which produces flow records as 
       its output.  While the Metering Process definition in this 
       document specifies the PSAMP definition, the PSAMP protocol 
       specifications [PSAMP-PROTO] will use the IPFIX Metering Process 
       definition, which also suits the PSAMP requirements.   The 
       relationship between PSAMP and IPFIX is described more in 
       [PSAMP-INFO] and [PSAMP-PROTO]. 
        
 3.5 Exporting Process 
     
    * Exporting Process: 
     
       An Exporting Process sends, in the form of Export Packet, the 
       output of one or more Metering Processes to one or more 
       Collectors. 
     
    * Export Packet: 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 7] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


       An Export Packet is a combination of Report Interpretation 
       and/or one or more Packet Reports are bundled by the Exporting 
       Process into an Export Packet for exporting to a Collector.  
        
 3.6 PSAMP Device 
     
    * PSAMP Device  
     
       A PSAMP Device is a device hosting at least an Observation 
       Point, a Metering Process and an Exporting Process. Typically, 
       corresponding Observation Point(s), Metering Process(es) and 
       Exporting Process(es) are co-located at this device, for example 
       at a router. 
     
 3.7 Collector 
  
    * Collector  
     
       A Collector receives a Report Stream exported by one or more 
       Exporting Processes. In some cases, the host of the Metering 
       and/or Exporting Processes may also serve as the Collector. 
     
 3.8 Selection Methods 
     
    * Filtering 
       A filter is a Selector that selects a packet deterministically 
       based on the Packet Content, or its treatment, or functions of 
       these occurring in the Selection State.  Two examples are: 
     
         (i) Property match filtering: a packet is selected if a 
              specific field in the packet equals a predefined value. 
          
         (ii) Hash-based selection: a hash function is applied to the 
              Packet Content, and the packet is selected if the result 
              falls in a specified range. 
             
    * Sampling  
        
       A selector that is not a filter is called a sampling operation.  
       This reflects the intuitive notion that if the selection of a 
       packet cannot be determined from its content alone, there must 
       be some type of sampling taking place. Sampling operations can 
       be divided into two subtypes: 
        
          (i) Content-independent sampling, which does not use Packet 
              Content in reaching sampling decisions.  Examples 
              include systematic sampling, and uniform pseudorandom 
              sampling driven by a pseudorandom number whose 
              generation is independent of Packet Content.  Note that 
              in Content-independent Sampling it is not necessary to 
              access the Packet Content in order to make the selection 
              decision. 
          
         (ii) Content-dependent sampling, in which the Packet Content 
              is used in reaching selection decisions.  An application 
              is pseudorandom selection according to a probability 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 8] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


              that depends on the contents of a packet field, e.g., 
              sampling packets with a probability dependent on their 
              TCP/UDP port numbers.  Note that this is not a Filter. 
  
    * Hash Domain 
  
       A subset of the Packet Content and the packet treatment, viewed 
       as an N-bit string for some positive integer N. 
        
    * Hash Range 
  
       A set of M-bit strings for some positive integer M that define 
       the range of values the result of the hash operation can take. 
     
    * Hash Function 
  
       A deterministic map from the Hash Domain into the Hash Range. 
        
    * Hash Selection Range 
  
       A subset of the Hash Range. The packet is selected if the action 
       of the Hash Function on the Hash Domain for the packet yields a 
       result in the Hash Selection Range. 
        
    * Hash-based Selection 
  
       Filtering specified by a Hash Domain, a Hash Function, and Hash 
       Range and a Hash Selection Range. 
        
        
    * Approximative Selection 
  
       Selectors in any of the above categories may be approximated by 
       operations in the same or another category for the purposes of 
       implementation. For example, uniform pseudorandom Sampling may 
       be approximated by Hash-based Selection, using a suitable Hash 
       Function and Hash Domain. In this case, the closeness of the 
       approximation depends on the choice of Hash Function and Hash 
       Domain. 
        
    * Population 
     
       A Population is a Packet Stream, or a subset of a Packet Stream. 
       A Population can be considered as a base set from which packets 
       are selected. An example is all packets in the Observed Packet 
       Stream that are observed within some specified time interval. 
        
    * Population Size 
  
       The Population Size is the number of all packets in the 
       Population. 
        
    * Sample Size 
  
       The number of packets selected from the Population by a 
       Selector. 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 9] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    * Configured Selection Fraction 
        
       The Configured Selection Fraction is the ratio of the number of 
       packets selected by a Selector from an input Population, to the 
       Population Size, as based on the configured selection 
       parameters. 
        
    * Attained Selection Fraction 
        
       The Attained Selection Fraction is the actual ratio of the 
       number of packets selected by a Selector from an input 
       Population, to the Population Size.  
        
    For some sampling methods the Attained Selection Fraction can 
    differ from the Configured Selection Fraction due to, for example, 
    the inherent statistical variability in sampling decisions of 
    probabilistic Sampling and Hash-based Selection. Nevertheless, for 
    large Population Sizes and properly configured Selectors, the 
    Attained Selection Fraction usually approaches the Configured 
    Selection Fraction. 
     
 4. Categorization of Packet Selection Techniques 
  
    Packet selection techniques generate a subset of packets from an 
    Observed Packet Stream at an Observation Point. We distinguish 
    between Sampling and Filtering. 
  
    Sampling is targeted at the selection of a representative subset of 
    packets. The subset is used to infer knowledge about the whole set 
    of observed packets without processing them all. The selection can 
    depend on packet position, and/or on packet content, and/or on 
    (pseudo) random decisions.  
  
    Filtering selects a subset with common properties. This is used if 
    only a subset of packets is of interest. The properties can be 
    directly derived from the packet content, or depend on the 
    treatment given by the router to the packet. Filtering is a 
    deterministic operation. It depends on packet content or router 
    treatment. It never depends on packet position or on (pseudo) 
    random decisions. 
     
    Note that a common technique to select packets is to compute a Hash 
    Function on some bits of the packet header and/or content and to 
    select it if the Hash Value falls in the Hash Selection Range. 
    Since hashing is a deterministic operation on the packet content, 
    it is a Filtering technique according to our categorization. 
    Nevertheless, Hash Functions are sometimes used to emulate random 
    Sampling. Depending on the chosen input bits, the Hash Function and 
    the Hash Selection Range, this technique can be used to emulate the 
    random selection of packets with a given probability p. It is also 
    a powerful technique to consistently select the same packet subset 
    at multiple Observation Points [DuGr00] 
     
    The following table gives an overview of the schemes described in 
    this document and their categorization. An X in brackets (X) 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 10] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    denotes schemes for which also content-independent variants exist. 
    It easily can be seen that only schemes with both properties, 
    content dependence and deterministic selection, are considered as 
    filters.  
     
           Selection Scheme   | Deterministic | Content- | Category 
                              |  Selection    | dependent|           
      ------------------------+---------------+----------+---------- 
       Systematic             |       X       |     _    | Sampling  
       Count-based            |               |          | 
      ------------------------+---------------+----------+---------- 
       Systematic             |       X       |     -    | Sampling 
       Time-based             |               |          | 
      ------------------------+---------------+----------+---------- 
       Random                 |       -       |     -    | Sampling 
       n-out-of-N             |               |          | 
      ------------------------+---------------+----------+---------- 
       Random                 |       -       |     -    | Sampling 
       Uniform probabilistic  |               |          | 
      ------------------------+---------------+----------+---------- 
       Random                 |       -       |    (X)   | Sampling 
       Non-uniform probabil.  |               |          | 
      ------------------------+---------------+----------+---------- 
       Random                 |       -       |    (X)   | Sampling 
       Non-uniform flow-state |               |          | 
      ------------------------+---------------+----------+---------- 
       Property Match         |       X       |    (X)   | Filtering 
       Filtering              |               |          | 
      ------------------------+---------------+----------+---------- 
       Hash Function          |       X       |     X    | Filtering 
      ------------------------+---------------+----------+---------- 
     
    The categorization just introduced is mainly useful for the 
    definition of an information model describing Primitive Selectors. 
    More complex selection techniques can be described through the 
    composition of cascaded Sampling and Filtering operations. For 
    example, a packet selection that weights the selection probability 
    on the basis of the packet length can be described as a cascade of 
    a Filtering and a Sampling scheme. However, this descriptive 
    approach is not intended to be rigid: if a common and consolidated 
    selection practice turns out to be too complex to be described as a 
    composition of the mentioned building blocks, an ad hoc description 
    can be specified instead and added as a new scheme to the 
    information model. 
  
 5. Sampling 
  
    The deployment of Sampling techniques aims at the provisioning of 
    information about a specific characteristic of the parent 
    population at a lower cost than a full census would demand. In 
    order to plan a suitable Sampling strategy it is therefore crucial 
    to determine the needed type of information and the desired degree 
    of accuracy in advance. 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 11] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    First of all it is important to know the type of metric that should 
    be estimated. The metric of interest can range from simple packet 
    counts [JePP92] up to the estimation of whole distributions of flow 
    characteristics (e.g. packet sizes)[ClPB93]. 
  
    Secondly, the required accuracy of the information and with this, 
    the confidence that is aimed at, should be known in advance. For 
    instance for usage-based accounting the required confidence for the 
    estimation of packet counters can depend on the monetary value that 
    corresponds to the transfer of one packet. That means that a higher 
    confidence could be required for expensive packet flows (e.g. 
    premium IP service) than for cheaper flows (e.g. best effort). The 
    accuracy requirements for validating a previously agreed quality 
    can also vary extremely with the customer demands. These 
    requirements are usually determined by the service level agreement 
    (SLA). 
  
    The Sampling method and the parameters in use must be clearly 
    communicated to all applications that use the measurement data. 
    Only with this knowledge a correct interpretation of the 
    measurement results can be ensured.  
  
    Sampling methods can be characterized by the Sampling algorithm, 
    the trigger type used for starting a Sampling interval and the 
    length of the Sampling interval. These parameters are described 
    here in detail. The Sampling algorithm describes the basic process 
    for selection of samples. In accordance to [AmCa89] and [ClPB93] we 
    define the following basic Sampling processes: 
     
 5.1 Systematic Sampling 
  
    Systematic Sampling describes the process of selecting the start 
    points and the duration of the selection intervals according to a 
    deterministic function. This can be for instance the periodic 
    selection of every k-th element of a trace but also the selection 
    of all packets that arrive at pre-defined points in time. Even if 
    the selection process does not follow a periodic function (e.g. if 
    the time between the Sampling intervals varies over time) we 
    consider this as systematic Sampling as long as the selection is 
    deterministic. 
  
    The use of systematic Sampling always involves the risk of biasing 
    the results. If the systematics in the Sampling process resemble 
    systematics in the observed stochastic process (occurrence of the 
    characteristic of interest in the network), there is a high 
    probability that the estimation will be biased. Systematics in the 
    observed process might not be known in advance. 
  
    Here only equally spaced schemes are considered, where triggers for 
    Sampling are periodic, either in time or in packet count. All 
    packets occurring in a selection interval (either in time or packet 
    count) beyond the trigger are selected. 
  
    Systematic count-based 
 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 12] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    In systematic count-based Sampling the start and stop triggers for 
    the Sampling interval are defined in accordance to the spatial 
    packet position (packet count). 
  
    Systematic time-based 
    In systematic time-based Sampling time-based start and stop 
    triggers are used to define the Sampling intervals. All packets are 
    selected that arrive at the Observation Point within the time-
    intervals defined by the start and stop triggers (i.e. arrival time 
    of the packet is larger than the start time and smaller than the 
    stop time). 
  
    Both schemes are content-independent selection schemes. Content 
    dependent deterministic Selectors are categorized as filter. 
     
 5.2 Random Sampling 
     
    Random Sampling selects the starting points of the Sampling 
    intervals in accordance to a random process. The selection of 
    elements are independent experiments. With this, unbiased 
    estimations can be achieved. In contrast to systematic Sampling, 
    random Sampling requires the generation of random numbers. One can 
    differentiate two methods of random Sampling: 
     
 5.2.1  n-out-of-N Sampling 
  
    In n-out-of-N Sampling n elements are selected out of the parent 
    population that consists of N elements. One example would be to 
    generate n different random numbers in the range [1,N] and select 
    all packets which have a packet position equal to one of the random 
    numbers. For this kind of Sampling the Sample Size n is fixed.  
     
 5.2.2  Probabilistic Sampling 
     
    In probabilistic Sampling the decision whether an element is 
    selected or not is made in accordance to a pre-defined selection 
    probability. An example would be to flip a coin for each packet and 
    select all packets for which the coin showed the head. For this 
    kind of Sampling the Sample Size can vary for different trials. The 
    selection probability does not necessarily has to be the same for 
    each packet. Therefore we distinguish between uniform probabilistic 
    Sampling (with the same selection probability for all packets) and 
    non-uniform probabilistic Sampling (where the selection probability 
    can vary for different packets). 
     
 5.2.2.1 Uniform Probabilistic Sampling 
     
    For Uniform Probabilistic Sampling packets are selected 
    independently with a uniform probability p. This Sampling can be 
    count-driven, and is sometimes referred to as geometric random 
    Sampling, since the difference in count between successive selected 
    packets are independent random variables with a geometric 
    distribution of mean 1/p. A time-driven analog, exponential random 
    Sampling, has the time between triggers exponentially distributed. 
    Both geometric and exponential random Sampling are examples of what 
    is known as additive random Sampling, defined as Sampling where the 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 13] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    intervals or counts between successive samples are independent 
    identically distributed random variable. 
     
 5.2.2.2 Non-Uniform Probabilistic Sampling 
     
    This is a variant of Probabilistic Sampling in which the Sampling 
    probabilities can depend on the selection process input. This can 
    be used to weight Sampling probabilities in order e.g. to boost the 
    chance of Sampling packets that are rare but are deemed important. 
    Unbiased estimators for quantitative statistics are recovered by 
    renormalization of sample values; see [HT52]. 
     
 5.2.2.3 Non-Uniform Flow State Dependent Sampling  
  
    Another type of Sampling that can be classified as probabilistic 
    Non-Uniform is closely related to the flow concept as defined in 
    [RFC3917], and it is only used jointly with a flow monitoring 
    function (IPFIX metering process). Packets are selected, dependent 
    on a selection state. The point, here, is that the selection state 
    is determined also by the state of the flow the packet belongs to 
    and/or by the state of the other flows currently being monitored by 
    the associated flow monitoring function. An example for such an 
    algorithm is the "sample and hold" method described in [EsVa01]: 
     
    - If a packet accounts for a flow record that already exists in 
       the IPFIX flow recording process, it is selected (i.e. the flow 
       record is updated) 
    - If a packet doesn't account to any existing flow record, it is 
       selected with probability p. If it has been selected a new flow 
       record has to be created. 
     
    A further algorithm that fits into the category of non-uniform flow 
    state dependent Sampling is described in [Moli03]. 
     
    This type of Sampling is content dependent because the 
    identification of the flow the packet belongs to requires analyzing 
    part of the packet content. If the packet is selected, then it is 
    passed as an input to the IPFIX monitoring function (this is called 
    "Local Export" in [PSAMP-FW]. Selecting the packet depending on the 
    state of a flow cache is useful when memory resources of the flow 
    monitoring function are scarce (i.e. there is no room to keep all 
    the flows that have been scheduled for monitoring). See [MolF03] 
    for a more detailed description of the motivations for this type of 
    Sampling and the impact on the IPFIX metering. 
     
 5.2.2.4 Configuration of non-uniform probabilistic and flow-state 
       Sampling 
     
    Many different specific methods can be grouped under the terms non-
    uniform probabilistic and flow state Sampling. Dependent on the 
    Sampling goal and the implemented scheme, a different number and 
    type of input parameters is required to configure such scheme. 
     
    Some concrete proposals for such methods exist from the research 
    community (e.g. [EsVa01],[DuLT01],[Moli03]). Some of these 
    proposals are still in an early stage and need further 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 14] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    investigations to prove their usefulness and applicability. It is 
    not our aim to indicate preference amongst these methods. Instead, 
    we only describe here the basic methods and leave the specification 
    of explicit schemes and their parameters up to vendors (e.g. as 
    extension of the information model). 
     
 6. Filtering  
     
    Filtering is the deterministic selection of packets based on the 
    packet content, the treatment of the packet at the Observation 
    Point, or deterministic functions of these occurring in the 
    selection state. The packet is selected if these quantities fall 
    into a specified range. The role of Filtering, as the word itself 
    suggest, is to separate all the packets having a certain property 
    from those not having it. A distinguishing characteristic from 
    Sampling is that the selection decision does not depend on the 
    packet position in time or in the space, or on a random process. 
    We identify and describe in the following two Filtering techniques.  
     
 6.1 Property Match Filtering 
     
    With this Filtering method a packet is selected if specific fields 
    within the packet and/or properties of the router state equal a 
    predefined value. Possible filter fields are all IPFIX flow 
    attributes specified in [IPFIX-INFO]. Further fields can be defined 
    by vendor specific extensions. 
        
    A packet is selected if Field=Value. Masks and ranges are only 
    supported to the extent to which [IPFIX-INFO] allows them e.g. by 
    providing explicit fields like the netmasks for source and 
    destination addresses. 
     
    AND operations are possible by concatenating filters, thus 
    producing a composite selection operation.  In this case, the 
    ordering in which the filtering happens is implicitly defined 
    (outer filters come after inner filters).  However, as long as the 
    concatenation is on filters only, the result of the cascaded filter 
    is independent from the order, but the order may be important for 
    implementation purposes, as the first filter will have to work at a 
    higher rate.  In any case, an implementation is not constrained to 
    respect the filter ordering, as long as the result is the same, and 
    it may even implement the composite filtering in filtering in one 
    single step. 
        
    OR operations are not supported with this basic model.  More 
    sophisticated filters (e.g. supporting bitmasks, ranges or OR 
    operations etc.) can be realized as vendor specific schemes. 
        
    Property match operations should be available for different 
    protocol portions of the packet header: 
     
          (i) the IP header (excluding options in IPv4, stacked headers 
              in IPv6) 
          
         (ii) transport header 
        

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 15] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


         (iii) encapsulation headers (e.g. the MPLS label stack, if 
               present) 
        
    When the PSAMP Device offers property match filtering, and, in its 
    usual capacity other than in performing PSAMP functions, identifies 
    or processes information from IP, transport or encapsulation 
    protocols, then the information should be made available for 
    filtering.  For example, when a PSAMP Device routes based on 
    destination IP address, that field should be made available for 
    filtering.  Conversely, a PSAMP Device that does not route is not 
    expected to be able to locate an IP address within a packet, or 
    make it available for Filtering, although it may do so. 
        
    Since packet encryption conceals the real values of encrypted 
    fields, property match filtering must be configurable to ignore 
    encrypted packets, when detected. 
     
    The Selection Process may support filtering based on the properties 
    of the router state: 
        
         (i)  Ingress interface at which packet arrives equals a 
              specified value 
          
         (ii) Egress interface to which packet is routed to equals a 
              specified value 
          
         (iii) Packet violated Access Control List (ACL) on the router 
          
         (iv)  Failed Reverse Path Forwarding (RPF) 
          
         (v)  Failed Resource Reservation (RSVP) 
          
         (vi)  No route found for the packet 
          
         (vii) Origin Border Gateway Protocol (BGP) Autonomous System 
              (AS) [RFC4271] equals a specified value or lies within a 
              given range 
         (viii)Destination BGP AS equals a specified value or lies 
              within a given range 
        
    Router architectural considerations may preclude some information 
    concerning the packet treatment being available at 
    line rate for selection of packets.  For example, the Selection 
    Process may not be implemented in the fast path that is able to 
    access routing state at line rate.  However, when filtering follows 
    sampling (or some other selection operation) in a Composite 
    Selector, the rate of the Packet Stream output from the sampler and 
    input to the filter may be sufficiently slow that the filter could 
    select based on routing state. 
     
 6.2 Hash-based Filtering 
     
    A Hash Function h maps the Packet Content c, or some portion of it, 
    onto a Hash Range R. The packet is selected if h(c) is an element 
    of S, which is a subset of R called the Hash Selection Range. Thus 
    Hash-based Selection is indeed a particular case of Filtering: the 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 16] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    object is selected if c is in inv(h(S)). But for desirable Hash 
    Functions the inverse image inv(h(S)) will be extremely complex, 
    and hence h would not be expressible as, say, a Property Match 
    Filter or a simple combination of these. 
     
    Hash-based selection has mainly two types of usage: it offers a way 
    to approximate random Sampling by using packet content to generate 
    pseudorandom variates, and a way to consistently select subsets of 
    packets that share a common property (e.g. at different Observation 
    Points). 
     
    In the following subsections we give more details about them. 
    However, both usages require that the Hash Functions has two 
    statistical properties. 
     
    First, the Hash Function h must have good mixing properties, in the 
    sense that small changes in the input (e.g. the flipping of a 
    single bit) cause large changes in the output (many bits change). 
    Then any local clump of values of c is spread widely over R by h, 
    and so the distribution of h(c) is fairly uniform even if the 
    distribution of c is not. Then the Sampling Fraction is #S/#R, 
    which can be tuned by choice of S.  
     
    The second desirable property depends more closely on the 
    statistics of the content c. In applications, the content c 
    comprises a number of distinct fields, c1 ... cm, e.g. source and 
    destination IP Address, IP identification, and TCP/UDP port numbers 
    (if present) for a packet. With a Hash Function satisfying the 
    first properties above, selection decisions will appear 
    uncorrelated with the contents of any individual field, if the 
    complementary fields are (i) sufficiently variable themselves, and 
    (ii) sufficiently uncorrelated with cj. 
     
 6.2.1  Application Examples for Hash-based Selection 
     
 6.2.1.1 Approximation of Random Sampling 
     
    Although pseudorandom number generators with well understood 
    properties have been developed, they may not be the method of 
    choice in settings where computational resources are scarce. A 
    convenient alternative is to use Hash Functions of packet content 
    as a source of randomness. The hash (suitably renormalized) is a 
    pseudorandom variate in the interval [0,1]. Other schemes may use 
    packet fields in iterators for pseudorandom numbers. However, the 
    statistical properties of an ideal packet selection law (such as 
    independent Sampling for different packets, or independence on 
    packet content) may not be exactly rendered by an implementation, 
    but only approximately so. 
     
    Use of packet content to generate pseudorandom variates shares with 
    Non-uniform Probabilistic Sampling (see Section 3.1.2.2.2 above) 
    the property that selection decisions depend on Packet Content. 
    However, there is a fundamental difference between the two. In the 
    former case the content determines pseudorandom variates. In the 
    latter case the content only determines the selection 
 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 17] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    probabilities: selection could then proceed e.g by use of random 
    variates obtained by an independent pseudorandom number generator.  
     
 6.2.1.2 Trajectory Sampling and Consistent Packet Selection. 
     
    Trajectory Sampling is the consistent selection of a subset of 
    packets at either all of a set of Observation Points or none of 
    them. Trajectory Sampling is realized by Hash-based Selection if 
    all Observation Points in the set use a common Hash Function, Hash 
    Domain and selection range. The Hash Domain comprises all or part 
    of the packet content that is invariant along the packet path. 
    Fields such as Time-to-Live, which is decremented per hop, and 
    header CRC, which is recalculated per hop, are thus excluded from 
    the Hash Domain. The Hash Domain needs to be wider than just a flow 
    key, if packets are to be selected quasirandomly within flows. 
  
    The trajectory (or path) followed by a packet is reconstructed from 
    PSAMP reports on it that reach a Collector. Reports on a given 
    packet originating from different observations points are 
    associated by matching a label from the reports. The label may 
    comprise that portion invariant packet content that is reported, or 
    possibly some digest of the invariant packet content that is 
    inserted into the packet report at the Observation Point. Such a 
    digest may be constructed by applying a second Hash Function 
    (distinct from that used for selection) to the invariant packet 
    content. The reconstruction of trajectories, and methods for 
    dealing with possible ambiguities due to label collisions 
    (identical labels reported for different packets) and potential 
    loss of reports in transmission, are dealt with in [DuGr00], 
    [DuGG02] and [DuGr04]. 
  
    Applications of trajectory Sampling include (i) estimation of the 
    network path matrix, i.e., the traffic intensities according to 
    network path, broken down by flow key; (ii) detection of routing 
    loops, as indicated by self-intersecting trajectories; (iii) 
    passive performance measurement: prematurely terminating 
    trajectories indicate packet loss, packet one way delay can be 
    determined if reports include (synchronized) timestamps of packet 
    arrival at the Observation Point; (iv) network attack tracing, of 
    the actual paths taken by attack packets with spoofed source 
    addresses. 
  
 6.2.2  Security Considerations for Hash Functions 
       
    A concern for Hash-based Selection is whether some large set of 
    related packets could be disproportionately sampled, i.e., have an 
    Attained Sampling Fraction significantly different from the 
    Configured Sampling Fraction, either (i) through unanticipated 
    behavior in the Hash Function, or (ii) because the packets had been 
    deliberately crafted to have this property.  
          
    The first point underlines the importance of using a Hash Function 
    with good mixing properties. The statistical properties of 
    candidate Hash Functions need to be evaluated, preferably on packet 
    traces before adoption for hash-based Sampling. However, hash 
    functions which perform well on typical traffic may not be 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 18] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    sufficiently strong to withstand attacks specifically targeted 
    against them. As detailed in the following section, only 
    cryptographic hash functions employing a private parameter 
    operating in pseudo-random function mode are sufficiently strong to 
    withstand the range of conceivable attacks.   For example, fixed or 
    variable length inputs could be hashed using a block cipher (like 
    AES) in cipher-block-chaining mode.  Fixed length inputs could also 
    be hashed using an iterated cryptographic hash function (like MD5 
    or SHA1), with a private initial vector.  For variable length 
    inputs, iterated cryptographic hash function (like MD5 or SHA1) 
    should employ private string post-pended to the data in addition to 
    a private initial vector. For more details, see the "append-
    cascade" construction of [BeCK96]. 
     
    The following assumes that the hash function is public and hence 
    known to an attacker. An attacker uses its knowledge of the hash 
    function to craft packets which are then dispatched, either as the 
    attack itself, or to elicit further information which can be used 
    to refine the attack. Thus two scenarios are considered. In the 
    first scenario, the attacker has no knowledge about whether the 
    crafted packets are selected or not. In the second scenario the 
    attacker uses some knowledge of sampling outcomes; the means by 
    which this might be acquired is discussed below. Some attacks that 
    involve tampering with export packets in transit, as opposed to 
    attacking the PSAMP device, are discussed in [GoRe07]. 
  
 6.2.2.1 Vulnerabilities of Hash-based selection without knowledge of 
       selection outcomes. 
  
    (i) The hash function does not use a private parameter.  
     
    If the selection range is public, an attacker can craft packets 
    whose selection properties are known in advance. If the selection 
    range is private, an attacker cannot determine whether a crafted 
    packet is selected. However by computing the hash on different 
    trial crafted packets, and selecting those yielding a given hash 
    value, the attacker can construct an arbitrarily large set of 
    distinct packets with a common selection properties, i.e., packets 
    that will be either all selected or all not selected. This can be 
    done whatever the strength of the hash function.  
     
    (ii) The hash function is not cryptographically strong. 
     
    If the hash function is not cryptographically strong, it may still 
    be possible to construct sequences of distinct packets with the 
    common selection property. An example is the standard CRC-32 hash 
    function used with a private modulus (but without a private string 
    post-pended to the input). It has weak mixing properties for low 
    order bits. Consequently, simply by incrementing the hash input, 
    one obtains distinct packets whose hashes mostly fall in a narrow 
    range, and hence are likely commonly selected; see [GoRe07] 
     
    Suitable parameterization of the hash function can make such 
    attacks more difficult. For example, post-pending a private string 
    to the input before hashing with CRC-32 will give stronger mixing 
    properties over all bits of the input. However, with a hash 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 19] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    function, such as CRC-32, that is not cryptographically strong, the 
    possibility of discovering a method to construct packet sets with 
    the common selected property cannot be ruled out, even when a 
    private modulus or post-pended string is used.  
     
 6.2.2.2 Vulnerabilities of Hash-based selection using knowledge of 
       selection outcomes. 
     
    Knowledge of the selection outcomes of crafted packets can by used 
    by an attacker to more easily construct sets of packets which are 
    disproportionately sampled and/or are commonly selected. There are 
    several ways an attacker might acquire this knowledge: 
     
    (i) Billing Reports: if samples are used for billing purposes, then 
    the selection outcomes of packets may be able to be inferred by 
    correlating a crafted packet stream with the billing reports that 
    it generates. However, the rate at knowledge of selection outcomes 
    can be acquired depends on the temporal and spatial granularity of 
    the billing reports, being slower the more aggregated the reports 
    are. 
     
    (ii) Feedback from an Intrusion Detection System: e.g., a botmaster 
    adversary learns if his packets were detected by the intrusion 
    detection system by seeing if one of his bots is blocked by the 
    network. 
     
    (iii) Observation of the Report Stream: export packets sent across 
    a public network may be eavesdropped on by an adversary. Encryption 
    of the export packets provides only a partial defense, since it may 
    be possible to infer the selection outcomes of packets by 
    correlating a crafted packet stream with the occurrence (not the 
    content) of packets in the export stream that it generates. The 
    rate at which such knowledge could be acquired is limited by the 
    temporal resolution at which reports can be associated with 
    packets, e.g. due to processing and propagation variability, and 
    difficulty in distinguishing report on attack packets from those of 
    background traffic, if present. The association between packets and 
    their reports on which this depends could be removed by padding 
    export packets to a constant length and sending them at a constant 
    rate. 
     
    We now turn to attacks that can exploit knowledge of selection 
    outcomes. Firstly, with a non-cryptographic hash function, 
    knowledge of selection outcomes for a trial stream may be used to 
    further craft a packet set with the common selection property. This 
    has been demonstrated for the modular hash f(x) = a x + b mod k, 
    for private parameters a, b, and k. With sampling rate p, knowledge 
    of the sampling outcomes of roughly 2/p is sufficient for the 
    attack to succeed, independent of the values of a, b and k. With 
    knowledge of the selection outcomes of a larger number of packets, 
    the parameters a b and k can be determined; see [GoRe07]. 
     
    A cryptographic hash function employing a private parameter and 
    operating in one of the pseudo-random function modes specified 
    above is not vulnerable to these attacks, even if the selection 
    range is known. 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 20] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


 6.2.2.3 Vulnerabilities to Replay Attacks 
     
    Since hash-based selection is deterministic, any packet or set of 
    packets with known selection properties can be replayed into a 
    network and experience the same selection outcomes provide the hash 
    function and its parameters are not changed. Repetition of a single 
    packet may be noticeable to other measurement methods if employed 
    (e.g. collection of flow statistics), whereas a set of distinct 
    packets that appears statistically similar to regular traffic may 
    be less noticeable.  
     
    Replay attacks may be mitigated by repeated changing of hash 
    function parameters. This also prevents attacks that exploit 
    knowledge of sampling outcomes, at least if the parameters are 
    changed at least as fast as the knowledge can be acquired by an 
    attacker. In order to preserve the ability to perform Trajectory 
    Sampling, parameter changed would have to be simultaneous (or 
    approximately so) across all observation point. 
  
 6.2.3  Choice of Hash-Function 
  
    The specific choice of hash function represents a trade-off between 
    complexity and ease of implementation. Ideally, a cryptographically 
    strong hash function employing a private parameter and operating in 
    pseudorandom function mode as specified above would be used, 
    yielding a good emulation a random packet selection at a target 
    sampling rate, and giving maximal robustness against the attacks 
    described in the previous section. However, it is not assumed that 
    all PSAMP devices will be capable of applying a cryptographically 
    strong hash function to every packet at line rate. For this reason, 
    the hash functions listed in this section will be of a weaker 
    variety. Future protocol extensions that employ stronger hash 
    functions are not precluded. 
     
 6.2.3.1 Properties of some hash functions. 
  
    This document recommends 3 hash functions: IPSX, BOB and CRC-32. 
    Specifications of IPSX and BOB are in the appendix; the CRC-32 
    function is described in [crc32]. None of these hash functions is 
    recommended for cryptographic purposes. A comparison of hash-
    functions with regard to execution speed, collision probability, 
    uniformity of the distribution of values in the Hash 
    Range and the speed of the functions is described in [MoND05].  
     
    (i) Speed: IPSX is simple to implement and was correspondingly 
    about an order of magnitude faster to execute per packet than BOB 
    or CRC-32. 
     
    (ii) Uniformity: All three hash functions evaluated showed 
    relatively poor uniformity with 16 byte input that was drawn from 
    only invariant fields in the IP and TCP/UDP headers (i.e. header 
    fields that do not change from hop to hop). IPSX is inherently 
    limited to 16 bytes. BOB and 
    CRC-32 exhibits noticeably better uniformity when 4 or more bytes 
    from the payload are also included in the input. Although the 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 21] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    uniformity has been checked for different traffic traces, results 
    cannot be generalized to arbitrary traffic. Since hash-based 
    selection is a deterministic function on the packet content, it can 
    always be biased towards packets with specific attributes. 
    Furthermore, it should be noted that all Hash Functions were 
    evaluated only for IPv4.  
       
 6.2.3.2 Hash Functions for Packet Selection 
  
    The BOB function SHOULD be used for packet selection operations. 
    Both the parameter (the init value) and the selection range should 
    be kept private. Other functions, such as CRC-32 and IPSX MAY be 
    used. If CRC-32 is used, the input should first be post-pended with 
    a private string that acts as a parameter, and the modulus of the 
    CRC should also be kept private. 
     
    Input bytes for the Hash Function need to be invariant along the 
    path the packet is traveling. Only with this it is ensured that the 
    same packets are selected at different observation points. 
    Furthermore they should have a high variability between different 
    packets to generate a high variation in the Hash Range.  
     
    If a hash-based selection with the BOB function is used with IPv4 
    traffic, the following input bytes MUST be used. 
  
    - IP identification field 
    - Flags field 
    - Fragment offset 
    - Source IP address  
    - Destination IP address 
    - A configurable number of bytes from the IP payload, starting at 
       a configurable offset.  
     
    All investigated Hash Functions were evaluated only for IPv4. Due 
    to the IPv6 header fields and address structure it is expected that 
    there is less randomness in IPv6 packet headers than in IPv4 
    headers. Nevertheless, the randomness of IPv6 traffic was not 
    evaluated in the tests mentioned above. In addition to this, IPv6 
    traffic profiles may change significantly in future when IPv6 is 
    used by a broader community. If a hash-based selection with the BOB 
    function is used with IPv6 traffic, the following input bytes MUST 
    be used. 
  
    - Payload length (2 bytes)  
    - Byte number 10,11,14,15,16 of the IPv6 source address 
    - Byte number 10,11,14,15,16 of the IPv6 destination address 
    - A configurable number of bytes from the IP payload, starting at 
       a configurable offset. It is recommended to use at least 4 bytes 
       from the IP payload. 
  
    The payload itself is not changing during the path. Even if some 
    routers process some extension headers they are not going to strip 
    them from the packet. Therefore the payload length is invariant 
    along the path. Furthermore it usually differs for different 
    packets. The IPv6 address has 16 bytes. The first part is the 
    network part and it contains low variation. The second part is the 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 22] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    host part and contains higher variation. Therefore the second part 
    of the address is used. Nevertheless, the uniformity has not been 
    checked for IPv6 traffic. 
  
 6.2.3.3 Hash Functions Suitable for Packet Digesting 
  
    For digesting Packet Content for inclusion in a reported label, the 
    most important property is a low collision frequency. A secondary 
    requirement is the ability to accept variable length input, in 
    order to allow inclusion of maximal amount of packet as input. 
    Execution speed is of secondary importance, since the digest need 
    only be formed from selected packets.  
     
    For this purpose also the BOB function is recommended. Other 
    functions (such as CRC-32) MAY be used. Among the functions capable 
    of operating with variable length input BOB and CRC-32 have the 
    fastest execution, BOB being slightly faster. IPSX is not 
    recommended for digesting because it has a significantly higher 
    collision rate and takes only a fixed length input. 
  
 7. Parameters for the Description of Selection Techniques 
  
    This section gives an overview of different alternative selection 
    schemes and their required parameters. In order to be compliant 
    with PSAMP at least one of proposed schemes MUST be implemented. 
     
    The decision whether to select a packet or not is based on a 
    function which is performed when the packet arrives at the 
    selection process. Packet selection schemes differ in the input 
    parameters for the selection process and the functions they require 
    to do the packet selection. The following table gives an overview. 
  
         Scheme       |   input parameters     |     functions  
       ---------------+------------------------+------------------- 
        systematic    |    packet position     |  packet counter  
        count-based   |    Sampling pattern    |  
       ---------------+------------------------+------------------- 
        systematic    |      arrival time      |  clock or timer 
        time-based    |     Sampling pattern   | 
       ---------------+------------------------+------------------- 
        random        |  packet position       |  packet counter, 
        n-out-of-N    |  Sampling pattern      |  random numbers 
                      | (random number list)   | 
       ---------------+------------------------+------------------- 
        uniform       |        Sampling        |  random function 
        probabilistic |      probability       |    
       ---------------+------------------------+------------------- 
        non-uniform   |e.g. packet position,   | selection function, 
        probabilistic |  packet content(parts) |  probability calc. 
       ---------------+------------------------+------------------- 
        non-uniform   |e.g. flow state,        | selection function, 
        flow-state    |  packet content(parts) |  probability calc. 
       ---------------+------------------------+------------------- 
        property      | packet content(parts)  |  filter function or 
        match         | or router state        |  state discovery 
       ---------------+------------------------+------------------- 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 23] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


        hash-based    |  packet content(parts) |  Hash Function 
       ---------------+------------------------+------------------- 
     
 7.1 Description of Sampling Techniques 
     
    In this section we define what elements are needed to describe the 
    most common Sampling techniques. Here the selection function is 
    pre-defined and given by the Selector ID.  
     
    Sampler Description: 
         SELECTOR_ID 
         SELECTOR_TYPE 
         SELECTOR_PARAMETERS 
  
    Where: 
     
    SELECTOR_ID: 
    Unique ID for the packet sampler. 
  
    SELECTOR_TYPE 
    For Sampling processes the SELECTOR TYPE defines what Sampling 
    algorithm is used. 
    Values: Systematic Count-based | Systematic Time-based | Random n-
    out-of-N | Uniform Probabilistic | Non-uniform Probabilistic | Non-
    uniform Flow-state 
  
    SELECTOR_PARAMETERS 
    For Sampling processes the SELECTOR PARAMETERS define the input 
    parameters for the process. Interval length in systematic Sampling 
    means, that all packets that arrive in this interval are selected. 
    The spacing parameter defines the spacing in time or number of 
    packets between the end of one Sampling interval and the start of 
    the next succeeding interval. 
  
    Case n out of N: 
       - Population size N, Sample size n 
     
    Case Systematic Time Based: 
       - Interval length (in usec), Spacing (in usec) 
     
    Case Systematic Count Based: 
       - Interval length(in packets), Spacing (in packets) 
     
    Case Uniform Probabilistic (with equal probability per packet): 
       - Sampling probability p 
        
    Case Non-uniform Probabilistic: 
       - Calculation function for Sampling probability p (see also 
          section 5.2.2.4) 
     
    Case flow state: 
       - Information reported for flow state can be found in 
          [MolF03](see also section 5.2.2.4) 
        
 7.2 Description of Filtering Techniques 
    

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 24] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    In this section we define what elements are needed to describe the 
    most common Filtering techniques. The structure closely parallels 
    the one presented for the Sampling techniques. 
     
    Filter Description: 
         SELECTOR_ID 
         SELECTOR_TYPE 
         SELECTOR_PARAMETERS 
  
    Where: 
     
    SELECTOR_ID: 
    Unique ID for the packet filter. The ID can be calculated under 
    consideration of the SELECTION SEQUENCE and a local ID. 
     
    SELECTOR_TYPE 
    For Filtering processes the SELECTOR TYPE defines what Filtering 
    type is used. 
    Values: Matching | Hashing | Router_state 
     
    SELECTOR_PARAMETERS 
    For Filtering processes the SELECTOR PARAMETERS define formally the 
    common property of the packet being filtered. For the filters of 
    type Matching and Hashing the definitions have a lot of points in 
    common. 
     
    Values: 
     
    Case Matching 
       - Information Element (from [IPFIX-INFO]) 
       - Value (type in accordance to [IPFIX-INFO]) 
  
    In case of multiple match criteria, multiple "case matching" have 
    to be bound by a logical AND. 
  
    Case Hashing: 
       - Hash Domain (Input bits from packet) 
           - <Header type = ipv4> 
           - <Input bit specification, header part> 
           - <Header type =  ipv6> 
           - <Input bit specification, header part> 
           - <payload byte number N> 
           - <Input bit specification, payload part> 
       - Hash Function  
           - Hash function name  
           - Length of input key (eliminate 0x bytes) 
           - Output value (length M and bitmask) 
           - Hash Selection Range, as a list of non overlapping 
              intervals [start value, end value] where value is in 
              [0,2^M-1] 
           - Additional parameters dependent on specific Hash Function 
              (e.g. hash input bits (seed)) 
     
    Notes to input bits for Case Hashing: 
       - Input bits can be from header part only, from the payload 
         part only or from both. 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 25] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


       - The bit specification, for the header part, can be specified 
          for ipv4 or ipv6 only, or both 
       - In case of ipv4, the bit specification is a sequence of 20 
          Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to 
          be applied to the header. 
       - In case of ipv6, it is a sequence of 40 Hexadecimal numbers 
          [00,FF] specifying a 40 bytes bitmask to be applied to the 
          header 
       - The bit specification, for the payload part, is a sequence of 
          Hexadecimal numbers [00,FF] specifying the bitmask to be 
          applied to the first N bytes of the payload, as specified by 
          the previous field. In case the Hexadecimal number sequence 
          is longer then N, only the first N numbers are considered. 
       - In case the payload is shorter than N, the Hash Function 
          cannot be applied. Other options, like padding with zeros, 
          may be considered in the future. 
       - A Hash Function cannot be defined on the options field of the 
          ipv4 header, neither on stacked headers of ipv6. 
       - The Hash Selection Range defines a range of hash-values (out 
          of all possible results of the Hash-Operation). If the hash 
          result for a specific packet falls in this range, the packet 
          is selected. If the value is outside the range, the packet is 
          not selected. E.g. if the selection interval specification is 
          [1:3], [6:9] all packets are selected for which the hash 
          result is 1,2,3,6,7,8, or 9. In all other cases the packet is 
          not selected. 
  
    Case Router State: 
  
       - Ingress interface at which the packet arrives equals a 
          specified value 
       - Egress interface to which the packet is routed equals a 
          specified value 
       - Packet violated Access Control List (ACL) on the router 
       - Reverse Path Forwarding (RPF) failed for the packet 
       - Resource Reservation is insufficient for the packet 
       - No route found for the packet 
       - Origin AS equals a specified value or lies within a given  
          range 
       - Destination AS equals a specified value or lies within a 
          given range 
  
    Note to Case Router State: 
       - All Router state entries can be linked by AND operators 
  
 8. Composite Techniques  
     
    Composite schemes are realized by combining the selector IDs into a 
    Selection Sequence. The Selection Sequence contains all selector 
    IDs that are applied to the packet stream subsequently. Some 
    examples of composite schemes are reported below. 
     
 8.1 Cascaded Filtering->Sampling or Sampling->Filtering 
  
    If a filter precedes a Sampling process the role of Filtering is to 
    create a set of "parent populations" from a single stream that can 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 26] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    then be fed independently to different Sampling functions, with 
    different parameters tuned for the population itself (e.g. if 
    streams of different intensity result from Filtering, it may be 
    good to have different Sampling rates). If Filtering follows a 
    Sampling process, the same Sampling Fraction and type is applied to 
    the whole stream, independently of the relative size of the streams 
    resulting from the Filtering function. Moreover, also packets not 
    destined to be selected in the Filtering operation will "load" the 
    Sampling function. So, in principle, Filtering before Sampling 
    allows a more accurate tuning of the Sampling procedure, but if 
    filters are too complex to work at full line rate (e.g. because 
    they have to access router state information), Sampling before 
    Filtering may be a need. 
     
 8.2 Stratified Sampling 
     
    Stratified Sampling is one example for using a composite technique. 
    The basic idea behind stratified Sampling is to increase the 
    estimation accuracy by using a-priori information about 
    correlations of the investigated characteristic with some other 
    characteristic that is easier to obtain. The a-priori information 
    is used to perform an intelligent grouping of the elements of the 
    parent population. In this manner, a higher estimation accuracy can 
    be achieved with the same Sample Size or the Sample Size can be 
    reduced without reducing the estimation accuracy. 
     
    Stratified Sampling divides the Sampling process into multiple 
    steps. First, the elements of the parent population are grouped 
    into subsets in accordance to a given characteristic. This grouping 
    can be done in multiple steps. Then samples are taken from each 
    subset.  
     
    The stronger the correlation between the characteristic used to 
    divide the parent population (stratification variable) and the 
    characteristic of interest (for which an estimate is sought after), 
    the easier is the consecutive Sampling process and the higher is 
    the stratification gain. For instance, if the dividing 
    characteristic were equal to the investigated characteristic, each 
    element of the sub-group would be a perfect representative of that 
    characteristic. In this case it would be sufficient to take one 
    arbitrary element out of each subgroup to get the actual 
    distribution of the characteristic in the parent population. 
    Therefore stratified Sampling can reduce the costs for the Sampling 
    process (i.e. the number of samples needed to achieve a given level 
    of confidence). 
  
    For stratified Sampling one has to specify classification rules for 
    grouping the elements into subgroups and the Sampling scheme that 
    is used within the subgroups. The classification rules can be 
    expressed by multiple filters. For the Sampling scheme within the 
    subgroups the parameters have to be specified as described above. 
    The use of stratified Sampling methods for measurement purposes is 
    described for instance in [ClPB93] and [Zseb03]. 
     
 9. Security Considerations 
  

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 27] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    Security considerations concerning the choice of sampling hash 
    function have been discussed in Section 6.2.2. That section 
    discussed a number of potential attacks to craft packet streams 
    which are disproportionately detected and/or discover the hash 
    function parameters, the vulnerabilities of different hash 
    functions to these attacks, and practices to minimize these 
    vulnerabilities. 
     
    Further security threats can occur if the configuration of Sampling 
    parameters or the communication of Sampling parameters to the 
    application is corrupted. This document only describes Sampling 
    schemes that can be used for packet selection. It neither describes 
    a mechanism how those parameters are configured nor how these 
    parameters are communicated to the application. Therefore the 
    security threats that originate from this kind of communication 
    cannot be assessed with the information given in this document. 
      
 10. Acknowledgements 
     
    We would like to thank the PSAMP group, especially Benoit Claise 
    and Stewart Bryant, for fruitful discussions and for proofreading 
    the document. We thank Sharon Goldberg for her input on security 
    issues concerning hash-based selection. 
     
 11. IANA Considerations 
        
    This document has no actions for IANA 
        
 12. Normative References  
     
    [PSAMP-PROTO] B. Claise (Ed.): Packet Sampling (PSAMP) Protocol 
                  Specifications, RFC XXXX. [Currently Internet Draft 
                  draft-ietf-psamp-protocol-07.txt, work in progress, 
                  October 2006] 
     
    [IPFIX-PROTO] B. Claise (Editor) "Specification of the IPFIX 
                  Protocol for the Exchange of IP Traffic Flow 
                  Information", RFC XXXX. [Currently Internet Draft,  
                  draft-ietf-ipfix-protocol-24.txt, November 2006] 
     
    [PSAMP-INFO] T. Dietz, F. Dressler, G. Carle, B. Claise: 
                  Information Model for Packet Sampling Exports, RFC 
                  XXXX. [Currently Internet Draft, draft-ietf-psamp-
                  info-05, October 2006] 
     
    [IPFIX-INFO] J. Meyer, J. Quittek, S. Bryant: Information Model 
                  for IP Flow Information Export, RFC XXXX [Currently 
                  Internet Draft, draft-ietf-ipfix-info-15, February 
                  2007] 
     
 13. Informative References 
     
    [PSAMP-FW]   Nick Duffield (Ed.): A Framework for Packet Selection 
                  and Reporting, RFC XXXX [currently Internet Draft 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 28] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


                  draft-ietf-psamp-framework-11, work in progress, May 
                  2007]  
     
    [PSAMP-MIB]  T. Dietz, B. Claise: Definitions of Managed Objects 
                  for Packet Sampling, RFC XXXX. [Currently Internet 
                  Draft, draft-ietf-psamp-mib-06.txt, work in progress, 
                  June 2006] 
     
    [AmCa89]    Paul D. Amer, Lillian N. Cassel: Management of Sampled 
                 Real-Time Network Measurements, 14th Conference on 
                 Local Computer Networks, October 1989, Minneapolis, 
                 pages 62-68, IEEE, 1989 
     
    [BeCK96]    M. Bellare, R. Canetti and H. Krawczyk, "Pseudorandom 
                 Functions Revisited: The Cascade Construction and its 
                 Concrete Security", Symposium on Foundations of 
                 Computer Science, 1996. 
     
    [ClPB93]    K.C. Claffy, George C. Polyzos, Hans-Werner Braun: 
                 Application of Sampling Methodologies to Network 
                 Traffic Characterization, Proceedings of ACM 
                 SIGCOMM'93, San Francisco, CA, USA, September 13 - 17, 
                 1993 
  
    [crc32]     R. Braden, D. Borman, C. Partridge: Computing the 
                 Internet Checksum, RFC 1071, Sep. 1988 (updated by 
                 RFCs 1141 and 1624) 
     
    [DuGG02]    N.G. Duffield, A. Gerber, M. Grossglauser: Trajectory 
                 Engine: A Backend for Trajectory Sampling, IEEE 
                 Network Operations and Management Symposium 2002, 
                 Florence, Italy, April 15-19, 2002. 
  
    [DuGr00]    N.G. Duffield, M. Grossglauser: Trajectory Sampling 
                 for Direct Traffic Observation, Proceedings of ACM 
                 SIGCOMM 2000, Stockholm, Sweden, August 28 - September 
                 1, 2000. 
  
    [DuGr04]    N. G. Duffield and M. Grossglauser: Trajectory 
                 Sampling with Unreliable Reporting, Proc IEEE Infocom 
                 2004, Hong Kong, March 2004, 
  
    [DuLT01]    N.G. Duffield, C. Lund, and M. Thorup: Charging from 
                 Sampled Network Usage, ACM Internet Measurement 
                 Workshop IMW 2001, San Francisco, USA, November 1-2, 
                 2001 
     
    [EsVa01]    C. Estan and G. Varghese: New Directions in Traffic 
                 Measurement and Accounting, ACM SIGCOMM Internet 
                 Measurement Workshop 2001, San Francisco (CA) Nov. 
                 2001 
     
    [GoRe07]    S. Goldberg, J. Rexford.  "Security Vulnerabilities 
                 and Solutions for Packet Sampling", IEEE Sarnoff 
                 Symposium, Princeton, NJ, May 2007. 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 29] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    [HT52]      D.G. Horvitz and D.J. Thompson: A Generalization of 
                 Sampling without replacement from a Finite Universe. 
                 J. Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952. 
     
    [Jenk97]    B. Jenkins: Algorithm Alley, Dr. Dobb's Journal, 
                 September 1997. 
                 http://burtleburtle.net/bob/hash/doobs.html  
     
    [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic 
                 Estimation for the Largest Sources on a Network, Using 
                 Packet Sampling with Limited Storage, HP technical 
                 report, Managemenr, Mathematics and Security 
                 Department, HP Laboratories, Bristol, March 1992, 
                 http://www.hpl.hp.com/techreports/92/HPL-92-35.html 
  
    [MolF03]    M.Molina: Flow selection support in IPFIX, Internet 
                 Draft <draft-molina-flow-selection-00.txt>, work in 
                 progress, October 2003. 
     
    [Moli03]    M.Molina: A scalable and efficient methodology for 
                 flow monitoring in the internet, International 
                 Teletraffic Congress (ITC-18), Berlin, Sep. 2003 
  
    [MoND05]    M. Molina, S.Niccolini, N.G.Duffield: A Comparative 
                 Experimental Study of Hash Functions Applied to Packet 
                 Sampling. International Teletraffic Congress (ITC-19), 
                 Beijing, August 2005 
     
    [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate 
                 Requirement Levels", BCP 14, RFC 2119, March 1997 
     
    [RFC3917]    J. Quittek, T. Zseby, B. Claise, S. Zander: 
                  Requirements for IP Flow Information Export, RFC 
                  3917, October 2004 
     
    [RFC4271]    Y. Rekhter, T. Li, S. Hares, A Border Gateway 
                  Protocol 4 (BGP-4), RFC 4271, January 2006 
  
    [Zseb03]    T. Zseby: Stratification Strategies for Sampling-based 
                 Non-intrusive Measurement of One-way Delay. 
                 Proceedings of Passive and Active Measurement Workshop 
                 (PAM 20003), La Jolla, CA, USA, pp. 171-179, April 
                 2003 
 14. Authors' Addresses 
     
    Tanja Zseby 
    Fraunhofer Institute for Open Communication Systems 
    Kaiserin-Augusta-Allee 31 
    10589 Berlin 
    Germany 
    Phone: +49-30-34 63 7153 
    Email: zseby@fokus.fhg.de 
  
    Maurizio Molina     
    DANTE  


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 30] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    City House         
    126-130 Hills Road 
    Cambridge CB21PQ     
    United Kingdom 
    Phone: +44 1223 371 300 
    Email: maurizio.molina@dante.org.uk 
  
    Nick Duffield 
    AT&T Labs - Research 
    Room B-139 
    180 Park Ave 
    Florham Park NJ 07932, USA 
    Phone: +1 973-360-8726 
    Email: duffield@research.att.com 
     
    Saverio Niccolini 
    Network Laboratories, NEC Europe Ltd.  
    Kurfuerstenanlage 36  
    69115 Heidelberg  
    Germany  
    Phone: +49-6221-9051118  
    Email:  saverio.niccolini@netlab.nec.de 
      
    Fredric Raspall 
    EPSC-UPC  
    Dept. of Telematics  
    Av. del Canal Olimpic, s/n  
    Edifici C4  
    E-08860 Castelldefels, Barcelona  
    Spain  
    Email: fredi@entel.upc.es 
     
 15. Intellectual Property Statement 
     
    The IETF has been notified of intellectual property rights claimed 
    in regard to some or all of the specification contained in this 
    document. For more information consult the online list of claimed 
    rights. 
     
    The IETF takes no position regarding the validity or scope of any 
    Intellectual Property Rights or other rights that might be claimed 
    to pertain to the implementation or use of the technology described 
    in this document or the extent to which any license under such 
    rights might or might not be available; nor does it represent that 
    it has made any independent effort to identify any such rights.  
    Information on the procedures with respect to rights in RFC 
    documents can be found in BCP 78 and BCP 79.  
     
    Copies of IPR disclosures made to the IETF Secretariat and any 
    assurances of licenses to be made available, or the result of an 
    attempt made to obtain a general license or permission for the use 
    of such proprietary rights by implementers or users of this 
    specification can be obtained from the IETF on-line IPR repository 
    at http://www.ietf.org/ipr. 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 31] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    The IETF invites any interested party to bring to its attention any 
    copyrights, patents or patent applications, or other proprietary 
    rights that may cover technology that may be required to implement 
    this standard. Please address the information to the IETF at ietf-
    ipr@ietf.org. 
     
 16. Copyright Statement 
     
    Copyright (C) The IETF Trust (2007). 
        
    This document is subject to the rights, licenses and restrictions 
    contained in BCP 78, and except as set forth therein, the authors 
    retain all their rights. 
     
 18. Disclaimer 
        
    This document and the information contained herein are provided on 
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE 
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE 
    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL 
    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 
    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 
    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 
    FOR A PARTICULAR PURPOSE. 
     
 17. Appendix: Hash Functions 
     
 17.1   IP Shift-XOR (IPSX) Hash Function 
     
    The IPSX Hash Function is tailored for acting on IP version 4 
    packets. It exploits the structure of IP packet and in particular 
    the variability expected to be exhibited within different fields of 
    the IP packet in order to furnish a hash value with little apparent 
    correlation with individual packet fields. Fields from the IPv4 and 
    TCP/UDP headers are used as input. The IPSX Hash Function uses a 
    small number of simple instructions. 
     
    Input parameters: None 
     
    Built-in parameters: None 
     
    Output: The output of the IPSX is a 16 bit number 
     
    Functioning:  
    The functioning can be divided into two parts: input selection, 
    which forms are composite input from various portions of the IP 
    packet, followed by computation of the hash on the composite. 
     
    Input Selection: 
    The raw input is drawn from the first 20 bytes of the IP packet 
    header and the first 8 bytes of the IP payload. If IP options are 
    not used, the IP header has 20 bytes, and hence the two portions 
    adjoin and comprise the first 28 bytes of the IP packet. We now use 
    the raw input as 4 32-bit subportions of these 28 bytes. We specify 
    the input by bit offsets from the start of IP header or payload. 
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 32] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    f1 = bits 32 to 63 of the IP header, comprising the IP     
         identification field, flags, and fragment offset. 
        
    f2 = bits 96 to 127 of the IP header, the source IP address. 
        
    f3 = bits 128 to 159 of the IP header, the destination IP  
         address. 
     
    f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4  
         comprises the TCP sequence number followed by the message 
         length. For a UDP packet f4 comprises the UDP checksum. 
     
    Hash Computation: 
    The hash is computed from f1, f2, f3 and f4 by a combination of XOR 
    (^), right shift (>>) and left shift (<<) operations. The 
    intermediate quantities h1, v1, v2 are 32-bit numbers. 
     
           1.    v1 = f1 ^ f2; 
           2.    v2 = f3 ^ f4;   
           3.    h1 = v1 << 8; 
           4.    h1 ^= v1 >> 4; 
           5.    h1 ^= v1 >> 12;  
           6.    h1 ^= v1 >> 16; 
           7.    h1 ^= v2 << 6; 
           8.    h1 ^= v2 << 10; 
           9.    h1 ^= v2 << 14; 
           10.   h1 ^= v2 >> 7; 
     
    The output of the hash is the least significant 16 bits of h1. 
  
     
 17.2   BOB Hash Function  
     
    The BOB Hash Function is a Hash Function designed for having each 
    bit of the input affecting every bit of the return value and using 
    both 1-bit and 2-bit deltas to achieve the so called avalanche 
    effect [Jenk97]. This function was originally built for hash table 
    lookup with fast software implementation.  
           
    Input Parameters:  
    The input parameters of such a function are:  
    - the length of the input string (key) to be hashed, in bytes. The 
    elementary input blocks of Bob hash are the single bytes, therefore 
    no padding is needed.  
    - an init value (an arbitrary 32-bit number).  
     
    Built in parameters:  
    The Bob Hash uses the following built-in parameter:        
    - the golden ratio (an arbitrary 32-bit number used in the hash  
    function computation: its purpose is to avoid mapping all zeros to 
    all zeros);  
     
    Note: the mix sub-function (see mix (a,b,c) macro in the reference 
    code in 3.2.4) has a number of parameters governing the shifts in 
    the registers. The one presented is not the only possible choice.  
     

 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 33] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


    It is an open point whether these may be considered additional  
    built-in parameters to specify at function configuration.  
     
    Output.  
    The output of the BOB function is a 32-bit number. It should be 
    specified:  
    - A 32 bit mask to apply to the output  
    - The selection range as a list of non overlapping intervals [start 
    value, end value] where value is in [0,2^32]  
           
    Functioning:  
    The hash value is obtained computing first an initialization of an 
    internal state (composed of 3 32-bit numbers, called a, b, c in the 
    reference code below), then, for each input byte of the key the 
    internal state is combined by addition and mixed using the mix sub-
    function. Finally, the internal state mixed one last time and the 
    third number of the state (c) is chosen as the return value.  
     
    typedef unsigned long int  ub4;   /* unsigned 4-byte quantities */  
    typedef unsigned      char ub1;   /* unsigned 1-byte quantities */  
     
    #define hashsize(n) ((ub4)1<<(n))  
    #define hashmask(n) (hashsize(n)-1)  
     
    /* ------------------------------------------------------ 
      mix -- mix 3 32-bit values reversibly.  
      For every delta with one or two bits set, and the deltas of all 
    three high bits or all three low bits, whether the original value 
    of a,b,c is almost all zero or is uniformly distributed,  
      * If mix() is run forward or backward, at least 32 bits in a,b,c 
    have at least 1/4 probability of changing.  
      * If mix() is run forward, every bit of c will change between 1/3 
    and 2/3 of the time.  (Well, 22/100 and 78/100 for some 2-bit 
    deltas.) mix() was built out of 36 single-cycle latency 
    instructions in a structure that could supported 2x parallelism, 
    like so:  
            a -= b;  
            a -= c; x = (c>>13);  
            b -= c; a ^= x;  
            b -= a; x = (a<<8);  
            c -= a; b ^= x;  
            c -= b; x = (b>>13);  
            ...  
    Unfortunately, superscalar Pentiums and Sparcs can't take advantage 
    of that parallelism.  They've also turned some of those single-
    cycle latency instructions into multi-cycle latency instructions  
     
    ------------------------------------------------------------*/  
     
      #define mix(a,b,c)  \  
      { \  
        a -= b; a -= c; a ^= (c>>13); \  
        b -= c; b -= a; b ^= (a<<8); \  
        c -= a; c -= b; c ^= (b>>13); \  
        a -= b; a -= c; a ^= (c>>12);  \  
        b -= c; b -= a; b ^= (a<<16); \  


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 34] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


        c -= a; c -= b; c ^= (b>>5); \  
        a -= b; a -= c; a ^= (c>>3);  \  
        b -= c; b -= a; b ^= (a<<10); \  
        c -= a; c -= b; c ^= (b>>15); \  
      }  
        
      /* -----------------------------------------------------------  
    hash() -- hash a variable-length key into a 32-bit value  
    k       : the key (the unaligned variable-length array of bytes)  
    len     : the length of the key, counting by bytes  
    initval : can be any 4-byte value  
    Returns a 32-bit value.  Every bit of the key affects every bit of 
    the return value.  Every 1-bit and 2-bit delta achieves avalanche. 
    About 6*len+35 instructions.  
        
    The best hash table sizes are powers of 2.  There is no need to do 
    mod a prime (mod is sooo slow!).  If you need less than 32 bits, 
    use a bitmask.  For example, if you need only 10 bits, do  
    h = (h & hashmask(10));  
    In which case, the hash table should have hashsize(10) elements.  
     
    If you are hashing n strings (ub1 **)k, do it like this:  
    for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);  
     
    By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use 
    this code any way you wish, private, educational, or commercial.  
    It's free.  
        
   See http://burtleburtle.net/bob/hash/evahash.html  
    Use for hash table lookup, or anything where one collision in 2^^32 
    is acceptable.  Do NOT use for cryptographic purposes.  
     ----------------------------------------------------------- */  
        
      ub4 bob_hash(k, length, initval)  
      register ub1 *k;        /* the key */  
      register ub4  length;   /* the length of the key */  
      register ub4  initval;  /* an arbitrary value */  
      {  
         register ub4 a,b,c,len;  
        
         /* Set up the internal state */  
         len = length;  
         a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value */ 
         c = initval;         /* another arbitrary value */  
        
    /*------------------------------------ handle most of the key */  
        
         while (len >= 12)  
         {  
            a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) 
    +((ub4)k[3]<<24));  
            b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) 
    +((ub4)k[7]<<24));  
            c += (k[8] +((ub4)k[9]<<8) 
    +((ub4)k[10]<<16)+((ub4)k[11]<<24));  
            mix(a,b,c);  


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 35] 
 Internet Draft  Techniques for IP Packet Selection   May 2007 


            k += 12; len -= 12;  
         }  
        
         /*---------------------------- handle the last 11 bytes */  
         c += length;  
         switch(len)       /* all the case statements fall through*/  
         {  
         case 11: c+=((ub4)k[10]<<24);  
         case 10: c+=((ub4)k[9]<<16);  
         case 9 : c+=((ub4)k[8]<<8);  
            /* the first byte of c is reserved for the length */  
         case 8 : b+=((ub4)k[7]<<24);  
         case 7 : b+=((ub4)k[6]<<16);  
         case 6 : b+=((ub4)k[5]<<8);  
         case 5 : b+=k[4];  
         case 4 : a+=((ub4)k[3]<<24);  
         case 3 : a+=((ub4)k[2]<<16);  
         case 2 : a+=((ub4)k[1]<<8);  
         case 1 : a+=k[0];  
           /* case 0: nothing left to add */  
         }  
         mix(a,b,c);  
         /*-------------------------------- report the result */  
         return c;  
      } 


 Zseby, Molina, Duffield, Niccolini, Raspall              [Page 36]