Internet Draft                                                          
 Document: <draft-ietf-psamp-sample-tech-03.txt>                T. Zseby 
 Expires: April 2003                                    Fraunhofer FOKUS 
                                                               M. Molina 
                                                         NEC Europe Ltd. 
                                                              F. Raspall 
                                                         NEC Europe Ltd. 
                                                           Nick Duffield 
                                                    AT&T Labs - Research 
                                                                         
                                                             October2003 
  
  
    Sampling and Filtering Techniques for IP Packet Selection 
  
  
 Status of this Memo 
  
    This document is an Internet-Draft and is in full conformance 
    with all provisions of Section 10 of RFC2026.  
     
    Internet-Drafts are working documents of the Internet 
    Engineering Task Force (IETF), its areas, and its working 
    groups. Note that other groups may also distribute working 
    documents as Internet-Drafts. Internet-Drafts are draft 
    documents valid for a maximum of six months and may be updated, 
    replaced, or obsoleted by other documents at any time. It is 
    inappropriate to use Internet-Drafts as reference material or to 
    cite them other than as "work in progress."  
     
    The list of current Internet-Drafts can be accessed at 
    http://www.ietf.org/ietf/1id-abstracts.txt  
     
    The list of Internet-Draft Shadow Directories can be accessed at 
    http://www.ietf.org/shadow.html. 
     
 Abstract 
     
    This document describes sampling and filtering techniques for IP 
    packet selection. In two information models (one for sampling, 
    one for filtering) it defines what parameters are needed to 
    describe the most common selection schemes and shows how 
    techniques can be combined to build more elaborate packet 
    selectors. The information models are used for configuring the 
    selection technique in measurement processes and for reporting 
    the technique in use to a collector. 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 1] 
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
 Table of Contents 
  
    1.   Introduction.................................................2 
    2.   Terminology..................................................3 
    3.   Scope and Deployment of Packet Selection Techniques..........8 
    3.1  Sampling.....................................................9 
    3.1.1 Systematic Sampling........................................10 
    3.1.2 Random Sampling............................................10 
    3.1.2.1 n-out-of-N sampling......................................11 
    3.1.2.2 Probabilistic sampling...................................11 
    3.1.2.2.1 Uniform Probabilistic Sampling.........................11 
    3.1.2.2.2 Non-Uniform Probabilistic Sampling.....................11 
    3.1.2.2.3 Non-Uniform flow State dependent sampling..............12 
    3.1.3 Overview of sampling schemes regarding content dependence..12 
    4.   Filtering...................................................13 
    4.1  Mask/Match filtering........................................14 
    4.2  Hashing filtering...........................................14 
    4.2.1 Examples of Hashing Filtering applications.................15 
    4.2.1.1 Random sampling emulation................................15 
    4.2.1.2 Consistent packet selection and its applications.........15 
    4.2.2 Guarding Against Pitfalls and Vulnerabilities..............16 
    4.3  Router State filtering......................................17 
    5.   Input Parameters and Information Models.....................17 
    5.1  Information Model for Sampling Techniques...................18 
    5.2  Information Model for Filtering Techniques..................20 
    6.   Composite Techniques........................................23 
    6.1  Cascaded filtering->sampling or sampling->filtering.........23 
    6.2  Stratified Sampling.........................................23 
    7.   Security Considerations.....................................24 
    8.   References..................................................25 
    9.   Author's Addresses..........................................26 
    10.  Intellectual Property Statement.............................27 
    11.  Full Copyright Statement....................................27 
  
 1. Introduction 
     
    Increasing data rates and growing measurement demands increase 
    the requirements for data collection resources. For measurement 
    scenarios in backbone networks it is often required to measure 
    whole traffic aggregates instead of single flows. Furthermore, 
    some measurement methods require the capturing of packet headers 
    or even parts of the payload. All this can lead to an 
    overwhelming amount of measurement data, resulting in high 
    demands regarding resources for metering, storage, transport and 
    post processing.  
  
    In some cases specialized hardware helps to fulfill these 
    demands but on the other hand increases the costs for providing 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 2] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    the measurement. Since measurements are mainly a supporting 
    functionality for the service provisioning, measurement costs 
    usually should be limited to a small fraction of the costs of 
    the network service provisioning itself. A reduction of the data 
    that is considered and reported by a measurement process is 
    crucial to prevent the depletion of the available (i.e. the 
    affordable) resources.  Such a reduction can be achieved by a 
    reasonable deployment of packet selection techniques, that 
    sample a subset of the packets while still allowing an 
    appropriate accuracy, or filter out all packets that are not of 
    interest for the measurement at all. Packet selection helps to 
    prevent an exhaustion of resources and to limit the measurement 
    costs. Examples for applications that benefit from packet 
    selection are given in [DuGG03]. 
  
 2. Terminology 
     
    The terms used throughout this document are consistent with 
    those defined in [DuGG03]. 
  
 2.1 Terminology for Observation and Selection 
     
    * Observation Point: a location in the network where a packet 
       stream is observed. Examples include: 
        
         - a line to which a probe is attached; 
             
         - a shared medium, such as an Ethernet-based LAN; 
             
         - a single port of a router, or set of interfaces (physical 
            or logical) of a router; 
             
         - an embedded measurement subsystem within an interface.  
             
    * Measurement Process: the combination of a selection process 
       followed by a reporting process. 
        
    * Observed Packet Stream: the set of all packets observed at the 
       observation point. 
        
    * Packet Stream: either the observed packet stream, or a subset 
       of it that forms the output of a selection process. 
        
       Note that packets selected from a stream, e.g. by sampling, do 
       not necessarily possess a property by which they can be 
       distinguished from packets that have not been selected. For 
       this reason the term ‘‘stream’’ is favored over ‘‘flow’’, which is 
       defined as set of packets with common properties [QuZC02]. 
        

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 3] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    * Selection Process: takes a packet stream as its input and 
       selects a subset of that stream as its output.  
        
    * Selector: defines the action of a selection process on a 
       single packet of its input. A selected packet becomes an 
       element of the output packet stream of the selection process. 
        
       The selector can make use of the following information in 
       determining whether a packet is selected: 
        
         - the packet’s content; 
             
         - information derived from the packet's treatment at the 
            observation point; 
             
         - any selection state that may be maintained by the 
            selection process. 
             
    * Packet Content: the union of the packet header (which includes 
       link layer, network layer and other encapsulation headers) and 
       the packet payload. 
        
    * Selection State: a selection process may maintain state 
       information for use by the selection process and/or the 
       reporting process. At a given time, the selection state may 
       depend on packets observed at and before that time, and other 
       variables. Examples include: 
         
         - sequence numbers of packets at the input of selectors; 
             
         - a timestamp of arrival of the packet at the observation 
            points; 
             
         - iterators for pseudorandom number generators; 
      
         - hash values calculated during selection; 
   
         - indicators of whether the packet was selected by a given 
            selector. 
             
       Selectors may change the selection state. 
     
    * Composite Selection Processes: selection processes may be 
       composed, in which case the output stream issuing from one of 
       the selection processes in the composition forms the input 
       stream for the succeeding selection process.  
        
       The composition of selection processes is also a selection 
       process; the elements of the composition can be viewed as 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 4] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
       selection subprocesses of the composite. A selection process 
       that is not a subprocess of another selection process takes 
       the observed packet stream as its input.  
     
    * Primitive Selection Process: a selection process that is not a 
       composition of other selection processes. 
     
    * Composite Selector: the selector of a composite selection 
       process. 
     
    * Primitive Selector: the selector of a primitive selection 
       process. 
     
 2.2 Terminology for Reporting, Export and Collection 
     
    * Reporting Process: creates a report stream on packets selected 
       by a selection process, in preparation for export. The input 
       to the reporting process comprises that information available 
       to the selection process per selected packet, specifically: 
        
         - the selected packet’s content; 
             
         - information derived from the selected packet's treatment 
            at the observation point; 
             
         - any selection state maintained by the inputting selection 
            process, reflecting any modifications to the selection 
            state made during selection of the packet. 
     
    * Report Stream: the output of a reporting process is a report 
       stream, comprising two distinguished types of information: 
       packet reports, and report interpretation. 
     
    * Packet Reports: a configurable subset of the per packet input 
       to the reporting process.  
     
    * Report Interpretation: subsidiary information relating to one 
       or more packets, that is used for interpretation of their 
       packet reports. Examples include configuration parameters of 
       the measurement process and of the export process. 
     
    * Export Process: sends the output of one or more reporting 
       processes to one or more collectors. 
     
    * Collector: a collector receives a report stream exported by 
       one or more export processes. In some cases, the host of the 
       measurement and/or export processes may also serve as the 
       collector. 
     

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 5] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    * Export packets: one or packet reports, and perhaps report 
       interpretation, are bundled by the export process into a 
       export packet for export to a collector. 
   
    Various possibilities for the high level architecture of these 
    elements is as follows. 
     
        MP = Measurement Process, EP = Export Process 
     
       +---------------------+                 +------------------+ 
       |Observation Point(s) |                 | Collector(1)     | 
       |MP(s)--->EP----------+---------------->|                  |     
       |MP(s)--->EP----------+-------+-------->|                  | 
       +---------------------+       |         +------------------+ 
                                     |     
       +---------------------+       |         +------------------+ 
       |Observation Point(s) |       +-------->| Collector(2)     | 
       |MP(s)--->EP----------+---------------->|                  | 
       +---------------------+                 +------------------+ 
                                        
       +---------------------+          
       |Observation Point(s) |          
       |MP(s)--->EP---+      |          
       |              |      |          
       |Collector(3)<-+      | 
       +---------------------+   
     
  
 2.3 Packet Selection Terminology. 
     
    * Filtering: a filter is a selection operation that selects a 
       packet deterministically based on the packet content, its 
       treatment, and functions of these occurring in the selection 
       state. Examples include mask/match filtering, and hash-based 
       selection. 
     
    * Sampling: a selection operation that is not a filter is called 
       a sampling operation. This reflects the intuitive notion that 
       if the selection of a packet cannot be exactly determined from 
       its content alone, there must be some type of sampling taking 
       place.  
     
    * Content-independent Sampling: a sampling operation that does 
       not use packet content (or quantities derived from it) as the 
       basis for selection is called a content-independent sampling 
       operation. Examples include systematic sampling, and uniform 
       pseudorandom sampling driven by a pseudorandom number whose 
       generation is independent of packet content. Note that in 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 6] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
       independent sampling it is not necessary to access the packet 
       content in order to make the selection decision. 
     
    * Content-dependent Sampling: a sampling operation where 
       selection is dependent on packet content is called a content-
       dependent sampling operation. Examples include pseudorandom 
       selection according to a probability that depends on the 
       contents of a packet field; note that this is not a filter. 
     
    * Approximate Selection: selection operations in any of the 
       above four categories may be approximated by operations in the 
       same or another category for the purposes of implementation. 
       For example, uniform pseudorandom sampling may be approximated 
       by hash-based selection, using a suitable hash function and 
       hash domain. In this case, the closeness of the approximation 
       depends on the choice of hash function and hash domain. 
     
    * Hash-based Selection: a filter specified by a hash domain, a 
       hash function, and hash range and a hash selection range. 
     
    * Hash Domain: a subset of the packet content and the packet 
       treatment, viewed as an N-bit string for some positive integer 
       N. 
     
    * Hash Range: a set of M-bit strings for some positive integer 
       M. 
     
    * Hash Function: a deterministic map from the hash domain into 
       the hash range. 
     
    * Hash Selection Range: a subset of the hash range. The packet 
       is selected if the action of the hash function on the hash 
       domain for the packet yields a result in the hash selection 
       range. 
     
    * Population size: the number of packets in a subset of a packet 
       stream. 
     
    * Sample size: the number of packets selected from a subset of a 
       packet stream by a selection operation. 
  
    * Attained Selection Frequency: the actual proportion of packets 
       selected by a selection operation from a given subset of a 
       packet stream. Calculated as the ratio of the sample size to 
       the population size for the subset. When the selection 
       operation is a sampling operation, or approximates a sampling 
       operation, we can speak of the Attained Sampling Frequency. 
              

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 7] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    * Target Sampling Frequency: a configurable sampling frequency 
       in a sampling operation. Due to the inherent statistical 
       variability of sampling, the target and attained sampling 
       rates will not in general be equal, although they may be close 
       in some circumstances, e.g., when the population size is 
       large. 
  
  
 3. Scope and Deployment of Packet Selection Techniques 
  
    Packet selection techniques generate a subset of packets from an 
    Observed Packet Stream at an observation point. We distinguish 
    between sampling and filtering. 
     
    Sampling is targeted at the selection of a representative subset 
    of packets. The subset is used to infer knowledge about the 
    whole set of observed packets without processing them all. . The 
    selection can depend on packet position, and/or on packet 
    content, and/or on (pseudo) random decisions.  
  
    Filtering selects a subset with common properties. This is used 
    if only a subset of packets is of interest. The properties can 
    be directly derived from the packet content, or depend on the 
    treatment given by the router to the packet. Filtering is a 
    deterministic operation. It depends on packet content or router 
    treatment. It never depends on packet position or on (pseudo) 
    random decisions. 
     
    Note that a common technique to select packets is to compute a 
    Hash Function on some bits of the packet header and/or content 
    and to select it if the Hash Value falls in  the Hash Selection 
    Range. Since hashing is a deterministic operation on the packet 
    content, it is a filtering technique according to our 
    categorization.  
    Nevertheless, hash functions are sometimes used to emulate 
    random sampling. Depending on the chosen input bits, the Hash 
    Function and the Hash Selection Range, this technique can be 
    used to emulate the random selection of packets with a given 
    probability p. 
    It is also a powerful technique to consistently select the same 
    packet subset at multiple observation points [DuGr00] 
  
    The introduced classification is mainly useful for the 
    definition of an information model describing Primitive 
    Selectors . More complex selection techniques may then be 
    described through the composition of cascaded sampling and 
    filtering operations. 
    For example, a packet selection that weights the selection 
    probability on the basis of the packet length can be described 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 8] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    as a set of filter/sampling cascades. However, this descriptive 
    approach is not intended to be rigid: if a common and 
    consolidated selection practice turns out to be too complex to 
    be described as a composition of the mentioned building blocks, 
    an ad hoc description can be specified instead. 
     
    Packet selectors can be  part of an IPFIX metering process which 
    also can use the IPFIX exporting process. This is expressed as 
    association to one or more IPFIX processes. 
  
 3.1 Sampling 
  
    The deployment of sampling techniques aims at the provisioning 
    of information about a specific characteristic of the parent 
    population at a lower cost than a full census would demand. In 
    order to plan a suitable sampling strategy it is therefore 
    crucial to determine the needed type of information and the 
    desired degree of accuracy in advance. 
     
    First of all it is important to know the type of metric that 
    should be estimated. The metric of interest can range from 
    simple packet counts [JePP92] up to the estimation of whole 
    distributions of flow characteristics (e.g. packet 
    sizes)[ClPB93].  
  
    Secondly, the required accuracy of the information and with 
    this, the confidence that is aimed at, should be known in 
    advance. For instance for usage-based accounting the required 
    confidence for the estimation of packet counters can depend on 
    the monetary value that corresponds to the transfer of one 
    packet. That means that a higher confidence could be required 
    for expensive packet flows (e.g. premium IP service) than for 
    cheaper flows (e.g. best effort). The accuracy requirements for 
    validating a previously agreed quality can also vary extremely 
    with the customer demands. These requirements are usually 
    determined by the service level agreement (SLA). 
  
    Sampling is considered as part of the metering process. A 
    metering process consists of multiple functions (capturing, time 
    stamping, etc.). Sampling can be applied at different functions 
    of the metering process. In the following we consider a measured 
    IP packet with its observation point and timestamp as basis 
    elements of the parent population. 
     
    The sampling method and the parameters in use must be clearly 
    communicated to all applications that use the measurement data. 
    Only with this knowledge a correct interpretation of the 
    measurement results can be ensured.  
     

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 9] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    Sampling methods can be characterized by the sampling algorithm, 
    the trigger type used for starting a sampling interval and the 
    length of the sampling interval. These parameters are described 
    here in detail. The sampling algorithm describes the basic 
    process for selection of samples. In accordance to [AmCa89] and 
    [ClPB93] we define the following basic sampling processes: 
     
 3.1.1   Systematic Sampling 
  
    Systematic sampling describes the process of selecting the start 
    points and the duration of the selection intervals according to 
    a deterministic function. This can be for instance the periodic 
    selection of every k-th element of a trace but also the 
    selection of all packets that arrive at pre-defined points in 
    time. Even if the selection process does not follow a periodic 
    function (e.g. if the time between the sampling intervals varies 
    over time) we consider this as systematic sampling as long as 
    the selection is deterministic.  
    The use of systematic sampling always involves the risk of 
    biasing the results. If the systematics in the sampling process 
    resemble systematics in the observed stochastic process 
    (occurrence of the characteristic of interest in the network), 
    there is a high probability that the estimation will be biased. 
    Systematics in the observed process might not be known in 
    advance.  
    Here only equally spaced schemes are considered, where triggers 
    for sampling are periodic, either in time or in packet count. 
    All packets occurring in a selection interval (either in time or 
    packet count) beyond the trigger are selected.  
  
    Systematic count-based 
    In systematic count-based sampling the start and stop triggers 
    for the sampling interval are defined in accordance to the 
    special packet position (packet count). 
  
    Systematic time-based 
    In systematic count-based sampling the start and stop triggers 
    for the sampling interval are defined in accordance to the 
    temporal packet position (arrival time). 
  
    Both schemes are content-independent selection schemes. 
     
 3.1.2   Random Sampling 
  
    Random sampling selects the starting points of the sampling 
    intervals in accordance to a random process. The selection of 
    elements are independent experiments. With this, unbiased 
    estimations can be achieved. In contrast to systematic sampling, 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 10] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    random sampling requires the generation of random numbers. One 
    can differentiate two methods of random sampling: 
     
 3.1.2.1 n-out-of-N sampling 
  
    In n-out-of-N sampling n elements are selected out of the parent 
    population that consists of N elements. One example would be to 
    generate n different random numbers in the range [1,N] and 
    select all packets which have a packet position equal to one of 
    the random numbers. For this kind of sampling the sample size n 
    is fixed.  
     
 3.1.2.2 Probabilistic sampling 
     
    In probabilistic sampling the decision whether an element is 
    selected or not is made in accordance to a pre-defined selection 
    probability. An example would be to flip a coin for each packet 
    and select all packets for which the coin showed the head. For 
    this kind of sampling the sample size can vary for different 
    trials. The selection probability does not necessarily has to be 
    the same for each packet. Therefore we distinguish between 
    uniform probabilistic sampling (with the same selection 
    probability for all packets) and non-uniform  probabilistic 
    sampling (where the selection probability can vary for different 
    packets). 
     

 3.1.2.2.1      Uniform Probabilistic Sampling 
     
    For Uniform Probabilistic  Sampling packets are selected 
    independently with some uniform probability p. This sampling can 
    be count-driven, and is sometimes referred to as geometric 
    random sampling, since the difference in count between 
    successive selected packets are independent random variables 
    with a geometric distribution of mean 1/p. A time-driven analog, 
    exponential random  sampling, has the time between triggers 
    exponentially distributed. 
    Both geometric and exponential random sampling are examples of 
    what is known as additive random sampling, defined as sampling 
    where the intervals or counts between successive samples are 
    independent identically distributed random variable. 

 3.1.2.2.2      Non-Uniform Probabilistic Sampling 
     
    This is a variant of Uniform Probabilistic Sampling in which the 
    sampling probabilities can depend on the selection process 
    input. This can be used to weight sampling probabilities in 
    order e.g. to boost the chance of sampling packets that are rare 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 11] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    but are deemed important. Unbiased estimators for quantitative 
    statistics are recovered by renormalization of sample values; 
    see [HT52]. 
     

 3.1.2.2.3      Non-Uniform flow State dependent sampling  
  
    Another type of sampling that can be classified as probabilistic 
    Non-Uniform  is closely related to the flow concept as defined 
    in [QuZC02], and it is only used jointly with a flow monitoring 
    function (IPFIX monitoring function). Packets are selected,  
    dependent on a selection state. The point, here, is that the 
    selection state is determined also by the state of the flow the 
    packet belongs to and/or by the state of the other flows 
    currently being monitored by the associated flow monitoring 
    function. An example for such an algorithm is the ’’sample and 
    hold’’ method described in [EsVa01]: 
     
    - if a packet accounts for a flow record that already exists in 
       the IPFIX flow recording process, select the packet (i.e. 
       simply update the flow record) 
    - if a packet doesn't account to any existing flow record, 
       select it with probability p. If selected, create a new flow 
       record for it. 
     
    A further algorithm that fits into the category of non-uniform 
    flow state dependent sampling is described in [Molina03]. 
     
    This type of sampling is content dependent because the 
    identification of the flow the packet belongs to requires 
    analyzing part of the packet content. If the packet is selected, 
    then it is passed as an input to the IPFIX monitoring function 
    (this is called ’’Local Export’’ in [DuGG03] 
    Selecting the packet depending on the state of a flow cache is 
    useful when memory resources of the flow monitoring function are 
    scarce (i.e. there’s no room to keep all the flows that have 
    been scheduled for monitoring). See [MolFl03] for a more 
    detailed description of the motivations for this type of 
    sampling and the impact on the IPFIX metering. 
     
     
 3.1.3   Overview of sampling schemes regarding content dependence  
  
    n-out-of-N sampling and uniform probabilistic sampling are 
    content-
           -independent selection schemes. For non-uniform 
    probabilistic sampling  and non-uniform flow state dependent 
    sampling the sampling probability can be based on packet 
    content. The following table shows what schemes are content-
    independent and which can depend on the content. 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 12] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
           Sampling Scheme          |  content-     |  content- 
                                    |  independent  |  dependent          
      ------------------------------+---------------+--------------- 
         systematic                 |       X       |   
         count-based                |               |   
      ------------------------------+---------------+--------------- 
         systematic                 |       X       |   
         time-based                 |               |   
      ------------------------------+---------------+--------------- 
         random                     |       X       |   
         n-out-of-N                 |               |   
      ------------------------------+---------------+--------------- 
         random                     |       X       |   
         uniform probabilistic      |               |   
      ------------------------------+---------------+--------------- 
         random                     |               |       X          
         non-uniform probabilistic  |               |   
      ------------------------------+---------------+--------------- 
         random                     |               |       X        
         non-uniform flow-state     |               |   
      ------------------------------+---------------+--------------- 
     
     
 4. Filtering  
     
    Filtering is the selection of packets based on the packet 
    content, the treatment of the packet at the observation point, 
    or deterministic functions of these occurring in the selection 
    state. The packet is selected if these quantities fall into a 
    specified range. 
    Packet filtering is done for a wide variety of purposes e.g. for 
    security, SLA enforcing, accounting. Depending on the type of 
    filtering, it can be applied in different parts of the metering 
    process. The role of filtering, as the word itself suggest, is 
    to separate all the packets having a certain property from those 
    not having it. A distinguishing characteristic from sampling is 
    that the property never depends on the packet position in time 
    or in the space, or on a random process. 
    We identify and describe in the following three filtering 
    techniques. The first two (Mask/Match filtering and Hashing 
    filtering) are stateless, and therefore can make their decision 
    based on the analysis of portion of the packet only. The other 
    (router state filtering) requires to access state information 
    after the analysis of part of the packet and is therefore more 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 13] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    complex: its usage makes sense only in particular circumstances, 
    as described below. 
     
 4.1 Mask/Match filtering 
  
    This type of filtering selects a packet operating as follows: 
    first a combination of packet’s bit positions is selected taking 
    the logical AND of portion of the packet’s bits and a mask, then 
    it’s checked if the result, seen as an hexadecimal number, falls 
    within a selection range. In the simplest case the selection 
    range can be a single value (e.g. filter selects only the 
    packets with a certain specific header field, e.g. a specific 
    source IP address) but more in general the selection range can 
    be a sequence of non-overlapping intervals if high level 
    interfaces are used to specify mask and matches. The selected 
    bits of the packet aren’t necessarily only those of the IP 
    header, but in case both bits of the IP header and the payload 
    are selected, the masks and the selection intervals MUST be 
    specified separately for the header and the payload. 
    An implementation is not constrained to apply exactly all the 
    steps or in this sequence, provided that the result is 
    equivalent to a logical function doing it. 
    Examples of filters of this class are filters that select 
    packets based on the matching of some of the IP header fields 
    with a (possibly masked) specific value, filters that select 
    packets having some IP header fields values falling within a 
    range, filters that do the same as above on some of the 
    transport header fields (that are thus considered as part of the 
    payload), or a combination of all of the above mentioned 
    possibilities. 
     
  
 4.2 Hashing filtering 
     
    A Hash Function h maps the packet content c, or some portion of 
    it, onto a Hash Range R. The packet is selected if h(c) is an 
    element of S, which is a subset of R called the Hash Selection 
    Range. Thus hash-based sampling is indeed a particular case of 
    filtering: the object is selected if c is in inv(h(S)). But for 
    desirable Hash Functions the inverse image inv(h(S)) will be 
    extremely complex, and hence h would not be expressible as, say, 
    a match/mask filter or a simple combination of these. 
    Hash-based sampling has mainly two types of usage: it offers a 
    way to emulate random sampling by using packet content to 
    generate pseudorandom variates and a way to consistently select 
    subsets of packets that share a common property. In the 
    following subsections we give more details about them. 
    However, both usages require that the Hash Functions has two 
    statistical properties. First, the hash function h must have 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 14] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    good mixing properties, in the sense that small changes in the 
    input (e.g. the flipping of a single bit) cause large changes in 
    the output (many bits change). Then any local clump of values of 
    c is spread widely over R by h, and so the distribution of h(c) 
    is fairly uniform even if the distribution of c is not. Then the 
    sampling rate is #S/#R, which can be tuned by choice of S. If S 
    and R are sets contiguous integers, h(c), suitably shifted and 
    normalized, can be interpreted as a pseudorandom variate. The 
    second desirable property depends more closely on the statistics 
    of the content c. In applications, the content c comprises a 
    number of distinct fields, c1 ... cm, e.g. source and 
    destination IP Address, IP identification, and TCP/UDP port 
    numbers (if present) for a packet. With a hash function 
    satisfying the first properties above, selection decisions will 
    appear uncorrelated with the contents of any individual field, 
    if the complementary fields are (i) sufficiently variable 
    themselves, and (ii) sufficiently uncorrelated with cj. 
     
 4.2.1   Examples of Hashing Filtering applications 
     
 4.2.1.1 Random sampling emulation 
     
    Although pseudorandom number generators with well understood 
    properties have been developed, they may not be the method of 
    choice in setting where computational resources are scarce. A 
    convenient alternative is to use hash functions of packet 
    content as a source of randomness. The hash (suitably 
    renormalized) is a pseudorandom variate in the interval [0,1]. 
    Other schemes may use packet fields in iterators for 
    pseudorandom numbers. 
    The point here, is that the statistical properties of the 
    idealized packet selection law (such as independence of sampling 
    decisions for different packets, or independence on packet 
    content) may not be exactly shared by an implementation, but 
    only approximately so. 
    Although the selection decisions for non-uniform probabilistic 
    sampling (see Section 3.1.2.2.2 above) also depend on the packet 
    content, this form of sampling is distinguished from the use of 
    packet content to generate variates. In the former case, the 
    content only determines the selection probabilities: selection 
    could then proceed e.g by use of a variates obtained by an 
    independent pseudorandom number generator. In the latter case, 
    the content determines the pseudorandom variates rather than the 
    probabilities. 
     
     
 4.2.1.2 Consistent packet selection and its applications 
     

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 15] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    In Consistent packet selection, all routers in a network hash 
    parts of the packet content using identical hash function and 
    selection range. The domain of the hash is restricted to those 
    parts of the packet that are invariant from hop to hop. Fields 
    such as Time-to-Live, which is decremented per hop, and header 
    CRC, which is recalculated per hop, are thus excluded from the 
    hash domain. Thus, a given packet is selected at all either all 
    points on its path through the network, or at none. The domain 
    of the hash function needs to be wider than just a flow key, if 
    packets are to be selected quasi-randomly within flows (and e.g. 
    include portions of the payload), see [DuGr00]. If a report on 
    each selected packet is exported to a collector, the latter can 
    reconstruct trajectories of the selected packets, provided it 
    can match different reports on the same packet, and distinguish 
    these from reports on different packets. For this purpose, 
    reports may also contain a second distinct hash (identification 
    hash) of the selected packets. 
     
    Applications of consistent packet selection  include (i) 
    estimation of the network path matrix, i.e., the traffic 
    intensities according to network path, broken down by flow; (ii) 
    detection of routing loops, as indicated by self-intersecting 
    trajectories; (iii) passive performance measurement: prematurely 
    terminating trajectories indicate packet loss, and packet One 
    Way Delay can be determined if reports include (synchronized) 
    timestamps; (iv) network attack tracing, of the actual paths 
    taken by attack packets with spoofed source addresses. 
     
     
 4.2.2   Guarding Against Pitfalls and Vulnerabilities 
     
    A concern is whether some large set of related packets could be 
    sampled at a rate that significantly differs from the expected 
    sampling rate, either (i) through unanticipated behavior in the 
    hash function, or (ii) because the packets had been deliberately 
    crafted to have this property. 
    The first point underlines the importance of using a hash 
    function with good mixing properties. Examples of such are CRC32 
    and hash functions based on modular arithmetic, see 6.4 in 
    [Knuth98]. The statistical properties of candidate hash 
    functions need to be evaluated, preferably on packet before 
    adoption for hash-based sampling 
    Hash sampling could be overloaded (or evaded) by an attacker if 
    the hash function and the selection rate are both known. A 
    service provider could build a first defense keeping the Hash 
    Selection Range S private. Then, an attacker could not determine 
    whether a crafted packet is selected, but he would still know 
    that a crafted a set of packets all with the same hash is either 
    all selected or all not selected. Moreover, when applications 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 16] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    (like multi domain trajectory sampling, or One Way Delay 
    estimation across multiple domains) may require multiple 
    administrative entities to agree on a common hash function and 
    selection range, mutual trust between the entities cannot be 
    avoided and just keeping S secret may not be feasible. A 
    stronger defense is to employ a parametrizable hash function and 
    keep the parameter private: in this case, the set of hash values 
    of the packets could not be determined. Examples of parameters 
    are the initial vector in CRC32, and moduli in hashes based on 
    modular arithmetic. 
     
  
 4.3 Router State filtering 
     
    This class of filters select a packet on the basis of the 
    following conditions), possibly combined with the AND, OR or NOT 
    operators. 
  
       - Ingress interface at which packet arrives equals a 
          specified value 
       - Egress interface to which packet is routed to equals a 
          specified value 
       - Packet violated Access Control List (ACL) on the router 
       - Failed Reverse Path Forwarding (RPF) 
       - Failed Resource Reservation (RSVP) 
       - No route found for the packet 
       - Origin AS equals a specified value or lies within a given  
          range 
       - Destination AS equals a specified value or lies within a 
          given range 
        
    Router architectural considerations may preclude some 
    information concerning the packet treatment, e.g. routing state, 
    being available at line rate for selection of packets. However, 
    if selection not based on routing state has reduced down from 
    line rate, subselection based on routing state may be feasible. 
  
 5. Input Parameters and Information Models 
  
    This section gives an overview of different alternative 
    selection schemes and their required parameters. In order to be 
    compliant with PSAMP it is sufficient to implement one of the 
    proposed schemes. 
     
    The decision whether to select a packet or not is based on a 
    function which is performed when the packet arrives at the 
    selection process. Packet selection schemes differ in the input 
    parameters for the selection process and the functions they 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 17] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    require to do the packet selection. The following table gives an 
    overview. 
  
         Scheme       |   input parameters     |     functions  
       ---------------+------------------------+------------------- 
        systematic    |    packet position     |  packet counter  
        count-based   |    sampling pattern    |  
       ---------------+------------------------+------------------- 
        systematic    |      arrival time      |  clock or timer 
        time-based    |     sampling pattern   | 
       ---------------+------------------------+------------------- 
          random      |  packet position       |  packet counter, 
        n-out-of-N    |  sampling pattern      |  random numbers 
                      | (random number list)   | 
       ---------------+------------------------+------------------- 
        uniform       |        sampling        |  random function 
        probabilistic |      probability       |    
       ---------------+------------------------+------------------- 
        non-uniform   |e.g. packet position    | selection function, 
        probabilistic |or packet content(parts)|  probability calc. 
       ---------------+------------------------+------------------- 
        non-uniform   |e.g. flow state         | selection function, 
        flow-state    |or packet content(parts)|  probability calc. 
       ---------------+------------------------+------------------- 
        mask/match    | packet content(parts)  |  filter function 
       ---------------+------------------------+------------------- 
        hash-based    |  packet content(parts) |  hash function 
       ---------------+------------------------+------------------- 
        router state  |    router state        |   router state 
                      |                        |   discovery 
       ---------------+------------------------+------------------- 
  
     
 5.1 Information Model for Sampling Techniques 
     
    In this section we define the information models for most common 
    sampling techniques. Here the selection function is pre-defined 
    and given by the selector ID.  
     
    Sampler Description: 
         SELECTOR_ID 
         SELECTOR_TYPE 
         SELECTOR_PARAMETERS 
         OPERATING_TIME 
         ASSOCIATIONS 
  
    Where: 
     
    SELECTOR_ID: 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 18] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    Unique ID for the packet sampler. The ID can be calculated under 
    consideration of the ASSOCIATIONS and a local ID. 
  
     
    SELECTOR_TYPE 
    Description: For sampling processes the SELECTOR TYPE defines 
    what sampling algorithm is used. 
    Values: Systematic count-based | Systematic time Based | Random 
    n-out-of-N | uniform probabilistic | non-uniform probabilistic | 
    non-uniform flow-state 
     
     
    SELECTOR_PARAMETERS 
    Description: For sampling processes the SELECTOR PARAMETERS 
    define the input parameters for the process. Interval length in 
    systematic sampling means, that all packets that arrive in this 
    interval are selected. The spacing parameter defines the spacing 
    in time or number of packets between the end of one sampling 
    interval and the start of the next succeeding interval. 
  
    Case n out of N: 
       - Population size N, Sample size n 
     
    Case Systematic Time Based: 
       - Interval length (in usec), Spacing (in usec) 
     
    Case Systematic Count Based: 
       - Interval length(in packets), Spacing (in packets) 
     
    Case uniform Probabilistic (with equal probability per packet): 
       - Sampling probability p 
        
    Case non-uniform probabilistic: 
       - Calculation function for sampling probability p 
     
    Case flow state: 
       - information reported for flow state can be found in 
          [MolFl03] 
        
    OPERATING_TIME 
    Description: The OPERATING_TIME parameter describes the 
    start/stop time of sampling process. List elements must not 
    overlap. The start time of the first element can be omitted, 
    meaning ’’from now’’. The end time of the last element can be 
    omitted, meaning ’’until sampler is removed’’. 
    Values: List of (Start time, End time) 
     
    ASSOCIATIONS 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 19] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    Description: The ASSOCIATIONS field describes the observation 
    point and (possibly) the IPFIX processes to which the packet 
    selector can be associated.  
    The STREAM ID denotes the origin of the data stream that is 
    input to the selection function. It can be the observation point 
    directly or the ID of another selector. With this it is possible 
    to define combined schemes. If the STREAM ID contains IDs from 
    other selectors, one can derive the original observation point 
    from the selector definitions of these specified selectors. 
     
    Values: <STREAM ID, Metering process ID, Exporting process ID> 
    With STREAM ID: Observation point ID AND List of SELECTOR_IDs 
     
        
 5.2 Information Model for Filtering Techniques 
     
    In this section we define the information models for most common 
    filtering techniques. The information model structure closely 
    parallels the one presented for the sampling techniques. 
     
    Filter Description: 
         SELECTOR_ID 
         SELECTOR_TYPE 
         SELECTOR_PARAMETERS 
         OPERATING_TIME 
         ASSOCIATIONS 
  
    Where: 
     
    SELECTOR_ID: 
    Unique ID for the packet filter. The ID can be calculated under 
    consideration of the ASSOCIATIONS and a local ID. 
     
    SELECTOR_TYPE 
    Description: For filtering processes the SELECTOR TYPE defines 
    what filtering type is used. 
    Values: Matching | Hashing | Router_state 
     
    SELECTOR_PARAMETERS 
    Description: For filtering processes the SELECTOR PARAMETERS 
    define formally the common property of the packet being 
    filtered. For the filters of type Matching and Hashing the 
    definitions have a lot of points in common. 
    Values: 
    Case Matching 
       - <Header type = ip v4 > 
       - <bit specification, header part> 
       - <Selection interval specification, header part> 
       - <Header type = ipv6> 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 20] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
       - <bit specification, header part> 
       - <Selection interval specification, header part> 
       - <payload byte number N> 
       - <bit specification, payload part> 
       - <Selection interval specification, payload part> 
     
    Notes to Case Matching: 
     
       - The filter can be defined for the header part only, for the 
          payload part only or for both. In the latter case the 
          matching must be an AND of the two. 
       - The bit specification, for the header part, can be 
          specified for ipv4 or ipv6 only, or both 
       - In case of ipv4, the bit specification is a sequence of 20 
          Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask 
          to be applied to the header 
       - In case of ipv6, it is a sequence of 40 Hexadecimal numbers 
          [00,FF] specifying a 40 bytes bitmask to be applied to the 
          header 
       - The bit specification, for the payload part, is a sequence 
          of Hexadecimal numbers [00,FF] specifying the bitmask to be 
          applied to the first N bytes of the payload, as specified 
          by the previous field. In case the Hexadecimal number 
          sequence is longer then N, only the first N numbers are 
          considered. 
       - In case the payload is shorter than N, the packet will not 
          match the filter Other options, like padding with zeros, 
          may be considered in the future. 
       - The selection interval specification is a list of non 
          overlapping intervals [intv_begin, intv_end] where 
          intv_begin, intv_end are bit strings of length 20*8 (ipv4 
          case), 40*8 (ipv6 case), N*8 (payload case). 
       - A filter cannot be defined on the options field of the ipv4 
          header, neither on stacked headers of ipv6. 
       - This specification doesn’t preclude the future definition 
          of a high level syntax for defining in a concise way bit 
          selection and matching rules in a more human readable form 
          (e.g. ‘‘TCP port in [2000,3000]’’). The requirement is that 
          such a syntax can be univoquely compiled into the one 
          defined above 
     
    Case Hashing: 
       - <Header type = ipv4> 
       - <Input bit specification, header part> 
       - <Header type =  ipv6> 
       - <Input bit specification, header part> 
       - <payload byte number N> 
       - <Input bit specification, payload part> 
       - <additional hash input bits (seed)> 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 21] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
       - Hashing function specification 
            - Hash function name  
            - Length of input key (eliminate 0x bytes) 
            - Output value (length M and bitmask) 
            - Selection interval specification, as a list of non 
              overlapping intervals [start value, end value] where 
              value is in [0,2^M-1] 
            - Additional parameters dependent on specific hash 
              function 
  
    Notes to Case Hashing: 
     
       - On Input bit specifications fields, the same notes on bit 
          specifications of the Matching case reported above apply 
       - Some hash functions, their detailed functioning and their 
          specific parameter list are described in [NicMoDu]. 
  
    Case Router State: 
  
       - Ingress interface at which packet arrives equals a 
          specified value 
       - Egress interface to which packet is routed to equals a 
          specified value 
       - Packet violated Access Control List (ACL) on the router 
       - Failed Reverse Path Forwarding (RPF) 
       - Failed Resource Reservation (RSVP) 
       - No route found for the packet 
       - Origin AS equals a specified value or lies within a given  
          range 
       - Destination AS equals a specified value or lies within a 
          given range 
  
    Note to Case Router State: 
       - All Router state entries can be linked by AND, OR, NOT 
          operators 
        
    OPERATING_TIME 
    Description: The OPERATING_TIME parameter describes the 
    start/stop time of filtering process. List elements must not 
    overlap. The start time of the first element can be omitted, 
    meaning ‘‘from now’’. The end time of the last element can be 
    omitted, meaning ‘‘until sampler is removed’’. 
    Values: List of (Start time, End time) 
     
    ASSOCIATIONS 
    Description: The ASSOCIATIONS field describes the observation 
    point and (possibly) the IPFIX processes to which the packet 
    selector can be  associated. The STREAM ID denotes the origin of 
    the data stream that is input to the selection function. It can 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 22] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    be the observation point directly or the ID of another selector. 
    With this it is possible to define combined schemes. If the 
    STREAM ID contains IDs from other selectors, one can derive the 
    original observation point from the selector definitions of 
    these specified selectors. 
     
    Values: STREAM ID, Metering process ID, Exporting process ID> 
    With STREAM ID: Observation point ID AND List of SELECTOR_IDs 
     
 6. Composite Techniques  
     
    Composite schemes are realized by using the STREAM ID in the 
    information models. The STREAM ID denotes from which selectors 
    the input stream originates. If multiple stream IDs are given, 
    this means that the selector operates on the packet stream 
    simply resulting from the time superposition of the output of 
    all the listed filters and samplers. Note that a sampler/filter 
    could be intermittently active, as defined in the OPERATING TIME 
    field. 
    Some examples of composite schemes are reported below. 
     
 6.1 Cascaded filtering->sampling or sampling->filtering 
  
    If a filter precedes a sampling process the role of filtering is 
    to create a set of ‘‘parent populations’’ from a single stream 
    that can then be fed independently to different sampling 
    functions, with different parameters tuned for the population 
    itself (e.g. if streams of different intensity result from 
    filtering, it may be good to have different sampling rates). If 
    filtering follows a sampling process, the same sampling rate and 
    type is applied to the whole stream, independently of the 
    relative size of the streams resulting from the filtering 
    function. Moreover, also packets not destined to be selected 
    will ‘‘load’’ the sampling function. So, in principle, filtering 
    before sampling allows a more accurate tuning of the sampling 
    procedure, but if filters are too complex to work at full line 
    rate (e.g. because they have to access router state 
    information), sampling before filtering may be a need. 
     
 6.2 Stratified Sampling  
     
    Stratified sampling is one example for using a composite 
    technique. The basic idea behind stratified sampling is to 
    increase the estimation accuracy by using a-priori information. 
    The a-priori information is used to perform an intelligent 
    grouping of the elements of the parent population. With this a 
    higher estimation accuracy can be achieved with the same sample 
    size. 
     

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 23] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    Stratified sampling divides the sampling process into multiple 
    steps. First, the elements of the parent population are grouped 
    into subsets in accordance to a given characteristic. This 
    grouping can be done in multiple steps. Then samples are taken 
    from each subset.  
     
    The stronger the correlation between the characteristic used to 
    divide the parent population and the characteristic of interest 
    (for which an estimate is sought after), the easier is the 
    consecutive sampling process and the higher is the 
    stratification gain. For instance if the dividing characteristic 
    were equal to the investigated characteristic, each element of 
    the sub-group would be a perfect representative of that 
    characteristic. In this case it would be sufficient to take one 
    arbitrary element out of each subgroup to get the actual 
    distribution of the characteristic in the parent population. 
    Therefore stratified sampling can reduce the costs for the 
    sampling process (i.e. the number of samples needed to achieve a 
    given level of confidence). 
  
    For stratified sampling one has to specify classification rules 
    for grouping the elements into subgroups and the sampling scheme 
    that is used within the subgroups. The classification rules can 
    be expressed by multiple filters. For the sampling scheme within 
    the subgroups the parameters have to be specified as described 
    above. 
     
 7. Security Considerations 
  
    Security threats can occur if the configuration of sampling 
    parameters or the communication of sampling parameters to the 
    application is corrupted. This document only describes sampling 
    schemes that can be used for packet selection. It neither 
    describes a mechanism how those parameters are configured nor 
    how these parameters are communicated to the application. 
    Therefore the security threats that originate from this kind of 
    communication cannot be assessed with the information given in 
    this document.  
     
    In some cases malicious users or attackers may be interested to 
    hide packets from the service provider. For instance if packet 
    selectors are used for accounting or intrusion detection 
    applications, users may want to prevent that packets are 
    selected. If a deterministic sampling scheme is used or a 
    selection scheme that takes packet content into account, the 
    user can shape or send packets in a way that they are less 
    likely to be selected. This has to be taken into account when 
    choosing an appropriate packet selection technique. 
  

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 24] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
 8. References 
     
    [AmCa89]    Paul D. Amer, Lillian N. Cassel: Management of 
                 Sampled Real-Time Network Measurements, 14th 
                 Conference on Local Computer Networks, October 
                 1989, Minneapolis, pages 62-68, IEEE, 1989 
     
    [ClPB93]    K.C. Claffy, George C Polyzos, Hans-Werner Braun: 
                 Application of Sampling Methodologies to Network 
                 Traffic Characterization, Proceedings of ACM 
                 SIGCOMM'93, San Francisco, CA, USA, September 13 - 
                 17, 1993 
     
    [CoGi98]    I. Cozzani, S. Giordano: Traffic Sampling Methods 
                 for end-to-end QoS Evaluation in Large 
                 Heterogeneous Networks. Computer Networks and ISDN 
                 Systems, 30 (16-18), September 1998. 
     
    [DuGG03]    Nick Duffield, Albert Greenberg, Matthias 
                 Grossglauser, Jennifer Rexford: A Framework for 
                 Passive Packet Measurement, Internet Draft draft-
                 ietf-psamp-framework-01, work in progress, June 
                 2003                      
     
    [DuGr00]    Nick Duffield, Matthias Grossglauser: Trajectory 
                 Sampling for Direct Traffic Observation, 
                 Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, 
                 August 28 - September 1, 2000. 
     
    [EsVa01]    C. Estan and G. Varghese, ‘‘New Directions in 
                 Traffic Measurement and Accounting’’, ACM SIGCOMM 
                 Internet Measurement Workshop 2001, San Francisco 
                 (CA) Nov. 2001 
     
    [HT52]      D.G. Horvitz and D.J. Thompson, A Generalization of 
                 Sampling   without replacement from a Finite 
                 Universe. J. Amer. Statist. Assoc. Vol. 47, pp. 
                 663-685, 1952. 
     
    [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic 
                 Estimation for the Largest Sources on a Network, 
                 Using Packet Sampling with Limited Storage, HP 
                 technical report, Managemenr, Mathematics and 
                 Security Department, HP Laboratories, Bristol, 
                 March 1992, 
                 http://www.hpl.hp.com/techreports/92/HPL-92-35.html 
     

 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 25] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    [Knuth98]   Donald E. Knuth: The Art of Computer Programming, 
                 Volume 3: Searching and Sorting, Addison Wesley, 
                 1998  
                   
    [MolFl03]   M.Molina: Flow selection support in IPFIX, Internet 
                 Draft <draft-molina-flow-selection-00.txt>, work in 
                 progress, October 2003. 
     
    [Molina03]  M.Molina: A scalable and efficient methodology for 
                 flow monitoring in the internet, International 
                 Teletraffic Congress (ITC-18), Berlin, Sep. 2003 
     
    [NicMoDu]   S. Niccolini, M.Molina, N.Duffield: Hash functions 
                 description for packet selection, Internet Draft 
                 <draft-niccolini-hash-descr-00.txt>, work in 
                 progress, October 2003. 
     
    [QuZC03]    J. Quittek, T. Zseby, B. Claise, S. Zander: 
                 Requirements for IP Flow Information Export, 
                 Internet Draft <draft-ietf-ipfix-reqs-10.txt>, work 
                 in progress, June 2002 
     
    [Zseb02]    Tanja Zseby: Deployment of Sampling Methods for SLA 
                 Validation with Non-Intrusive Measurements, 
                 Proceedings of Passive and Active Measurement 
                 Workshop (PAM 2002), Fort Collins, CO, USA, March 
                 25-26, 2002  
     
 9. Author's Addresses 
     
    Tanja Zseby 
    Fraunhofer Institute for Open Communication Systems 
    Kaiserin-Augusta-Allee 31 
    10589 Berlin 
    Germany 
    Phone: +49-30-34 63 7153 
    Fax:   +49-30-34 53 8153 
    Email: zseby@fokus.fhg.de 
     
    Maurizio Molina 
    NEC Europe Ltd., Network Laboratories 
    Adenauerplatz 6 
    69115 Heidelberg 
    Germany 
    Phone: +49 6221 90511-18  
    Email: molina@ccrle.nec.de 
  
    Fredric Raspall 
    NEC Europe Ltd., Network Laboratories 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 26] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    Adenauerplatz 6 
    69115 Heidelberg 
    Germany 
    Phone: +49 6221 90511-31  
    EMail: raspall@ccrle.nec.de 
     
    Nick Duffield 
    AT&T Labs - Research 
    Room B-139 
    180 Park Ave 
    Florham Park NJ 07932, USA 
    Phone: +1 973-360-8726 
    Email: duffield@research.att.com 
     
 10.     Intellectual Property Statement 
  
    AT&T Corporation may own intellectual property applicable to 
    this 
    contribution. The IETF has been notified of AT&T's licensing 
    intent for the specification contained in this document. See 
    http://www.ietf.org/ietf/IPR/ATT-GENERAL.txt for AT&T's IPR 
    statement. 
     
 11.     Full Copyright Statement 
  
    Copyright (C) The Internet Society (2002). All Rights Reserved. 
    This document and translations of it may be copied and furnished 
    to others, and derivative works that comment on or otherwise 
    explain it or assist in its implementation may be prepared, 
    copied, published and distributed, in whole or in part, without 
    restriction of any kind, provided that the above copyright 
    notice and this paragraph are included on all such copies and 
    derivative works. However, this document itself may not be 
    modified in any way, such as by removing the copyright notice or 
    references to the Internet Society or other Internet 
    organizations, except as needed for the purpose of developing 
    Internet standards in which case the procedures for copyrights 
    defined in the Internet Standards process must be followed, or 
    as required to translate it into languages other than 
    English.  
     
    The limited permissions granted above are perpetual and will not 
    be revoked by the Internet Society or its successors or assigns. 
     
    This document and the information contained herein is provided 
    on an 
    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET 
    ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR 
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE 


 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 27] 
  
 Internet Draft  Techniques for IP Packet Selection    October 2003 
  
  
    OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY 
    IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A 
    PARTICULAR PURPOSE. 
     
  
 Zseby, Molina, Raspall, Duffield   Expires April 2004     [Page 28]