Internet Draft                                                         
Document: <draft-ietf-psamp-sample-tech-02.txt>               T. Zseby 
Expires: December 2003                                Fraunhofer FOKUS 
                                                             M. Molina 
                                                       NEC Europe Ltd. 
                                                            F. Raspall 
                                                       NEC Europe Ltd. 
                                                         Nick Duffield 
                                                  AT&T Labs - Research 
                                                                      
                                                             June 2003 
 
 
  Sampling and Filtering Techniques for IP Packet Selection 
 
 
Status of this Memo 
 
  This document is an Internet-Draft and is in full conformance with 
  all provisions of Section 10 of RFC2026.  
   
  Internet-Drafts are working documents of the Internet Engineering 
  Task Force (IETF), its areas, and its working groups. Note that other 
  groups may also distribute working documents as Internet-Drafts. 
  Internet-Drafts are draft documents valid for a maximum of six months 
  and may be updated, replaced, or obsoleted by other documents at any 
  time. It is inappropriate to use Internet-Drafts as reference 
  material or to cite them other than as "work in progress."  
   
  The list of current Internet-Drafts can be accessed at 
  http://www.ietf.org/ietf/1id-abstracts.txt  
   
  The list of Internet-Draft Shadow Directories can be accessed at 
  http://www.ietf.org/shadow.html. 
   
Abstract 
   
  This document describes sampling and filtering techniques for IP 
  packet selection. It introduces information models for packet 
  sampling, for packet filtering and for combinations of methods. The 
  information models describe what information has to be specified in 
  order to describe the method. This information is used for 
  configuring the selection technique in measurement processes and for 
  reporting the technique in use to the measurement data collection 
  process.   
  The document first suggests some terminology, then it describes in 
  detail packet sampling and packet filtering techniques and their 
  parameters. It also describes how these two techniques can be 
  combined to build more elaborate packet selectors. Finally, it 
  introduces information models for the most common sampling and 
  filtering techniques. 
   
   
Table of Contents 
 
  1.   Introduction.................................................2 
  2.   Terminology..................................................2 
  3.   Scope and Deployment of Packet Selection Techniques..........4 
  3.1  Sampling.....................................................5 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 1] 


Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  3.1.1 Systematic Sampling.........................................5 
  3.1.2 Random Sampling.............................................6 
  3.1.2.1 n-out-of-N sampling.......................................6 
  3.1.2.2 Probabilistic sampling....................................6 
  3.1.2.2.1 Uniform Probabilistic Sampling..........................7 
  3.1.2.2.2 Non-Uniform Probabilistic Sampling......................7 
  3.1.2.2.3 Non-Uniform flow State dependent sampling...............7 
  4.   Filtering....................................................7 
  4.1  Mask/Match filtering.........................................8 
  4.2  Hashing filtering............................................8 
  4.2.1 Random sampling emulation...................................9 
  4.2.2 Consistent packet selection and its applications............9 
  4.2.3 Guarding Against Pitfalls and Vulnerabilities..............10 
  4.3  Router State filtering......................................10 
  5.   Input Parameters and Information Models.....................11 
  5.1  Information Model for Sampling Techniques...................11 
  5.2  Information Model for Filtering Techniques..................13 
  6.   Composite Techniques........................................15 
  6.1  Cascaded filtering->sampling or sampling->filtering.........15 
  6.2  Stratified Sampling.........................................15 
  7.   Security Considerations.....................................16 
  8.   References..................................................16 
  9.   Author's Addresses..........................................17 
  10.  Intellectual Property Statement.............................18 
  11.  Full Copyright Statement....................................18 
 
1. Introduction 
   
  Increasing data rates and growing measurement demands increase the 
  requirements for data collection resources. For measurement scenarios 
  in backbone networks it is often required to measure whole traffic 
  aggregates instead of single flows. Furthermore, some measurement 
  methods require the capturing of packet headers or even parts of the 
  payload. All this can lead to an overwhelming amount of measurement 
  data, resulting in high demands regarding resources for metering, 
  storage, transport and post processing.  
 
  In some cases specialized hardware helps to fulfill these demands but 
  on the other hand increases the costs for providing the measurement. 
  Since measurements are mainly a supporting functionality for the 
  service provisioning, measurement costs usually should be limited to 
  a small fraction of the costs of the network service provisioning 
  itself. Therefore a reduction of the measurement result data is 
  crucial to prevent the depletion of the available (i.e. the 
  affordable) resources.  Such a reduction can be achieved by a 
  reasonable deployment of packet selection techniques, that sample a 
  subset of the packets while still allowing an appropriate accuracy, 
  or filter out all packets that are not of interest for the 
  measurement at all. Packet selection helps to prevent an exhaustion 
  of resources and to limit the measurement costs. Examples for 
  applications that benefit from packet selection are given in 
  [DuGG03]. 
 
2. Terminology 
   
  The terms used throughout this document are defined in [DuGG03]. We 
  here just repeat the definitions of the specific selection 
  operations. 
 

Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 2] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  Filtering: a filter is a selection operation that selects a packet 
  deterministically based on the packet content, its treatment, and 
  functions of these occurring in the selection state. Examples include 
  match/mask filtering, and hash-based selection. 
   
  Sampling: a selection operation that is not a filter is called a 
  sampling operation. 
   
  Content-independent Sampling: a sampling operation that does not use 
  packet content (or quantities derived from it) as the basis for 
  selection is called a content-independent sampling operation. 
  Examples include systematic sampling, and uniform pseudorandom 
  sampling driven by a pseudorandom number whose generation is 
  independent of packet content. Note that independent sampling a does 
  not need to access the packet content in order to make the selection 
  decision. 
   
  Content-dependent Sampling: a sampling operation where selection is 
  dependent on packet content is called a content-dependent sampling 
  operation. Examples include pseudorandom selection according to a 
  probability that depends on the contents of a packet field; note that 
  this is not a filter. 
   
  Emulated Sampling: selection operations in any of the above four 
  categories may be emulated by operations in the same or another 
  category for the purposes of implementation. 
   
  Hash-based selection: a filter specified by a hash domain, a hash 
  function, and hash range and a hash selection range. 
   
  Hash domain: a subset of the packet content and the packet treatment, 
  viewed as a N-bit string for some positive integer N. 
   
  Hash range: a set of M-bit strings for some positive integer M 
   
  Hash function: a deterministic map from the hash domain into the  
  hash range. 
   
  Hash selection range: a subset of the hash range. The packet is 
  selected if the action of the hash function on the hash domain for 
  the packet yields a result in the hash selection range. 
       
  Metering process: see the definition in [QuZC03]  
    
  Pool size: the size of a set of packets in a packet stream.  
   
  Sample size: the size of a set of packets selected by a sampling 
  operation. 
            
  Target Sampling Rate: a configurable sampling rate in a sampling 
  operation. 
            
  Attained Sampling Rate: Given a subset of packets in a stream input 
  to a sampling operation, the attained sampling rate is the ratio of 
  the sample size to the pool size. 
   
  Packet Stream: a sequence of packets, each of which was observed at 
  the observation point. Note that when packets are sampled from a 
  stream, the selected packets usually do not have common properties by 
  which they can be distinguished from packets that have not been 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 3] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  selected.  Therefore we define here the term stream instead of flow, 
  which is defined as set of packets with common properties [QuZC02]. 
   
  Selection Process: a selection process selects a substream of packets 
  from the observed packet stream. A selection process entails the 
  composition of one or more selectors operating in succession. If 
  selectors are composed, the output packet stream of one selector 
  forms the input packet stream for the succeeding selector. 
   
 
3. Scope and Deployment of Packet Selection Techniques 
 
  The function of packet selection is to select a subset from the 
  stream of all packets visible at an observation point. Selection can 
  be used to select packets based on their content, and/or to reduce 
  the rate of packet reports regardless of content.  
   
  The selection technique used to select a subset of packets out of all 
  those crossing an observation point depends on the purpose 
  (application) for which measurement is performed. If the main purpose 
  of an application is to infer some characteristic of the whole set of 
  crossing packets without processing them all (thus reducing the 
  computation load) then we call the used selection technique 
  ôsamplingö. In principle, with sampling the content of the packet is 
  not relevant for the packet selection: what matters is only that the 
  selected sample has a distribution of the characteristic to infer 
  similar to the one of the parent population, so that it can be 
  estimated reliably. The sampling decision may be based on the 
  temporal or spatial position of the packet in the packet stream, or 
  may depend on a (pseudo) random number extraction or calculation. 
   
  On the contrary, if the application needs to consider all the packets 
  having some common property, then we call the selection technique 
  ôfilteringö. The property can be directly derived by some computation 
  on the packet content, or depend on the treatment given by the router 
  to the packet. We conclude that sampling can depend on packet 
  position or on (pseudo) random decisions, while filtering depends on 
  packet content, but never depends on packet position or on (pseudo) 
  random decisions. 
   
  Note that a common technique to select packets is to compute a hash 
  function on some bits of the packet header and/or content and to 
  select it if the result falls in a certain selection range. Since 
  hashing is a deterministic operation, it is a powerful mean to ensure 
  that the same packets are selected at multiple measurement points. 
  Depending on the chosen input bits, on the hash function and on the 
  selection range, this technique could also be used to emulate the 
  random selection of packets with a given probability p. Hashing is 
  then a particular type of filtering, but can also be used to emulate 
  random sampling. 
   
  The introduced classification is mainly useful for the definition of 
  an information model describing ôprimitiveö selection techniques. 
  More complex selection techniques may then be described through the 
  composition of cascaded sampling and filtering operations. 
  For example, a packet selection that weights the selection 
  probability on the basis of the packet length can be described as a 
  set of filter/sampling cascades. However, this descriptive approach 
  is not intended to be rigid: if a common and consolidated selection 

 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 4] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  practice turns out to be too complex to be described as a composition 
  of the mentioned building blocks, an ad hoc description can be 
  specified instead. 
   
  We consider packet selectors as part of an IPFIX metering process 
  which also can use the IPFIX exporting process. This is expressed as 
  association to one or more IPFIX processes. 
 
3.1 Sampling 
 
  The deployment of sampling techniques aims at the provisioning of 
  information about a specific characteristic of the parent population 
  at a lower cost than a full census would demand. In order to plan a 
  suitable sampling strategy it is therefore crucial to determine the 
  needed type of information and the desired degree of accuracy in 
  advance. 
   
  First of all it is important to know the type of metric that should 
  be estimated. The metric of interest can range from simple packet 
  counts [JePP92] up to the estimation of whole distributions of flow 
  characteristics (e.g. packet sizes)[ClPB93].  
 
  Secondly, the required accuracy of the information and with this, the 
  confidence that is aimed at, should be known in advance. For instance 
  for usage-based accounting the required confidence for the estimation 
  of packet counters can depend on the monetary value that corresponds 
  to the transfer of one packet. That means that a higher confidence 
  could be required for expensive packet flows (e.g. premium IP 
  service) than for cheaper flows (e.g. best effort). The accuracy 
  requirements for validating a previously agreed quality can also vary 
  extremely with the customer demands. These requirements are usually 
  determined by the service level agreement (SLA). 
 
  Sampling is considered as part of the metering process. A metering 
  process consists of multiple functions (capturing, time stamping, 
  etc.). Sampling can be applied at different functions of the metering 
  process. In the following we consider a measured IP packet with its 
  observation point and timestamp as basis elements of the parent 
  population. 
   
  The sampling method and the parameters in use must be clearly 
  communicated to all applications that use the measurement data. Only 
  with this knowledge a correct interpretation of the measurement 
  results can be ensured.  
   
  Sampling Methods can be characterized by the sampling algorithm, the 
  trigger type used for starting a sampling interval and the length of 
  the sampling interval. These parameters are described here in detail. 
  The sampling algorithm describes the basic process for selection of 
  samples. In accordance to [AmCa89] and [ClPB93] we define the 
  following basic sampling processes: 
   
3.1.1  Systematic Sampling 
 
  Systematic sampling describes the process of selecting the starting 
  points and the duration of the selection intervals according to a 
  deterministic function. This can be for instance the periodic 
  selection of every n-th element of a trace but also the selection of 
  all packets that arrive at pre-defined points in time. Even if the 
  selection process does not follow a periodic function (e.g. if the 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 5] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  time between the sampling intervals varies over time) we consider 
  this as systematic sampling as long as the selection is 
  deterministic.  
  The use of systematic sampling always involves the risk of biasing 
  the results. If the systematics in the sampling process resembles 
  systematics in the observed stochastic process (occurrence of the 
  characteristic of interest in the network), there is a high 
  probability that the estimation will be biased. Systematics (e.g. 
  periodic repetition of an event) in the observed process might not be 
  known in advance.  
  Here only equally spaced schemes are considered, where triggers for 
  sampling are periodic, either in time or in packet count. All packets 
  occurring in a selection interval (either in time or packet count) 
  beyond the trigger are selected. The case that the selection interval 
  covers only the first available packet for count-based sampling is 
  often called 1 in N sampling: packets are selected with count period 
  N. More generally, some number M<N of consecutive packets are 
  selected. 
 
  Systematic count-based 
  In systematic count-based sampling the start and stop triggers for 
  the sampling interval are defined in accordance to the special packet 
  position (packet count). 
 
  Systematic time-based 
  In systematic count-based sampling the start and stop triggers for 
  the sampling interval are defined in accordance to the temporal 
  packet position (arrival time). 
 
  Both schemes are contentûindependent selection schemes. 
   
3.1.2  Random Sampling 
 
  Random sampling selects the starting points of the sampling intervals 
  in accordance to a random process. The selection of elements are 
  independent experiments. With this, unbiased estimations can be 
  achieved. In contrast to systematic sampling, random sampling 
  requires the generation of random numbers. One can differentiate two 
  methods of random sampling: 
   
3.1.2.1 n-out-of-N sampling 
 
  In n-out-of-N sampling n elements are selected out of the parent 
  population that consists of N elements. One example would be to 
  generate random numbers and select all packets which have a packet 
  position equal to one of the random numbers. For this kind of 
  sampling the sample size is fixed.  
   
3.1.2.2 Probabilistic sampling 
   
  In probabilistic sampling the decision whether an element is selected 
  or not is made in accordance to a pre-defined selection probability. 
  An example would be to flip a coin for each packet and select all 
  packets for which the coin showed the head. For this kind of sampling 
  the sample size can vary for different trials. The selection 
  probability is not necessarily the same for each packet. We 
  distinguish between uniform probabilistic sampling and non-uniform  
  probabilistic sampling. 
   

Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 6] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
3.1.2.2.1      Uniform Probabilistic Sampling 
   
  For Uniform Random Sampling packets are selected independently with 
  some uniform probability 1/N. This sampling can be count-driven, and 
  is sometimes referred to as geometric random sampling, since the 
  difference in count between successive selected packets are 
  independent random variables with a geometric distribution of mean N. 
  A time-driven analog, exponential random  sampling, has the time 
  between triggers exponentially distributed. 
  Both geometric and exponential random sampling are examples of what 
  is known as additive random sampling, defined as sampling where the 
  intervals or counts between successive samples are independent 
  identically distributed random variable. 
   

3.1.2.2.2      Non-Uniform Probabilistic Sampling 
   
  Also known as non-uniform probability sampling, this is a variant of 
  independent random sampling in which the sampling probabilities can 
  depend on the selection process input. This can be used to weight 
  sampling probabilities in order e.g. to boost the chance of sampling 
  packets that are rare but are deemed important. Unbiased estimators 
  for quantitative statistics are recovered by renormalization of 
  sample values; see [HT52]. 
   

3.1.2.2.3      Non-Uniform flow State dependent sampling  
   
  Another type of sampling that can be classified as Non-Uniform (and, 
  possibly, probabilistic) is closely related to the flow concept as 
  defined in [QuZC02], and it is only used jointly with a flow 
  monitoring function (IPFIX monitoring function). Packets are 
  selected, probabilistically or deterministically, dependent on a 
  selection state. The point, here, is that the selection state is 
  determined also by the state of the flow the packet belongs to and/or 
  by the state of the other flows currently being monitored by the 
  associated flow monitoring function. This type of sampling is also 
  content dependent because the identification of the flow the packet 
  belongs to requires analyzing part of the packet content. If the 
  packet is selected, then it is passed as an input to the IPFIX 
  monitoring function (this is called ôLocal Exportö in [DuGG03] 
  Selecting the packet depending on the state of a flow cache is useful 
  when memory resources of the flow monitoring function are scarce 
  (i.e. thereÆs no room to keep all the flows that have been scheduled 
  for monitoring). The algorithms leading to the selection decision are 
  not subject to standardization, but they may take into account 
  factors like flow volume, flow grow rate, flow monitoring priority. 
  An example of such an algorithm is described in [EsVa01] 
   
 
  n-out-of-N sampling and uniform probabilistic sampling are contentû
  independent selection schemes. For non-uniform probabilistic sampling 
  the sampling probability can be based on packet content.  
   
4. Filtering  
   
  Filtering is the selection of packets based only the packet content, 
  the treatment of the packet at the observation point, and 

 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 7] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  deterministic functions of these occurring in the selection state. 
  The packet is selected if these quantities fall into a specified 
  range. 
  Packet filtering can be done for a wide variety of purposes e.g. for 
  security, SLA enforcing, accounting. Depending on the type of 
  filtering, it can be applied in different parts of the metering 
  process. The role of filtering, as the word itself suggest, is to 
  separate all the packets having a certain property from those not 
  having it. A distinguishing characteristic from sampling is that the 
  property never depends on the packet position in time or in the 
  space, or on a random process. 
  We identify and describe in the following three filtering techniques. 
  The first two (Mask/Match filtering and Hashing filtering) are 
  stateless, and therefore can make their decision based on the 
  analysis of portion of the packet only. The other (router state 
  filtering) requires to access state information after the analysis of 
  part of the packet and is therefore more complex: its usage makes 
  sense only in particular circumstances, as described below. 
   
4.1 Mask/Match filtering 
 
  This type of filtering selects a packet operating as follows: first a 
  combination of packetÆs bit positions is selected taking the logical 
  AND of portion of the packetÆs bits and a mask, then itÆs checked if 
  the result, seen as an hexadecimal number, falls within a selection 
  range. In the simplest case the selection range can be a single value 
  (e.g. filter selects only the packets with a certain specific header 
  field, e.g. a specific source IP address) but more in general the 
  selection range can be a sequence of non-overlapping intervals if 
  high level interfaces are used to specify mask and matches. The 
  selected bits of the packet arenÆt necessarily only those of the IP 
  header, but in case both bits of the IP header and the payload are 
  selected, the masks and the selection intervals MUST be specified 
  separately for the header and the payload. 
  An implementation is not constrained to apply exactly all the steps 
  or in this sequence, provided that the result is equivalent to a 
  logical function doing it. 
  Examples of filters of this class are filters that select packets 
  based on the matching of some of the IP header fields with a 
  (possibly masked) specific value, filters that select packets having 
  some IP header fields values falling within a range, filters that do 
  the same as above on some of the transport header fields (that are 
  thus considered as part of the payload), or a combination of all of 
  the above mentioned possibilities. 
   
 
4.2 Hashing filtering 
   
  A hash function h maps the packet content c, or some portion of it, 
  onto a range R. The packet is selected if h(c) is an element of S, 
  which is a subset of R called the ôselection rangeö. Thus hash-based 
  sampling is indeed a particular case of filtering: the object is 
  selected if c is in inv(h(S)). But for desirable hash functions the 
  inverse image inv(h(S)) will be extremely complex, and hence h would 
  not be expressible as, say, a match/mask filter or a simple 
  combination of these. 
  Hash-based sampling has mainly two types of usage: it offers a way to 
  emulate random sampling by using packet content to generate 
  pseudorandom variates and a way to consistently select subsets of 

 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 8] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  packets that share a common property. In the following subsections we 
  give more details about them. 
  However, both usages require that the hash functions has two 
  statistical properties. First, the hash function h must have good 
  mixing properties, in the sense that small changes in the input (e.g. 
  the flipping of a single bit) cause large changes in the output (many 
  bits change). Then any local clump of values of c is spread widely 
  over R by h, and so the distribution of h(c) is fairly uniform even 
  if the distribution of c is not. Then the sampling rate is #S/#R, 
  which can be tuned by choice of S. If S and R are sets contiguous 
  integers, h(c), suitably shifted and normalized, can be interpreted 
  as a pseudorandom variate. The second desirable property depends more 
  closely on the statistics of the content c. In applications, the 
  content c comprises a number of distinct fields, c1 ... cm, e.g. 
  source and destination IP Address, IP identification, and TCP/UDP 
  port numbers (if present) for a packet. With a hash function 
  satisfying the first properties above, selection decisions will 
  appear uncorrelated with the contents of any individual field, if the 
  complementary fields are (i) sufficiently variable themselves, and 
  (ii) sufficiently uncorrelated with cj. 
   
4.2.1  Random sampling emulation 
   
  Although pseudorandom number generators with well understood 
  properties have been developed, they may not be the method of choice 
  in setting where computational resources are scarce. A convenient 
  alternative is to use hash functions of packet content as a source of 
  randomness. The hash (suitably renormalized) is a pseudorandom 
  variate in the interval [0,1]. Other schemes may use packet fields in 
  iterators for pseudorandom numbers. 
  The point here, is that the statistical properties of the idealized 
  packet selection law (such as independence of sampling decisions for 
  different packets, or independence on packet content) may not be 
  exactly shared by an implementation, but only approximately so. 
  Although the selection decisions for non-uniform independent random 
  sampling (see Section 3.1.2.2.2 above) also depend on the packet 
  content, this form of sampling is distinguished from the use of 
  packet content to generate variates. In the former case, the content 
  only determines the selection probabilities: selection could then 
  proceed e.g by use of a variates obtained by an independent 
  pseudorandom number generator. In the latter case, the content 
  determines the pseudorandom variates rather than the probabilities. 
   
   
4.2.2  Consistent packet selection and its applications 
   
  Consistent packet selection is also often referred as trajectory 
  sampling. 
  In trajectory sampling, all routers in a network hash portion of the 
  packet content using identical hash function and selection range. The 
  domain of the hash is restricted to those parts of the packet that 
  are invariant from hop to hop. Fields such as Time-to-Live, which is 
  decremented per hop, and header CRC, which is recalculated per hop, 
  are thus excluded from the hash domain. Thus, a given packet is 
  selected at all either all points on its path through the network, or 
  at none. The domain of the hash function needs to be wider than just 
  a flow key, if packets are to be selected quasi-randomly within flows 
  (and e.g. include portions of the payload), see [DuGr00]. If a report 
  on each selected packet is exported to a collector, the latter can 
  reconstruct trajectories of the selected packets, provided it can 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 9] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  match different reports on the same packet, and distinguish these 
  from reports on different packets. For this purpose, reports may also 
  contain a second distinct hash (identification hash) of the selected 
  packets. The identification hash can be considered as part of the 
  selection state. 
  Applications of trajectory sampling include (i) estimation of the 
  network path matrix, i.e., the traffic intensities according to 
  network path, broken down by flow; (ii) detection of routing loops, 
  as indicated by self-intersecting trajectories; (iii) passive 
  performance measurement: prematurely terminating trajectories 
  indicate packet loss, and packet latencies can be determined if 
  reports include (synchronized) timestamps; (iv) network attack 
  tracing, of the actual paths taken by attack packets with spoofed 
  source addresses. 
   
   
4.2.3  Guarding Against Pitfalls and Vulnerabilities 
   
  A concern is whether some large set of related packets could be 
  sampled at a rate that significantly differs from the expected 
  sampling rate, either (i) through unanticipated behavior in the hash 
  function, or (ii) because the packets had been deliberately crafted 
  to have this property. 
  The first point underlines the importance of using a hash function 
  with good mixing properties. Examples of such are CRC32 and hash 
  functions based on modular arithmetic, see 6.4 in [Knuth98]. The 
  statistical properties of candidate hash functions need to be 
  evaluated, preferably on packet before adoption for hash-based 
  sampling 
  Hash sampling could be overloaded (or evaded) by an attacker if the 
  hash function and the selection rate are both known. A service 
  provider could build a first defense keeping S private. Then, an 
  attacker could not determine whether a crafted packet is select, but 
  he would still know that a crafted a set of packets all with the same 
  hash is either all selected or all not selected. Moreover, when 
  applications (like multi domain trajectory sampling, or One way delay 
  estimation across multiple domains) may require multiple 
  administrative entities to agree on a common hash function and 
  selection range, mutual trust between the entities cannot be avoided 
  and just keeping S secret may not be feasible. A stronger defense is 
  to employ a parametrizable hash function and keep the parameter 
  private: in this case, the set of hash values of the packets could 
  not be determined. Examples of parameters are the initial vector in 
  CRC32, and moduli in hashes based on modular arithmetic. 
   
 
4.3 Router State filtering 
   
  This class of filters select a packet on the basis of the following 
  conditions), possibly combined with the AND, OR or NOT operators. 
 
     - Ingress interface at which packet arrives equals a specified 
        value 
     - Egress interface to which packet is routed to equals a 
        specified value 
     - Packet violated acl on the router 
     - Failed rpf  
     - Failed rsvp 
     - No route found for the packet 

 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 10] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
     - Origin AS equals a specified value or lies within a given  
        range 
     - Destination AS equals a specified value or lies within a given 
        range 
      
  Router architectural considerations may preclude some information 
  concerning the packet treatment, e.g routing state, being available 
  at line rate for selection of packets. However, if selection not 
  based on routing state has reduced down from line rate, subselection 
  based on routing state may be feasible. 
 
5. Input Parameters and Information Models 
 
  The decision whether to select a packet or not is based on a function 
  which is performed when the packet arrives at the selection process. 
  Packet selection schemes differ in the input parameters for the 
  selection process and the functions they require to do the packet 
  selection. The following table gives an overview. 
 
       Scheme       |   input parameters     |     functions  
     ---------------+------------------------+--------------------- 
      systematic    |    packet position     |  packet counter  
      count-based   |    sampling pattern    |  
     ---------------+------------------------+--------------------- 
      systematic    |      arrival time      |  clock or timer 
      time-based    |     sampling pattern   | 
     ---------------+------------------------+--------------------- 
        random      |  packet position       |  packet counter, 
      n-out-of-N    |  sampling pattern      |  random numbers 
                    | (random number list)   | 
     ---------------+------------------------+--------------------- 
      uniform       |        sampling        |  random function 
      probabilistic |      probability       |    
     ---------------+------------------------+---------------------      
      non-uniform   |e.g. packet position    |  selection function, 
      probabilistic |or packet content(parts)|  probability calc. 
     ---------------+------------------------+--------------------- 
      non-uniform   |e.g. flow state         |  selection function, 
      flow-state    |or packet content(parts)|  probability calc. 
     ---------------+------------------------+--------------------- 
      mask/match    | packet content(parts)  |  filter function 
     ---------------+------------------------+--------------------- 
      hash-based    |  packet content(parts) |  hash function 
     ---------------+------------------------+--------------------- 
      router state  |    router state        |   router state 
                    |                        |   discovery 
     ---------------+------------------------+---------------------       
 
   
5.1 Information Model for Sampling Techniques 
   
  In this section we define the information models for most common 
  sampling techniques. Here the selection function is pre-defined and 
  given by the selector ID.  
   
  Sampler Description: 
       SELECTOR_ID 
       SELECTOR_TYPE 
       SELECTOR_PARAMETERS 
       OPERATING_TIME 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 11] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
       ASSOCIATIONS 
 
  Where: 
   
  SELECTOR_ID: 
  Unique ID for the packet sampler. The ID can be calculated under 
  consideration of the ASSOCIATIONS and a local ID. 
 
   
  SELECTOR_TYPE 
  Description: For sampling processes the SELECTOR TYPE defines what 
  sampling algorithm is used. 
  Values: n out of N | Systematic Time Based (equally spaced)| 
  Systematic Position Based (equally spaced)| Probabilistic | flow 
  state 
   
  [Remark: further sampling schemes will be added here] 
   
  SELECTOR_PARAMETERS 
  Description: For sampling processes the SELECTOR PARAMETERS define 
  the input parameters for the process. Interval length in systematic 
  sampling means, that all packets that arrive in this interval are 
  selected. The spacing parameter defines the spacing in time or number 
  of packets between the end of one sampling interval and the start of 
  the next succeeding interval. 
 
  Case n out of N: 
     - List of n sampling positions in an array of N positions 
   
  Case Systematic Time Based: 
     - Interval length (in usec), Spacing (in usec) 
   
  Case Systematic Count Based: 
     - Interval length(in packets), Spacing (in packets) 
   
  Case uniform Probabilistic(with equal probability per packet): 
     - Sampling probability p 
      
  Case non-uniform probabilistic: 
     - Calculation function for sampling probability p 
   
  Case flow state: 
     - Policy for selecting flows (e.g. give priority to large flows) 
   
      
  OPERATING_TIME 
  Description: The OPERATING_TIME parameter describes the start/stop 
  time of sampling process. List elements must not overlap. The start 
  time of the first element can be omitted, meaning ôfrom nowö. The end 
  time of the last element can be omitted, meaning ôuntil sampler is 
  removedö. 
  Values: List of (Start time, End time) 
   
  ASSOCIATIONS 
  Description: The ASSOCIATIONS field describes the observation point 
  and the IPFIX processes to which the packet selector is associated.  
  The STREAM ID denotes the origin of the data stream that is input to 
  the selection function. It can be the observation point directly or 
  the ID of another selector. With this it is possible to define 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 12] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  combined schemes. If the STREAM ID contains IDs from other selectors, 
  one can derive the original observation point from the selector 
  definitions of these specified selectors. 
   
  Values: <STREAM ID, Metering process ID, Exporting process ID> 
  With STREAM ID: Observation point ID | List of SELECTOR_IDs 
   
      
5.2 Information Model for Filtering Techniques 
   
  In this section we define the information models for most common 
  filtering techniques. The information model structure closely 
  parallels the one presented for the sampling techniques. 
   
  Filter Description: 
       SELECTOR_ID 
       SELECTOR_TYPE 
       SELECTOR_PARAMETERS 
       OPERATING_TIME 
       ASSOCIATIONS 
 
  Where: 
   
  SELECTOR_ID: 
  Unique ID for the packet filter. The ID can be calculated under 
  consideration of the ASSOCIATIONS and a local ID. 
   
  SELECTOR_TYPE 
  Description: For filtering processes the SELECTOR TYPE defines what 
  filtering type is used. 
  Values: Matching | Hashing | Router_state 
   
  SELECTOR_PARAMETERS 
  Description: For filtering processes the SELECTOR PARAMETERS define 
  formally the common property of the packet being filtered. For the 
  filters of type Matching and Hashing the definitions have a lot of 
  points in common. 
  Values: 
  Case Matching 
     - <Header type = ip v4 > 
     - <bit specification, header part> 
     - <Selection interval specification, header part> 
     - <Header type = ipv6> 
     - <bit specification, header part> 
     - <Selection interval specification, header part> 
     - <payload byte number N> 
     - <bit specification, payload part> 
     - <Selection interval specification, payload part> 
   
  Notes to Case Matching: 
   
     - The filter can be defined for the header part only, for the 
        payload part only or for both. In the latter case the matching 
        must be an AND of the two. 
     - The bit specification, for the header part, can be specified 
        for ipv4 or ipv6 only, or both 
     - In case of ipv4, the bit specification is a sequence of 20 
        Hexadecimal numbers [00,FF] specifying a 20 bytes bitmask to be 
        applied to the header 

 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 13] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
     - In case of ipv6, it is a sequence of 40 Hexadecimal numbers 
        [00,FF] specifying a 40 bytes bitmask to be applied to the 
        header 
     - The bit specification, for the payload part, is a sequence of 
        Hexadecimal numbers [00,FF] specifying the bitmask to be 
        applied to the first N bytes of the payload, as specified by 
        the previous field. In case the Hexadecimal number sequence is 
        longer then N, only the first N numbers are considered. 
     - In case the payload is shorter than N, the packet will not 
        match the filter Other options, like padding with zeros, may be 
        considered in the future. 
     - The selection interval specification is a list of non 
        overlapping intervals [intv_begin, intv_end] where intv_begin, 
        intv_end are bit strings of length 20*8 (ipv4 case), 40*8 (ipv6 
        case), N*8 (payload case). 
     - A filter cannot be defined on the options field of the ipv4 
        header, neither on stacked headers of ipv6. 
     - This specification doesnÆt preclude the future definition of a 
        high level syntax for defining in a concise way bit selection 
        and matching rules in a more human readable form (e.g. ôTCP 
        port in [2000,3000]ö). The requirement is that such a syntax 
        can be univoquely compiled into the one defined above 
   
  Case Hashing: 
     - <Header type = ipv4> 
     - <Input bit specification, header part> 
     - <Header type =  ipv6> 
     - <Input bit specification, header part> 
     - <payload byte number N> 
     - <Input bit specification, payload part> 
     - <additional hash input bits (seed)> 
     - Hashing function specification (includes length of hash 
        function output M) 
     - Selection interval specification, as a list of non overlapping 
        intervals [start value, end value] where value is in [0,2^M-1] 
   
  Notes to Case Hashing: 
   
     - On Input bit specifications fields, the same notes on bit 
        specifications of the Matching case reported above apply 
   
  Case Router State: 
 
     - Ingress interface at which packet arrives equals a specified 
        value 
     - Egress interface to which packet is routed to equals a 
        specified value 
     - Packet violated acl on the router 
     - Failed rpf  
     - Failed rsvp 
     - No route found for the packet 
     - Origin AS equals a specified value or lies within a given  
        range 
     - Destination AS equals a specified value or lies within a given 
        range 
 
  Note to Case Router State: 
     - All Router state entries can be linked by AND, OR, NOT 
        operators 
      
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 14] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  OPERATING_TIME 
  Description: The OPERATING_TIME parameter describes the start/stop 
  time of filtering process. List elements must not overlap. The start 
  time of the first element can be omitted, meaning ôfrom nowö. The end 
  time of the last element can be omitted, meaning ôuntil sampler is 
  removedö. 
  Values: List of (Start time, End time) 
   
  ASSOCIATIONS 
  Description: The ASSOCIATIONS field describes the observation point 
  and the IPFIX processes to which the packet selector is associated. 
  The STREAM ID denotes the origin of the data stream that is input to 
  the selection function. It can be the observation point directly or 
  the ID of another selector. With this it is possible to define 
  combined schemes. If the STREAM ID contains IDs from other selectors, 
  one can derive the original observation point from the selector 
  definitions of these specified selectors. 
   
  Values: STREAM ID, Metering process ID, Exporting process ID> 
  With STREAM ID: Observation point ID | List of SELECTOR_IDs 
   
6. Composite Techniques  
   
  Composite schemes are realized by using the STREAM ID in the 
  information models. The STREAM ID denotes from which selectors the 
  input stream originates. If multiple stream IDs are given, this means 
  that the selector operates on the packet stream simply resulting from 
  the time superposition of the output of all the listed filters and 
  samplers. Note that a sampler/filter could be intermittently active, 
  as defined in the OPERATING TIME field. 
  Some examples of composite schemes are reported below. 
   
6.1 Cascaded filtering->sampling or sampling->filtering 
 
  If a filter precedes a sampling process the role of filtering is to 
  create a set of ôparent populationsö from a single stream that can 
  then be fed independently to different sampling functions, with 
  different parameters tuned for the population itself (e.g. if streams 
  of different intensity result from filtering, it may be good to have 
  different sampling rates). If filtering follows a sampling process, 
  the same sampling rate and type is applied to the whole stream, 
  independently of the relative size of the streams resulting from the 
  filtering function. Moreover, also packets not destined to be 
  selected will ôloadö the sampling function. So, in principle, 
  filtering before sampling allows a more accurate tuning of the 
  sampling procedure, but if filters are too complex to work at full 
  line rate (e.g. because they have to access router state 
  information), sampling before filtering may be a need. 
   
6.2 Stratified Sampling  
   
  Stratified sampling is one example for using a composite technique. 
  The basic idea behind stratified sampling is to increase the 
  estimation accuracy by using a-priori information. The a-priori 
  information is used to perform an intelligent grouping of the 
  elements of the parent population. With this a higher estimation 
  accuracy can be achieved with the same sample size. 
   
  Stratified sampling divides the sampling process into multiple steps. 
  First, the elements of the parent population are grouped into subsets 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 15] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  in accordance to a given characteristic. This grouping can be done in 
  multiple steps. Then samples are taken from each subset.  
   
  The stronger the correlation between the characteristic used to 
  divide the parent population and the characteristic of interest (for 
  which an estimate is sought after), the easier is the consecutive 
  sampling process and the higher is the stratification gain. For 
  instance if the dividing characteristic were equal to the 
  investigated characteristic, each element of the sub-group would be a 
  perfect representative of that characteristic. In this case it would 
  be sufficient to take one arbitrary element out of each subgroup to 
  get the actual distribution of the characteristic in the parent 
  population. Therefore stratified sampling can reduce the costs for 
  the sampling process (i.e. the number of samples needed to achieve a 
  given level of confidence). 
 
  For stratified sampling one has to specify classification rules for 
  grouping the elements into subgroups and the sampling scheme that is 
  used within the subgroups. The classification rules can be expressed 
  by multiple filters. For the sampling scheme within the subgroups the 
  parameters have to be specified as described above. 
   
7. Security Considerations 
 
  Security threats can occur if the configuration of sampling 
  parameters or the communication of sampling parameters to the 
  application is corrupted. This document only describes sampling 
  schemes that can be used for packet selection. It neither describes a 
  mechanism how those parameters are configured nor how these 
  parameters are communicated to the application. Therefore the 
  security threats that originate from this kind of communication 
  cannot be assessed with the information given in this document.  
   
  In some cases malicious users or attackers may be interested to hide 
  packets from the service provider. For instance if packet selectors 
  are used for accounting or intrusion detection applications, users 
  may want to prevent that packets are selected. If a deterministic 
  sampling scheme is used or a selection scheme that takes packet 
  content into account, the user can shape or send packets in a way 
  that they are less likely to be selected. This has to be taken into 
  account when choosing an appropriate packet selection technique. 
 
 
8. References 
   
  [AmCa89]    Paul D. Amer, Lillian N. Cassel: Management of Sampled 
               Real-Time Network Measurements, 14th Conference on Local 
               Computer Networks, October 1989, Minneapolis, pages 62-
               68, IEEE, 1989 
   
  [ClPB93]    K.C. Claffy, George C Polyzos, Hans-Werner Braun: 
               Application of Sampling Methodologies to Network Traffic 
               Characterization, Proceedings of ACM SIGCOMM'93, San 
               Francisco, CA, USA, September 13 - 17, 1993 
   
  [CoGi98]    I. Cozzani, S. Giordano: Traffic Sampling Methods for 
               end-to-end QoS Evaluation in Large Heterogeneous 
               Networks. Computer Networks and ISDN Systems, 30 (16-
               18), September 1998. 
   
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 16] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  [DuGG03]    Nick Duffield, Albert Greenberg, Matthias Grossglauser, 
               Jennifer Rexford: A Framework for Passive Packet 
               Measurement, Internet Draft draft-ietf-psamp-framework-
               01, work in progress, June 2003                      
   
  [DuGr00]    Nick Duffield, Matthias Grossglauser: Trajectory 
               Sampling for Direct Traffic Observation, Proceedings of 
               ACM SIGCOMM 2000, Stockholm, Sweden, August 28 - 
               September 1, 2000. 
   
  [EsVa01]    C. Estan and G. Varghese, ôNew Directions in Traffic 
               Measurement and Accountingö, ACM SIGCOMM Internet 
               Measurement Workshop 2001, San Francisco (CA) Nov. 2001 
   
  [HT52]      D.G. Horvitz and D.J. Thompson, A Generalization of 
               Sampling   without replacement from a Finite Universe. 
               J. Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952. 
 
  [JePP92]    Jonathan Jedwab, Peter Phaal, Bob Pinna: Traffic 
               Estimation for the Largest Sources on a Network, Using 
               Packet Sampling with Limited Storage, HP technical 
               report, Managemenr, Mathematics and Security Department, 
               HP Laboratories, Bristol, March 1992, 
               http://www.hpl.hp.com/techreports/92/HPL-92-35.html 
   
  [Knuth98]   Donald E. Knuth: The Art of Computer Programming, Volume 
               3: Searching and Sorting, Addison Wesley, 1998  
                 
  [QuZC03]    J. Quittek, T. Zseby, B. Claise, S. Zander: Requirements 
               for IP Flow Information Export, Internet Draft <draft-
               ietf-ipfix-reqs-10.txt>, work in progress, June 2002 
   
  [Zseb02]    Tanja Zseby: Deployment of Sampling Methods for SLA 
               Validation with Non-Intrusive Measurements, Proceedings 
               of Passive and Active Measurement Workshop (PAM 2002), 
               Fort Collins, CO, USA, March 25-26, 2002  
   
9. Author's Addresses 
   
  Tanja Zseby 
  Fraunhofer Institute for Open Communication Systems 
  Kaiserin-Augusta-Allee 31 
  10589 Berlin 
  Germany 
  Phone: +49-30-34 63 7153 
  Fax:   +49-30-34 53 8153 
  Email: zseby@fokus.fhg.de 
   
  Maurizio Molina 
  NEC Europe Ltd., Network Laboratories 
  Adenauerplatz 6 
  69115 Heidelberg 
  Germany 
  Phone: +49 6221 90511-18  
  Email: molina@ccrle.nec.de 
 
  Fredric Raspall 
  NEC Europe Ltd., Network Laboratories 
  Adenauerplatz 6 
  69115 Heidelberg 
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 17] 
 

Internet Draft  Techniques for IP Packet Selection    June 2003 
 
 
  Germany 
  Phone: +49 6221 90511-31  
  EMail: raspall@ccrle.nec.de 
   
  Nick Duffield 
  AT&T Labs - Research 
  Room B-139 
  180 Park Ave 
  Florham Park NJ 07932, USA 
  Phone: +1 973-360-8726 
  Email: duffield@research.att.com 
   
10. 
   Intellectual Property Statement 
 
  AT&T Corporation may own intellectual property applicable to this 
  contribution. The IETF has been notified of AT&T's licensing intent 
  for the specification contained in this document. See 
  http://www.ietf.org/ietf/IPR/ATT-GENERAL.txt for AT&T's IPR 
  statement. 
   
11. 
   Full Copyright Statement 
 
  Copyright (C) The Internet Society (2002). All Rights Reserved. This 
  document and translations of it may be copied and furnished to 
  others, and derivative works that comment on or otherwise explain it 
  or assist in its implementation may be prepared, copied, published 
  and distributed, in whole or in part, without restriction of any 
  kind, provided that the above copyright notice and this paragraph are 
  included on all such copies and derivative works. However, this 
  document itself may not be modified in any way, such as by removing 
  the copyright notice or references to the Internet Society or other 
  Internet organizations, except as needed for the purpose of 
  developing Internet standards in which case the procedures for 
  copyrights defined in the Internet Standards process must be 
  followed, or as required to translate it into languages other than 
  English.  
   
  The limited permissions granted above are perpetual and will not be 
  revoked by the Internet Society or its successors or assigns. 
   
  This document and the information contained herein is provided on an 
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
   
 
Zseby, Molina, Raspall, Duffield   Expires December 2003     [Page 18]