IPFIX Working Group                                            L. Peluso
Internet-Draft                                                  T. Zseby
Intended status: Informational                Fraunhofer Institute FOKUS
Expires: December 28, 2007                                  S. D'Antonio
                                         CINI Consortium/ITeM Laboratory
                                                           June 26, 2007


                       Flow Selection Techniques
                 draft-peluso-flowselection-tech-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 28, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Flow selection is the process of choosing a limited number of flows,
   out of all those accounted at an Observation Point, to be passed on
   to the rest of the measurement process chain.  Flow selection can be
   enabled at different stages of the monitoring reference model: by
   acting directly on the Metering Process after packet classification
   has been performed (i.e., flow state dependent packet sampling), or
   by acting on the Exporting Process, limiting the number of flows to
   be stored and/or exported to collector applications.

Peluso, et al.          Expires December 28, 2007               [Page 1]

Internet-Draft          Flow selection Techniques              June 2007
   This document describes the motivations for performing flow selection
   and proposes a categorization of the related techniques.  It
   furthermore provides the basis for the definition of an information
   model for configuring flow selection techniques.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Introduction
   2.  Scope
   3.  Terminology
     3.1.  General terminology
     3.2.  Selection process related terminology
   4.  Motivation
   5.  Flow selection techniques
     5.1.  Flow selection on flow record content
     5.2.  Flow selection on flow record arrival time
     5.3.  Flow selection on external events
   6.  Solutions for flow cache data structure
   7.  Information model for flow selection configuration
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

2.  Scope

   The main aim of this document is to describe and analyse the flow
   selection that can be performed inside an IPFIX Device.

   This document does not deal with the flow selection that may result
   from sampling packets in the Metering Process before classification
   is performed.  Although such sampling implicitly selects the flows
   generated by the subsequent classification step, packet sampling
   techniques are analysed extensively in [PSAMP-TECH] and are therefore
   outside the scope of this document.  Instead, this document describes
   the selection techniques that enable flow selection by acting
   directly at flow level within the Metering Process and/or the
   Exporting Process.

3.  Terminology

   The terminology used here is fully consistent with all terms listed
   in [IPFIX-ARCH] and [PSAMP-TECH], and includes additional terms
   required for the description of flow selection techniques.  For the
   sake of clarity, the definitions of the terms used here are
   reproduced below.

3.1.  General terminology

   * Observation Point

      An Observation Point is a location in the network where IP packets
      can be observed.  Examples include: (i) a line to which a probe is
      attached; (ii) a shared medium, such as an Ethernet-based LAN;
      (iii) a single port of a router, or a set of interfaces (physical
      or logical) of a router; (iv) an embedded measurement subsystem
      within an interface.

      Note that every Observation Point is associated with an
      Observation Domain, and that one Observation Point may be a
      superset of several other Observation Points.  For example, one
      Observation Point can be an entire line card.  That would be the
      superset of the individual Observation Points at the line card's
      interfaces.

   * Observation Domain

      An Observation Domain is the largest set of Observation Points for
      which Flow information can be aggregated by a Metering Process.
      For example, a router line card may be an Observation Domain if it
      is composed of several interfaces, each of which is an Observation
      Point.  Each Observation Domain presents itself to the Collecting
      Process using an Observation Domain ID to identify the IPFIX
      Messages it generates.  Every Observation Point is associated with
      an Observation Domain.  It is recommended that Observation Domain
      IDs are also unique per IPFIX Device.

   * Observed Packet Stream

      The Observed Packet Stream is the set of all packets observed at
      the Observation Point.

   * IP Traffic Flow or Flow

      There are several definitions of the term 'flow' being used by the
      Internet community.  Within the context of IPFIX we use the
      following definition:

      A Flow is defined as a set of IP packets passing an Observation
      Point in the network during a certain time interval.  All packets
      belonging to a particular Flow have a set of common properties.
      Each property is defined as the result of applying a function to
      the values of:

      1.  one or more packet header fields (e.g., destination IP
          address), transport header fields (e.g., destination port
          number), or application header fields;

      2.  one or more characteristics of the packet itself (e.g., number
          of MPLS labels);

      3.  one or more fields derived from packet treatment (e.g., next
          hop IP address, output interface).

      A packet is said to belong to a Flow if it completely satisfies
      all the defined properties of the Flow.

      This definition covers the range from a Flow containing all
      packets observed at a network interface to a Flow consisting of
      just a single packet between two applications.  It includes
      packets selected by a sampling mechanism.

   * Flow Key

      Each of the fields which

      1.  belong to the packet header (e.g., destination IP address),

      2.  are a property of the packet itself (e.g., packet length), or

      3.  are derived from packet treatment (e.g., AS number),

      and which are used to define a Flow is termed a Flow Key.

   * Flow Record

      A Flow Record contains information about a specific Flow that was
      observed at an Observation Point.  A Flow Record contains measured
      properties of the Flow (e.g., the total number of bytes for all
      the Flow's packets) and usually characteristic properties of the
      Flow (e.g., source IP address).

   * Metering Process

      The Metering Process generates Flow Records.  Inputs to the
      process are packet headers and characteristics observed at an
      Observation Point, and packet treatment at the Observation Point
      (for example, the selected output interface).

      The Metering Process consists of a set of functions that includes
      packet header capturing, timestamping, sampling, classifying, and
      maintaining Flow Records.

      The maintenance of Flow Records may include creating new records,
      updating existing ones, computing Flow statistics, deriving
      further Flow properties, detecting Flow expiration, passing Flow
      Records to the Exporting Process, and deleting Flow Records.

   * Exporting Process

      An Exporting Process sends Flow Records to one or more Collecting
      Processes.  The Flow Records are generated by one or more Metering
      Processes.

   * Exporter

      A device which hosts one or more Exporting Processes is termed an
      Exporter.

   * IPFIX Device

      An IPFIX Device hosts at least one Exporting Process.  It may host
      further Exporting Processes and arbitrary numbers of Observation
      Points and Metering Processes.

   * Collecting Process

      A Collecting Process receives Flow Records from one or more
      Exporting Processes.  The Collecting Process might process or
      store received Flow Records, but such actions are out of scope for
      this document.

   * Collector

      A device which hosts one or more Collecting Processes is termed a
      Collector.

3.2.  Selection process related terminology

   In this section, some additional terms are presented which extend the
   terminology introduced in [PSAMP-TECH].

   * Flow Selection Process

      A Flow Selection Process takes the set of accounted Flow Records
      as its input and selects a subset of that set as its output.

   * Flow Selection State

      A Flow Selection Process may maintain state information for use by
      the Flow Selection Process.  At a given time, the Flow Selection
      State may depend on Flows observed at and before that time, and on
      other variables.  Examples include: (i) the number of accounted
      Flow Records; (ii) the number of free slots available for
      recording Flows; (iii) the state of pseudorandom number
      generators; (iv) hash values calculated during selection.

   * Flow Selector

      A Flow Selector defines the action of a Flow Selection Process on
      a single Flow of its input.  The Flow Selector can make use of the
      following information in determining whether a Flow is selected:
      (i) the content of the Flow Record; (ii) any state information
      related to the recording of the Flow; (iii) any selection state
      that may be maintained by the Flow Selection Process.

4.  Motivation

   As stated in [PSAMP-TECH], packet selection is in charge of choosing
   a representative subset of packets that allows accurate estimates of
   properties of the unsampled traffic to be formed.  Its main
   application is to perform some form of data reduction on observed
   Internet traffic in order to limit the processing overhead at
   measurement devices.  Despite its proven ability to achieve this
   objective, the mechanism steering the selection process is generally
   driven by a packet-based decision strategy.  That is, the basic
   element on which this selection mechanism operates is a packet, and
   the decision about which packets are selected depends mainly on the
   packets themselves.
   As a consequence, depending on the specific selection strategy
   adopted, packet selection may not take into account the possible
   impact of its actions on subsequent measurement components, such as
   the flow recording and exporting processes, which are instead based
   on a higher-level representation, i.e., flows rather than packets.
   From this perspective, flow selection differs from packet selection
   in that the basic element to which the selection process is applied
   is not a packet but a flow; in IPFIX, this corresponds to a Flow
   Record.

   In many networks, the distributions of the number of packets per
   flow and of the number of bytes per flow are heavy-tailed.  That is,
   most flows consist of only a small number of packets, and only a few
   flows have a large number of packets.  These few large flows account
   for the majority of the overall traffic volume [DuLT01a], [DuLT01b].
   This observation on flow size distributions in Internet traffic is
   also referred to as the "Quasi-Zipf-Law" [KuXW04] or as the
   "elephants and mice phenomenon", and the large flows are referred to
   as elephant flows or heavy hitters.  Nevertheless, such observations
   depend on the flow definition in use and may change with the profile
   of future applications.

   For several applications it makes sense to select only the flows of
   interest.  [more here]

5.  Flow selection techniques

   Figure 1 shows the IPFIX reference model as defined in [IPFIX-ARCH],
   extended to point out the functional components where flow selection
   can take place.  As previously mentioned, flow selection can be
   provided at different stages of the measurement chain.  It can act
   at packet level, within the Metering Process, and/or at flow level,
   by operating directly on the flow recording and/or exporting
   processes.

                Packet(s) coming in to Observation Point(s)
                    |                                  |
                    v                                  v
   +----------------+-----------------------+   +------+------+
   | Metering Process on an                 |   |             |
   | Observation Point                      |   |             |
   |                                        |   |             |
   |     packet header capturing            |   |             |
   |                |                       |...|  Metering   |
   |          timestamping                  |   |  Process N  |
   |                |                       |   |             |
   |        packet selection                |   |             |
   |                |                       |   |             |
   |         classification                 |   |             |
   |                |                       |   |             |
   | flow state dependent sampling (*)      |   |             |
   |                |                       |   |             |
   |           aggregation                  |   |             |
   |                |                       |   |             |
   |       flow recording (*)               |   |             |
   |        |                               |   |             |
   |        |  Timing out Flows             |   |             |
   |        |  Handle resource overloads    |   |             |
   +--------|-------------------------------+   +------|------+
            |                                          |
            | Flow Records                             | Flow Records
            | (selected by Observation Domain)         |
            |                                          |
            +--------------------+---------------------+
                                 |
            +--------------------|---------------------+
            | Exporting Process  v                     |
            |     +--------------+--------------+      |
            |     |     flow selection (*)      |      |
            |     +--------------+--------------+      |
            |                    |                     |
            |     +--------------+--------------+      |
            |     |         flow export         |      |
            |     +--------------+--------------+      |
            |                    |                     |
            +--------------------+---------------------+
                                 |
                                 v
                 IPFIX export packet to Collector

          (*) indicates where flow selection can take place.

                                Figure 1

   Within the Metering Process, flow selection consists of accounting
   only a subset of all the incoming packets observed at the
   Observation Point.  However, unlike packet selection performed
   before packet classification, flow selection within the Metering
   Process selects only those incoming packets that satisfy certain
   conditions on the flow state information available from the flow
   recording process.  In accordance with [PSAMP-TECH], which
   introduces it as flow state dependent sampling, this kind of
   selection is regarded as a packet sampling technique.
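   As an illustration, the sample-and-hold scheme of [EsVa01] can be
   regarded as a flow state dependent selection: packets of a Flow that
   already has a Flow Record are always accounted, whereas a packet of
   an unknown Flow creates a new record only with some probability.
   The Python sketch below is a minimal, non-normative rendering under
   stated assumptions: a per-byte sampling probability p (so a packet
   of s bytes starts a new record with probability 1 - (1 - p)^s), a
   plain dictionary as the flow cache, and hypothetical names (FlowKey,
   FlowRecord, SampleAndHold, byte_prob) that are not defined by
   [PSAMP-TECH] or by the IPFIX information model.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical Flow Key: (source IP, destination IP, source port,
# destination port, protocol).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    packets: int = 0
    octets: int = 0


@dataclass
class SampleAndHold:
    """Flow state dependent packet selection in the style of sample and hold.

    byte_prob is an assumed per-byte sampling probability p.
    """

    byte_prob: float
    seed: int = 0
    cache: Dict[FlowKey, FlowRecord] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def observe(self, key: FlowKey, size: int) -> bool:
        """Return True if the packet is accounted, False if it is discarded."""
        record = self.cache.get(key)
        if record is None:
            # No state for this Flow yet: select the packet with
            # probability 1 - (1 - p)^size, i.e. as if each byte were
            # sampled independently with probability p.
            p_packet = 1.0 - (1.0 - self.byte_prob) ** size
            if self._rng.random() >= p_packet:
                return False
            record = self.cache[key] = FlowRecord()
        # The Flow is held: every further packet of it is accounted.
        record.packets += 1
        record.octets += size
        return True
```

   Because only packets of Flows not yet held are subject to the random
   decision, large Flows are very likely to be recorded early, while
   most small Flows are never stored; the selection thus shapes the
   flow cache contents even though each decision is taken per packet.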
   In flow state dependent sampling, the state of the stored Flow
   Records is taken into account during packet selection, so that the
   process responsible for generating or updating Flow Records can be
   steered by selectively accounting the packets that feed it.  From
   this perspective, unlike the flow selection performed within the
   flow recording and exporting processes, this form of flow selection
   operates at a very early stage with respect to the concept of flow,
   since it acts at packet level.  In this way, one can prevent
   observed packets from forcing the flow recording process to account,
   for instance, non-representative or unexpected Flow Records.

   The flow selection that may be provided within the flow recording
   and/or exporting processes, as clarified above, is performed at flow
   level, i.e., after packets have been classified into their
   corresponding Flows.  More precisely, flow selection can be carried
   out within the flow recording process either by storing new Flow
   Records only when enough resources are available at the monitoring
   device to maintain them, or by discarding already accounted Flow
   Records that, under certain circumstances and at a certain point in
   time, are no longer considered representative.  Finally, within the
   Exporting Process, it may be required that only some of the stored
   Flow Records available for export are actually sent to the
   Collectors.

   The following categories of flow selection techniques can be
   distinguished:

   1.  selection based on Flow Record content (i.e., all reported Flow
       characteristics);

   2.  selection based on Flow Record arrival time;

   3.  selection based on external events, such as the exhaustion of
       local resources.

5.1.  Flow selection on flow record content

5.2.  Flow selection on flow record arrival time

5.3.  Flow selection on external events

6.  Solutions for flow cache data structure

   The flow cache is the component of the flow monitoring system that
   is in charge of storing Flow Records, i.e., the data structures that
   hold the values of predefined metrics related to every observed
   Flow.  The efficiency of the flow cache strongly affects the overall
   performance of the flow monitoring system.  It is the most
   challenging component, as it has to look up a Flow Record and update
   the related metrics within one packet interarrival time.

   Elements in the flow cache can be kept in a linked list ordered
   according to a Least Recently Used (LRU) policy.  When a packet
   arrives at the network interface, it is classified, i.e., a flow ID
   is computed and assigned to it.  Solutions for the generation of
   flow IDs and for search mechanisms for Flow Records within the flow
   cache are described in [MoCD06].  If a corresponding Flow Record,
   i.e., a record with that flow ID, already exists in the linked list,
   it is updated with packet-related data and moved to the head of the
   list.  Otherwise, a new Flow Record is created and inserted at the
   head of the list.

   This ordering addresses two issues.  First, timed-out Flows can be
   easily identified by scanning the list from the tail and checking,
   for each record, whether the difference between the current time
   and the last update time exceeds the timeout value; since records
   are ordered by last update time, the scan can stop at the first
   record that has not timed out.  Second, records of active Flows
   carrying a lot of traffic, the so-called elephant flows, are
   frequently moved to the head of the list.  Therefore, data about
   such Flows can be found with high probability by scanning the LRU
   list from the head.

7.  Information model for flow selection configuration

   This section describes the representative parameters of the flow
   selection techniques presented above.  In this regard, it provides
   the basis for an information model for configuring the flow
   selection process at an IPFIX Device.
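   As an illustration of the flow cache organisation described in
   Section 6, the Python sketch below keeps Flow Records in LRU order.
   It is a minimal sketch under the following assumptions: a
   collections.OrderedDict stands in for the linked list (its last
   entry plays the role of the head), packet timestamps are
   non-decreasing, and a fixed capacity models the flow recording
   selection of Section 5, where a new Flow Record is stored only if
   there is room for it.  The names LruFlowCache, capacity, timeout and
   from_head are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from itertools import islice
from typing import Hashable, List, Tuple


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class LruFlowCache:
    """Flow cache in LRU order (hypothetical sketch, not an IPFIX API).

    The OrderedDict replaces the linked list: its last entry is the
    head (most recently updated record), its first entry is the tail.
    """

    def __init__(self, capacity: int, timeout: float) -> None:
        self.capacity = capacity  # maximum number of stored Flow Records
        self.timeout = timeout    # inactivity timeout, same unit as `now`
        self._records: "OrderedDict[Hashable, FlowRecord]" = OrderedDict()

    def update(self, flow_id: Hashable, size: int, now: float) -> bool:
        """Account one packet; return False if its Flow could not be stored."""
        record = self._records.get(flow_id)
        if record is None:
            if len(self._records) >= self.capacity:
                # Not enough resources: the new Flow is not recorded.
                return False
            record = FlowRecord(first_seen=now, last_seen=now)
            self._records[flow_id] = record  # new records enter at the head
        else:
            self._records.move_to_end(flow_id)  # move the record to the head
        record.last_seen = now
        record.packets += 1
        record.octets += size
        return True

    def expire(self, now: float) -> List[Tuple[Hashable, FlowRecord]]:
        """Remove and return timed-out records, scanning from the tail."""
        expired = []
        while self._records:
            flow_id, record = next(iter(self._records.items()))  # the tail
            if now - record.last_seen <= self.timeout:
                break  # every record nearer the head was updated later
            self._records.popitem(last=False)
            expired.append((flow_id, record))
        return expired

    def from_head(self, n: int) -> List[Hashable]:
        """Return up to n flow IDs, most recently updated first."""
        return list(islice(reversed(self._records), n))
```

   The expire() scan stops at the first record that has not timed out,
   since LRU order keeps records sorted by last update time; from_head()
   returns the most recently updated Flows first, among which
   frequently updated elephant flows tend to appear.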
8.  IANA Considerations

   This document makes no request of IANA.

9.  Security Considerations

10.  Acknowledgements

11.  References

11.1.  Normative References

   [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2.  Informative References

   [DuLT01a]     Duffield, N., Lund, C., and M. Thorup, "Charging from
                 Sampled Network Usage", ACM Internet Measurement
                 Workshop IMW 2001, San Francisco, USA, November 2001.

   [DuLT01b]     Duffield, N., Lund, C., and M. Thorup, "Properties and
                 Prediction of Flow Statistics from Sampled Packet
                 Streams", ACM SIGCOMM Internet Measurement Workshop
                 2002, November 2002.

   [DuLT01c]     Duffield, N., Lund, C., and M. Thorup, "Learn More,
                 Sample Less: Control of Volume and Variance in Network
                 Measurement", IEEE Transactions on Information Theory,
                 May 2005.

   [DuLT01d]     Duffield, N., Lund, C., and M. Thorup, "Flow Sampling
                 under Hard Resource Constraints", ACM IFIP Conference
                 on Measurement and Modeling of Computer Systems
                 SIGMETRICS, June 2004.

   [EsVa01]      Estan, C. and G. Varghese, "New Directions in Traffic
                 Measurement and Accounting: Focusing on the Elephants,
                 Ignoring the Mice", ACM SIGCOMM Internet Measurement
                 Workshop 2001, San Francisco (CA), November 2001.

   [FeGL98]      Feldmann, A., Rexford, J., and R. Caceres, "Efficient
                 Policies for Carrying Web Traffic over Flow-Switched
                 Networks", IEEE/ACM Transactions on Networking,
                 December 1998.

   [IPFIX-ARCH]  Sadasivan, G., Brownlee, N., Claise, B., and J.
                 Quittek, "Architecture for IP Flow Information Export",
                 Internet Draft draft-ietf-ipfix-architecture-12.txt,
                 work in progress, September 2006.

   [IPFIX-INFO]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and
                 J. Meyer, "Information Model for IP Flow Information
                 Export", Internet Draft draft-ietf-ipfix-info-15.txt,
                 work in progress, February 2007.

   [KuXW04]      Kumar, A., Xu, J., Wang, J., Spatscheck, O., and L. Li,
                 "Space-Code Bloom Filter for Efficient Per-Flow Traffic
                 Measurement", INFOCOM 2004, Twenty-third Annual Joint
                 Conference of the IEEE Computer and Communications
                 Societies, March 2004.

   [MoCD06]      Molina, M., Chiosi, A., D'Antonio, S., and G. Ventre,
                 "Design principles and algorithms for effective
                 high-speed IP flow monitoring", September 2006.

   [Moli03a]     Molina, M., "A scalable and efficient methodology for
                 flow monitoring in the Internet", International
                 Teletraffic Congress (ITC-18), Berlin, September 2003.

   [PSAMP-TECH]  Zseby, T., Molina, M., Raspall, F., Duffield, N., and
                 S. Niccolini, "Sampling and Filtering Techniques for IP
                 Packet Selection", Internet Draft
                 draft-ietf-psamp-sample-tech-10.txt, work in progress,
                 June 2007.

Authors' Addresses

   Lorenzo Peluso
   Fraunhofer Institute FOKUS
   Kaiserin-Augusta-Allee 31
   Berlin  10589
   Germany

   Phone: +49 30 3463 7171
   Email: lpeluso@fokus.fraunhofer.de


   Tanja Zseby
   Fraunhofer Institute FOKUS
   Kaiserin-Augusta-Allee 31
   Berlin  10589
   Germany

   Phone: +49 30 3463 7153
   Email: zseby@fokus.fraunhofer.de


   Salvatore D'Antonio
   CINI Consortium/ITeM Laboratory
   Monte S.Angelo, Via Cinthia
   Napoli  80126
   Italy

   Phone: +39 081 679944
   Email: saldanto@unina.it

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.
   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).