IPFIX Working Group                                          B. Trammell
Internet-Draft                                                CERT/NetSA
Intended status: Standards Track                               E. Boschi
Expires: September 3, 2007                                Hitachi Europe
                                                                 L. Mark
                                                                T. Zseby
                                                        Fraunhofer FOKUS
                                                           March 2, 2007


                      An IPFIX-Based File Format
                   draft-trammell-ipfix-file-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 3, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document describes a file format for the storage of flow data
   based upon the IPFIX message format.  It proposes a set of
   requirements for flat-file, binary flow data file formats, evaluates
   flow storage systems presently in use for their conformance to these
   requirements, then applies the IPFIX message format to these
   requirements to build a new file format.  This IPFIX-based file
   format is designed to facilitate interoperability and reusability
   among a wide variety of flow storage, processing, and analysis
   tools.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation
   4.  Requirements
     4.1.  Record Format Flexibility
     4.2.  Self Description
     4.3.  Data Compression
     4.4.  Indexing and Searching
     4.5.  Data Integrity
     4.6.  Creator Authentication and Confidentiality
     4.7.  Anonymization and Obfuscation
     4.8.  Performance Characteristics
   5.  Survey of Existing Flow and Trace File Formats
     5.1.  NetFlow V5/V7
     5.2.  Argus 2
     5.3.  SiLK
     5.4.  libpcap dumpfile
   6.  IPFIX File Format Description
     6.1.  Recommended Information Elements for IPFIX Files
       6.1.1.  informationElementId
       6.1.2.  informationElementAnonymizationType
       6.1.3.  informationElementSemanticType
       6.1.4.  informationElementStorageType
       6.1.5.  messageMD5Checksum
       6.1.6.  messageScope
       6.1.7.  privateEnterpriseNumber
     6.2.  Recommended Options Templates for IPFIX Files
       6.2.1.  Information Element Semantics Options Template
       6.2.2.  Message Checksum Options Template
       6.2.3.  Template Anonymization Options Template
     6.3.  Recommended Compression Strategy for File Writers
   7.  Examples
   8.  Security Considerations
   9.  IANA Considerations
   10. Open Issues and Notes
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   This document proposes a file format based upon IPFIX.  It begins by
   exploring the motivation for proposing a standardized flow file
   format, and using IPFIX as the basis for this new file format.  It
   then proposes a set of requirements for this file format, evaluates
   existing flow storage file formats for their conformance to these
   requirements, and describes either how the IPFIX message format
   meets each requirement, or how a file format based upon it could
   meet the requirement.  It closes by proposing an initial
   specification of the new file format and providing examples of IPFIX
   Files meeting this specification.

   This format makes use of the IPFIX Options mechanism for additional
   file metadata, in order to avoid requiring any protocol or message
   format extensions.

2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [I-D.ietf-ipfix-protocol] document are
   to be interpreted as defined there.

   IPFIX File:  An IPFIX File is a serialized stream of IPFIX Messages
      stored on a filesystem.
      Any IPFIX Message stream that would be considered valid when
      transported over one or more of the specified IPFIX transports
      (SCTP, TCP, or UDP) as defined in the IPFIX Protocol draft
      [I-D.ietf-ipfix-protocol] is considered an IPFIX File for
      purposes of this draft; however, this draft further restricts
      that definition with recommendations on the construction of IPFIX
      Files that meet the requirements identified herein.

   IPFIX File Reader:  An IPFIX File Reader is a process which reads
      IPFIX Files from a filesystem, and is analogous to an IPFIX
      Collecting Process.  An IPFIX File Reader MUST behave as an IPFIX
      Collecting Process as outlined in the IPFIX Protocol draft
      [I-D.ietf-ipfix-protocol], except as modified by this document.

   IPFIX File Writer:  An IPFIX File Writer is a process which writes
      IPFIX Files to a filesystem, and is analogous to an IPFIX
      Exporting Process.  An IPFIX File Writer MUST behave as an IPFIX
      Exporting Process as outlined in the IPFIX Protocol draft
      [I-D.ietf-ipfix-protocol], except as modified by this document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Motivation

   There are a wide variety of applications for the file-based storage
   of IP flow data, across a continuum of time scales.  Tools used in
   the analysis of flow data and creation of analysis products often
   use files as a convenient unit of work, with an ephemeral lifetime.
   A set of flows relevant to a security investigation may be stored in
   a file for the duration of that investigation, and further exchanged
   among incident handlers via email or within an external incident
   handling workflow application.
   Sets of flow data relevant to Internet measurement research may be
   published as files, much as libpcap packet trace files are, to
   provide common data sets for the repeatability of research efforts;
   these files would have lifetimes measured in months or years.
   Operational flow measurement systems also have a need for long-term,
   archival storage of flow data, either as a primary flow data
   repository, or as a backing tier for online storage in a relational
   database management system (RDBMS).

   The variety of applications of flow data, and the variety of
   presently deployed storage approaches, would seem to indicate the
   need for a standard approach to flow storage with applicability
   across the continuum of time scales over which flow data is stored.

   A storage format based around flat files would best address the
   variety of storage requirements.  While much work has been done on
   structured storage via RDBMS, relational database systems are not a
   good basis for format standardization owing to the fact that their
   internal data structures are generally private to a single
   implementation and subject to change for internal reasons.  Also,
   there are a wide variety of operations available on flat files, and
   external tools and standards can be leveraged to meet file-based
   flow storage requirements.  Further, flow data is often not very
   semantically complicated and is managed in very high volume; an
   RDBMS-based flow storage system would therefore not benefit much
   from the advantages of relational database technology.

   The simplest way to create a new file format is simply to serialize
   some internal data model to disk, with either textual or binary
   representation of data elements, and some framing strategy for
   delimiting fields and records.  "Ad-hoc" file formats such as this
   have several important disadvantages.  One, they impose the
   semantics of the data model from which they are derived on the file
   format; as such, they are difficult to extend, describe, and
   standardize.
   The emergence over the past decade of XML as a new "universal"
   framing format for flat as well as hierarchical data addresses these
   concerns; however, XML is not necessarily ideal for a storage format
   for flow data.  First, flow data, being inherently simple and
   record-oriented, does not benefit from the more advanced semantics
   available with XML.  There is not much to be gained by describing
   each record individually when the records all have the same format,
   or one of a small set of formats.  Second, XML processing introduces
   potentially significant overhead.  While an XML stream should in
   theory be approximately as compressible as any other stream
   representation, the additional compression/decompression and
   generation/parsing of XML data is not worth the benefit in this
   case.

   This leads us to propose the IPFIX message format as the basis for a
   new flow data file format.  The IPFIX working group, in defining the
   IPFIX protocol, has already defined an information model and data
   formatting rules for representation of flow data.  Especially at
   shorter time scales, when a file is a unit of data interchange, the
   filesystem may be viewed as simply another IPFIX message transport
   between processes.  This format is especially well suited to
   representing flow data, as it was designed specifically for flow
   data export; it is easily extensible unlike ad-hoc serialization,
   and compact unlike XML.  In addition, IPFIX is an emerging standard
   for the export and collection of flow data; using a common format
   for storage and analysis at the collection side allows implementors
   to use substantially the same information model and data formatting
   implementation for transport as well as storage.

4.  Requirements

   In this section, we outline a proposed set of requirements for any
   persistent storage format for flow data.
   First and foremost, a flow data file format should support storage
   across the continuum of time scales important to flow storage
   applications.  Each of the requirements enumerated in the sections
   below is broadly applicable to flow storage applications, though
   each may be more important at certain time scales.  For each, we
   first identify the requirement, then explain how the IPFIX message
   format addresses it, or briefly outline the changes that must be
   made in order for an IPFIX-based file format to meet the
   requirement.

4.1.  Record Format Flexibility

   Due to the wide variety of flow attributes collected by different
   network flow attribute measurement systems, the ideal flow storage
   format will not impose a single data model or a specific record type
   on the flows it stores.  The file format must be flexible and
   extensible; that is, it must support multiple record types definable
   within the file itself, and must be able to support new field types
   for data within the records in a graceful way.

   IPFIX provides extensibility through the use of Templates to
   describe each Data Record, through the use of an IANA Registry to
   define its Information Elements, and through the use of
   enterprise-specific Information Elements.

4.2.  Self Description

   Archived data may be read at a time in the future when any external
   reference to the meaning of the data may be lost.  The ideal flow
   storage format should be self-describing; that is, a process reading
   flow data from storage should be able to properly interpret the
   stored flows without reference to anything other than standard
   sources (e.g., the standards document describing the file format)
   and the stored flow data itself.
   The IPFIX message format is partially self-describing; that is,
   IPFIX Templates containing only IANA-assigned Information Elements
   can be completely interpreted according to the IPFIX Information
   Model without additional external data.

   However, Templates containing private information elements lack
   detailed type and semantic information; a Collecting Process
   receiving data described by a template containing private
   Information Elements it does not understand can only treat the data
   contained within those Information Elements as octet arrays.  To be
   fully self-describing, enterprise-specific Information Elements must
   be additionally described via IPFIX Options according to the
   Information Element Semantics Options Template defined below.

4.3.  Data Compression

   Regardless of the representation format, flow data describing
   traffic on real networks tends to be highly compressible.
   Compression tends to improve the scalability of flow collection
   systems, by reducing the disk storage and I/O bandwidth requirement
   for a given workload.  The ideal flow storage format should support
   applications which wish to leverage this fact by supporting
   compression of stored data.

   The IPFIX message format has no support for data compression, as the
   IPFIX protocol was designed for speed and simplicity of export.  Of
   course, any flat file is readily compressible using a wide variety
   of external data compression tools, formats, and algorithms;
   therefore, this requirement can be met externally.  However, a
   couple of simple optimizations can be made by File Writers to
   increase the integrity and usability of compressed IPFIX data; these
   are outlined in the Recommended Compression Strategy section, which
   appears below.

4.4.  Indexing and Searching

   Binary, record stream oriented file formats natively support only
   one form of searching, sequential scan in file order.
   By choosing the order of records in a file carefully (e.g., by
   time), a file can be indexed by a single key.  Adding indexes to the
   file for additional keys can speed searches considerably.

   Archival storage systems which split large amounts of flow data
   across multiple files can benefit from a related technique.  If each
   file is given a table of contents which notes the ranges or sets of
   specific keys present in the file (for example, timestamps or source
   addresses), searches across multiple files can omit files which
   contain no flow data of interest.

   The ideal flow storage format will support a method for noting that
   the records in a file are sorted by a certain key or set of keys,
   for noting that the records in a file contain data relating only to
   certain values of keys, and for providing index information for keys
   on which the file is not sorted.

   There is presently no support for indexing or sort order notation in
   the IPFIX message format.  If internal indexing is required, it
   would need to be added to an IPFIX-based file format by extension.
   This revision of this draft does not address this requirement
   further, though it may be addressable without protocol or message
   format changes using IPFIX Options.

4.5.  Data Integrity

   When storing flow data over long time scales, especially for
   archival purposes, it is important to ensure that hardware or
   software faults do not introduce errors into the data over time.
   The ideal flow storage format will support the detection and
   correction of encoding-level errors in the data.

   Note that more advanced error correction is almost certainly best
   handled at a layer below that addressed by this document.  Error
   correction is a topic well addressed by the storage industry in
   general (e.g., by RAID and other technologies), and by specifying a
   flow storage format based upon files, we can leverage these features
   to meet this requirement.
   However, the ideal flow storage format will be resilient against
   errors, providing an internal facility for the detection of errors
   and the ability to isolate errors to as few data records as
   possible.

   Note that this requirement interacts with the choice of data
   compression algorithm.  The use of block compression algorithms can
   serve to isolate errors to a single compression block, unlike stream
   compressors, which may fail to resynchronize after a single bit
   error, invalidating the entire message stream.  See the Recommended
   Compression Strategy below for more on this interaction.

   The IPFIX message format does not support data integrity assurance.
   It is assumed that advanced error correction will be provided
   externally.  For simple error detection support, checksums may be
   attached to messages via IPFIX Options according to the Message
   Checksum Options Template defined below.

4.6.  Creator Authentication and Confidentiality

   Storage of flow data across long time scales may also require
   assurance that no unauthorized entity can read or modify the stored
   data.  Asymmetric-key cryptography can be applied to this problem,
   by signing flow data with the private key of the creator, and
   encrypting it with the public keys of those authorized to read it.
   The ideal flow storage format will support the encryption and
   signing of flow data.

   As with error correction, this problem has been addressed well at a
   layer below that addressed by this document.  Instead of specifying
   a particular choice of encryption technology, we can leverage the
   fact that existing cryptographic technologies work quite well on
   data stored in files to meet this requirement.

   Beyond support for the use of TLS for transport over TCP or DTLS for
   transport over SCTP or UDP, both of which provide transient
   authentication and confidentiality, the IPFIX protocol does not
   support this requirement directly.
   It is assumed that this requirement will be met externally.

4.7.  Anonymization and Obfuscation

   To ensure the privacy of individuals and organizations at the
   endpoints of communications represented by flow records, it is often
   necessary to obfuscate or anonymize stored and exported flow data.
   The ideal flow storage format will provide for a notation that a
   given information element on a given record type represents
   anonymized, rather than real, data.

   The IPFIX message format presently has no support for anonymization
   notation.  It should be noted that anonymization is one of the
   requirements given for IPFIX in RFC 3917 [RFC3917].  The decision to
   qualify this requirement with 'MAY' and not 'MUST' in the
   requirements document, and its subsequent lack of specification in
   the current version of the IPFIX protocol, is due to the fact that
   anonymization algorithms are still a research issue, and that there
   currently exist no standardized methods for anonymization.  Simple
   anonymization notation may be attached to templates via IPFIX
   Options according to the Template Anonymization Options Template
   defined below.

4.8.  Performance Characteristics

   The ideal standard flow storage format will not have a significant
   negative impact on the performance of the application implementing
   it.  This is a non-functional requirement, but it is important to
   note that a standard that implies a performance penalty is unlikely
   to be widely implemented and adopted.

   A static analysis of the IPFIX message format would seem to suggest
   that implementations of it are not particularly prone to slowness;
   indeed, a template-based data representation is more easily subject
   to optimization for common cases than representations that embed
   structural information directly in the data stream (e.g. XML).
   However, a full analysis of the impact of using IPFIX messages as a
   basis for flow data storage on read/write performance will require
   more implementation experience and performance measurement.

5.  Survey of Existing Flow and Trace File Formats

5.1.  NetFlow V5/V7

   One de facto standard for the storage of flow data collected via
   Cisco NetFlow V5 or V7 is to serialize a stream of "raw" NetFlow
   datagrams into files.  These NetFlow PDU files consist of a
   collection of header-prefixed blocks (corresponding to the datagrams
   as received on the wire) containing fixed-length binary flow
   records.  NetFlow V5 and V7 data may be mixed within a given file,
   as the header on each datagram defines the NetFlow version of the
   records following; there is indeed very little difference between
   the two record formats.

   NetFlow V5/V7 PDU files are neither extensible nor self-describing;
   however, their status as a de facto standard means the definition of
   the data format is well-understood.  Indexing, compression, error
   detection and correction, authentication, and confidentiality must
   be handled externally.

5.2.  Argus 2

   QoSient's Argus (as of version 2.0.6) uses a file format based upon
   a stream of type-and-length prefixed records.  There are two general
   types of records in this stream, management records and flow
   records.  Management records export flow collection statistics, much
   like the recommended scoped data records in the IPFIX protocol.
   Flow records contain information about a single flow each, and are
   further typed based upon the protocol of the flow (e.g., IP, ICMP,
   ARP).  The Argus file format natively supports bidirectional flow
   export, as each flow record contains both forward and reverse
   counters.

   The Argus tools support a transport protocol that simply
   encapsulates a record stream over a TCP connection.
   Transport is collector-initiated; that is, a collector establishes a
   connection to an exporter in order to read a record stream.

   Argus files are not self-describing; that is, only the Argus tools
   themselves encapsulate the definition of each of the record types.
   The Argus file format is not extensible without changing the Argus
   implementation.  Argus provides no indexing facility for its file
   format, though records are roughly sorted by record generation time.
   Compression, error correction, authentication, and confidentiality
   are handled externally to the format, and are available as with all
   files.  There is no special support for data obfuscation in the
   format.

5.3.  SiLK

   The CERT/NetSA SiLK tools use a set of fixed-length binary record
   formats.  Each file is prefixed with a header which denotes which
   record format the file is stored in.  These record formats are
   differentiated by the presence or absence of certain fields; in this
   way, each format identifier is essentially a short-hand identifier
   for a template describing the record.  This also implies that only
   one type of record may be stored in any given file.

   As with Argus, SiLK files are not self-describing and are not
   extensible.  SiLK provides no indexing facility, though files are
   generally stored in flow end time order; and when used for archival
   storage, information about sensors and flow times appearing in each
   file is stored in the file path name.  Compression is handled
   internally to the file format, and allows the storage of compressed
   data in a file with uncompressed headers, and a guarantee of
   compression block boundary alignment with record boundaries.  Error
   correction, authentication, and confidentiality can be handled
   externally.  There is no special support for data obfuscation in the
   SiLK file format.

5.4.  libpcap dumpfile

   The libpcap dumpfile format is a packet trace format rather than a
   flow file format, so it does not address any of the requirements
   outlined above.  However, it is used widely in a use case (data
   storage and distribution for network measurement research) similar
   to one addressed by the format proposed in this draft, so we include
   it here.

   libpcap dumpfiles consist of a file header containing information
   common to the whole file (most importantly, the datalink layer, for
   interpretation of the datalink headers on each frame), followed by a
   set of raw captured frame records each prefixed by a frame header
   containing timestamp and length information.

   The format is not particularly flexible or self-describing, nor does
   it need to be: undecoded frames are about as semantically simple as
   network traffic data can get.  However, the simplicity and ubiquity
   of the libpcap dumpfile format has led to its becoming a de facto
   standard for the distribution of packet trace data for Internet
   measurement applications.  We propose the file format described in
   this draft in part as an analogue to the libpcap dumpfile format for
   flow data.

   Note that libpcap dumpfiles could be used as a storage format for
   any unidirectional, datagram-oriented protocol such as IPFIX or
   NetFlow, simply by storing the captured export session.  However,
   this has several important drawbacks.  First, the additional
   per-packet headers provided by pcap are redundant in the case of
   IPFIX, as length and export time are already available in the IPFIX
   Message Header.  Second, the link, network, and transport layer
   headers are stored in a dumpfile; these are not necessary for the
   successful interpretation of an IPFIX Message, and add additional
   decode overhead.  Third, a file created by capturing an export
   session may require additional processing to reassemble fragmented
   datagrams in the message stream.

6.  IPFIX File Format Description

   An IPFIX file, as defined by this draft and elaborated below, is at
   its core simply an IPFIX Message stream serialized to some
   filesystem.  Any valid serialized IPFIX Message stream MUST be
   accepted by a File Reader as a valid IPFIX file.  In this way, the
   filesystem is simply treated as another IPFIX Transport alongside
   SCTP, TCP, and UDP, although one with unusually high latency, as the
   File Reader and File Writer are not necessarily synchronized in
   time, unlike IPFIX Collecting and Exporting Processes.

   An IPFIX File Reader MUST accept as valid any IPFIX message stream
   that would be considered valid by one or more of the other defined
   IPFIX transport layers.  Practically, this means that the union of
   template management features supported by SCTP, TCP, and UDP MUST be
   supported in IPFIX Files.  The following requirements apply to IPFIX
   File Readers:

   o  File Readers MUST accept IPFIX Messages containing Template Sets,
      Options Template Sets, and Data Sets within the same message, as
      with IPFIX over TCP or UDP.

   o  File Readers MUST accept Template Sets that define templates
      already defined within the file, as may occur with template
      retransmission when using IPFIX over UDP as described in section
      10.3.6 of the IPFIX Protocol draft [I-D.ietf-ipfix-protocol].  In
      the event of a conflict between a resent definition and a
      previous definition, the File Reader MUST assume that the new
      template replaces the old, as consistent with UDP template
      expiration and ID reuse.

   o  File Readers MUST accept Template Withdrawals as described in
      section 8 of the IPFIX Protocol draft [I-D.ietf-ipfix-protocol],
      provided that the Template to be withdrawn is defined, as is the
      case with IPFIX over TCP and SCTP.
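
   The File Reader requirements above can be illustrated with a short,
   non-normative sketch in Python.  It assumes the Template Record
   layout of the IPFIX Protocol draft (a 16-bit Template ID and Field
   Count, followed by Field Specifiers in which the Enterprise bit is
   the high bit of the Information Element ID).  The TemplateTable
   class and its method name are hypothetical and appear in no IPFIX
   specification.  For brevity the sketch handles only the body of a
   Template Set, treats any Template ID below 256 as Set padding, and
   does not model Options Templates or the withdrawal of all Templates
   at once.

```python
import struct


class TemplateTable:
    """Hypothetical helper: the Templates known at one point in a File."""

    def __init__(self):
        # Template ID -> list of (ie_id, field_length, enterprise_number)
        self.templates = {}

    def apply_template_set(self, body):
        """Apply the body of one Template Set (Set Header already removed)."""
        offset = 0
        while offset + 4 <= len(body):
            template_id, field_count = struct.unpack_from("!HH", body, offset)
            offset += 4
            if template_id < 256:
                # Simplifying assumption: IDs below 256 never name a
                # Template here, so treat the remainder as padding.
                break
            if field_count == 0:
                # Template Withdrawal: only valid for a defined Template.
                if template_id not in self.templates:
                    raise ValueError("withdrawal of undefined template %d"
                                     % template_id)
                del self.templates[template_id]
                continue
            fields = []
            for _ in range(field_count):
                ie_id, length = struct.unpack_from("!HH", body, offset)
                offset += 4
                enterprise = 0
                if ie_id & 0x8000:
                    # Enterprise bit set: a 32-bit enterprise number follows.
                    (enterprise,) = struct.unpack_from("!I", body, offset)
                    offset += 4
                    ie_id &= 0x7FFF
                fields.append((ie_id, length, enterprise))
            # A repeated definition replaces the earlier one.
            self.templates[template_id] = fields
```

   A File Reader built along these lines would call
   apply_template_set() for each Template Set in file order, so that a
   later definition of a Template ID replaces an earlier one, while a
   withdrawal of an undefined ID is reported as an error.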

   However, for representation simplicity and read performance, File
   Writers SHOULD use the following template and scope management
   strategy:

   o  File Writers SHOULD emit Template Sets and Options Template Sets
      to appear in the file before any Data Sets, to ensure all
      Templates are available before any data is read.

   o  File Writers SHOULD emit Data Records described by Options
      Templates to appear in the file before any Data Records which
      depend on the scopes defined by those options.

   o  File Writers SHOULD use Template Withdrawals to withdraw
      Templates if template IDs need to be reused.  In this case, the
      new Templates reusing those IDs SHOULD appear directly in the
      file after the Template Withdrawals making the IDs available for
      reuse.

   Practically speaking, this means an IPFIX File not requiring more
   than 65280 Templates (the non-reserved Template ID number space)
   SHOULD consist of Template Sets, followed by Options, followed by
   Data Sets, and contain no Template Withdrawals.

   For IPFIX Files stored for shorter time scales (i.e., transient
   storage for communication between tools), a Transport Session SHOULD
   be synonymous with a single File.  In other words, the beginning of
   a file SHOULD be interpreted by a File Reader as the beginning of a
   Transport Session, and the end of a file SHOULD be interpreted by a
   File Reader as the end of a Transport Session.  This implies that
   Templates and Options are limited in scope to the single File in
   which they are defined.

   However, depending on the application, File Readers and File Writers
   MAY be flexible with respect to their definition of a Transport
   Session.  A File Reader MAY be configurable to treat a collection of
   Files (e.g., all the files in a directory) as a single Transport
   Session, especially when used for archival purposes.

   By default, a File Reader SHOULD NOT make any assumptions about the
   ordering of records in an IPFIX File.
   File Writers MAY write records to a file in any order.

6.1.  Recommended Information Elements for IPFIX Files

   The following information elements are used by the options templates
   below to allow IPFIX message streams to meet the requirements
   outlined above without extension to the message format or protocol.
   IPFIX File Readers and Writers SHOULD support these information
   elements as defined below.

6.1.1.  informationElementId

   Description:  An information element ID, as would appear in an IPFIX
      Template Record.  This element can be used to scope properties to
      a specific information element within a Template.  This IE should
      be encoded with the Enterprise ID bit set to 0, regardless of
      whether the Enterprise ID bit is set in the template to which
      this IE refers.  See the definition of privateEnterpriseNumber
      below for more on the use of this IE to describe vendor-specific
      IEs.

   Abstract Data Type:  unsigned16

   Data Type Semantics:  identifier

   ElementId:  TBD

   Status:  Proposed

   Reference:  Section 3.4.1 of the IPFIX Protocol draft

6.1.2.  informationElementAnonymizationType

   Description:  A description of the anonymization status of an IPFIX
      information element within a template.  If this field is FALSE,
      the corresponding IE is not anonymized; to the best ability of
      the Exporting Process to determine, it represents a real value.
      If this field is TRUE, the corresponding IE is anonymized; to the
      best ability of the Exporting Process to determine, it represents
      a value that has been transformed to maintain privacy.  Note that
      if no informationElementAnonymizationType is specified for an
      information element, it is assumed to be FALSE, or not
      anonymized.

   Abstract Data Type:  boolean

   ElementId:  TBD

   Status:  Proposed

6.1.3.  informationElementSemanticType

   Description:  A description of the semantics of an IPFIX information
      element within a template.
These correspond to the data type semantics defined in section 3.2 of the IPFIX Information Model [I-D.ietf-ipfix-info]; see that section for more information on the types described below. This field may take the following values; the special value 0x00 (none) is used to note that no semantics apply to the field; it cannot be manipulated by a Collecting Process or File Reader that does not understand it a priori.

+-------+--------------+
| Value | Description  |
+-------+--------------+
| 0x00  | none         |
| 0x01  | quantity     |
| 0x02  | totalCounter |
| 0x03  | deltaCounter |
| 0x04  | identifier   |
| 0x05  | flags        |
+-------+--------------+

Abstract Data Type: octet
ElementId: TBD
Status: Proposed

6.1.4. informationElementStorageType

Description: A description of the storage type of an IPFIX information element within a template. These correspond to the abstract data types defined in section 3.1 of the IPFIX Information Model [I-D.ietf-ipfix-info]; see that section for more information on the types described below. This field may take the following values:

+-------+----------------------+
| Value | Description          |
+-------+----------------------+
| 0x00  | octetArray           |
| 0x01  | unsigned8            |
| 0x02  | unsigned16           |
| 0x03  | unsigned32           |
| 0x04  | unsigned64           |
| 0x05  | signed8              |
| 0x06  | signed16             |
| 0x07  | signed32             |
| 0x08  | signed64             |
| 0x09  | float32              |
| 0x0A  | float64              |
| 0x0B  | boolean              |
| 0x0C  | macAddress           |
| 0x0D  | string               |
| 0x0E  | dateTimeSeconds      |
| 0x0F  | dateTimeMilliseconds |
| 0x10  | dateTimeMicroseconds |
| 0x11  | dateTimeNanoseconds  |
| 0x12  | ipv4Address          |
| 0x13  | ipv6Address          |
+-------+----------------------+

Abstract Data Type: octet
ElementId: TBD
Status: Proposed
Reference: Section 3.1 of the IPFIX Information Model

6.1.5.
messageMD5Checksum

Description: The MD5 checksum of the IPFIX Message containing this record. This IE SHOULD be bound to its containing IPFIX Message via an options record and the messageScope IE, as defined below, and SHOULD appear only once in a given IPFIX Message. To calculate the value of this IE, first buffer the containing IPFIX Message, setting the value of this IE to all zeroes. Then calculate the MD5 checksum of the resulting buffer as defined in RFC 1321 [RFC1321], place the resulting value in this IE, and export the buffered message.

Abstract Data Type: octetArray (16 bytes)
ElementId: TBD
Status: Proposed
Reference: RFC 1321, The MD5 Message-Digest Algorithm [RFC1321]

6.1.6. messageScope

Description: The presence of this Information Element as scope in an Options Template signifies that the options described by the Template apply to the IPFIX Message that contains them. It is defined for general purpose message scoping of options, and proposed specifically to allow the attachment of a checksum to a message via IPFIX Options. The value of this Information Element SHOULD be ignored by the File Reader or the Collecting Process.

Abstract Data Type: octet
ElementId: TBD
Status: Proposed

6.1.7. privateEnterpriseNumber

Description: A private enterprise number used to scope an information element ID, as would appear in an IPFIX Template Record. This element can be used to scope properties to a specific information element within a Template. If the Enterprise ID bit of the corresponding Information Element is cleared (has the value 0), this IE should be set to 0. The presence of a non-zero value in this IE implies that the Enterprise ID bit of the corresponding Information Element is set (has the value 1).

Abstract Data Type: unsigned32
Data Type Semantics: identifier
ElementId: TBD
Status: Proposed
Reference: Section 3.4.1 of the IPFIX Protocol draft

6.2.
Recommended Options Templates for IPFIX Files

The following Options Templates allow IPFIX message streams to meet the requirements outlined above without extension to the message format or protocol. They are defined in terms of existing Information Elements defined in the IPFIX Information Model [I-D.ietf-ipfix-info], as well as new Information Elements defined in the section above. IPFIX File Readers and Writers SHOULD support these options templates as defined below.

6.2.1. Information Element Semantics Options Template

The Information Element Semantics Options Template specifies the structure of a Data Record for attaching semantic and storage type information to enterprise-specific Information Elements in specified Template Records. Data Records described by this Template SHOULD appear for each enterprise-specific Information Element used within a File. Collecting Processes and IPFIX File Readers can use options data described by this template to improve handling of unknown Information Elements.

Information Element semantics records defined by this template MUST be handled by Collecting Processes and File Readers as scoped to the Transport Session in which they are sent; this facility is not intended to provide a method for the permanent definition of Information Elements.

Similarly, for security reasons, storage and semantics types for a given Information Element MUST NOT be re-defined by Information Element semantics records. Once an Information Element semantics record has been exported for a given Information Element within a given Transport Session, all subsequent semantics records for that Information Element MUST be identical. If conflicting semantic or type information is received in multiple semantics records by a Collecting Process or File Reader, the Collecting Process MUST reset the Transport Session.

Information Element semantics records MUST NOT be used to define semantics for IANA registered Information Elements (private enterprise number (PEN) 0) or for PENs with a special meaning within the IPFIX Protocol (e.g., the Reverse PEN from Bidirectional Flow Export using IPFIX [I-D.ietf-ipfix-biflow]).

The template SHOULD contain the following Information Elements as defined in the IPFIX Information Model [I-D.ietf-ipfix-info] and above:

+--------------------------------+----------------------------------+
| IE                             | Description                      |
+--------------------------------+----------------------------------+
| templateId                     | The Template ID of the template  |
|                                | this record describes; it is     |
|                                | assumed to be valid within the   |
|                                | Observation Domain ID of the     |
|                                | containing IPFIX Message, and    |
|                                | MUST identify a Template that    |
|                                | has already been exported. This  |
|                                | Information Element MUST be      |
|                                | defined as a Scope Field.        |
| informationElementId           | The Information Element          |
|                                | identifier of the Information    |
|                                | Element within the specified     |
|                                | Template this record describes.  |
|                                | This Information Element MUST be |
|                                | defined as a Scope Field.        |
| privateEnterpriseNumber        | The Private Enterprise number of |
|                                | the Information Element within   |
|                                | the specified Template this      |
|                                | record describes. This           |
|                                | Information Element MUST be      |
|                                | defined as a Scope Field.        |
| informationElementStorageType  | The storage type of the          |
|                                | specified Information Element.   |
| informationElementSemanticType | The semantic type of the         |
|                                | specified Information Element.   |
+--------------------------------+----------------------------------+

6.2.2. Message Checksum Options Template

The Message Checksum Options Template specifies the structure of a Data Record for attaching an MD5 message checksum to an IPFIX Message. An MD5 message checksum as described MAY be used if long-term data integrity is important to the application.
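
As an illustration of the checksum procedure given for messageMD5Checksum in Section 6.1.5, the following is a minimal Python sketch, not a definitive implementation. It assumes the caller has already buffered a complete IPFIX Message and knows the byte offset of the 16-byte checksum value within it; the names seal_message, verify_message, and checksum_offset are hypothetical and do not come from this document or from any IPFIX library. The verification step is not spelled out in the text above and is shown here only as the assumed inverse of the calculation.

```python
import hashlib

MD5_LEN = 16  # messageMD5Checksum is a 16-byte octetArray


def seal_message(message: bytes, checksum_offset: int) -> bytes:
    """Return a copy of `message` with its checksum field filled in.

    `checksum_offset` (hypothetical parameter) is the byte offset of the
    messageMD5Checksum value within the buffered IPFIX Message.
    """
    end = checksum_offset + MD5_LEN
    if checksum_offset < 0 or end > len(message):
        raise ValueError("checksum field lies outside the message")
    buf = bytearray(message)
    buf[checksum_offset:end] = bytes(MD5_LEN)  # set the IE to all zeroes
    digest = hashlib.md5(buf).digest()         # MD5 over the whole buffer
    buf[checksum_offset:end] = digest          # place the result in the IE
    return bytes(buf)


def verify_message(message: bytes, checksum_offset: int) -> bool:
    """Recompute the checksum and compare it with the stored value.

    Assumed inverse of seal_message; not specified by the text above.
    """
    end = checksum_offset + MD5_LEN
    stored = message[checksum_offset:end]
    resealed = seal_message(message, checksum_offset)
    return resealed[checksum_offset:end] == stored
```

A File Writer following this sketch would call seal_message on each buffered Message just before writing it, and a File Reader could call verify_message on each Message it reads. Locating the checksum field by parsing the Options Template is left out, since the ElementId values are still TBD.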
The described Data Record MUST appear only once per IPFIX Message.

The template SHOULD contain the following Information Elements as defined above:

+--------------------+----------------------------------------------+
| IE                 | Description                                  |
+--------------------+----------------------------------------------+
| messageScope       | A marker denoting this Option applies to the |
|                    | whole IPFIX message; content is ignored.     |
|                    | This Information Element MUST be defined as  |
|                    | a Scope Field.                               |
| messageMD5Checksum | The MD5 checksum of the containing IPFIX     |
|                    | Message.                                     |
+--------------------+----------------------------------------------+

6.2.3. Template Anonymization Options Template

The Template Anonymization Options Template specifies the structure of a Data Record for attaching anonymization notation information to Information Elements in specified Template Records. A Data Record described by this Template SHOULD appear for each Information Element within a Template known by the Exporting Process or File Writer to contain anonymized data.

The template SHOULD contain the following Information Elements as defined in the IPFIX Information Model [I-D.ietf-ipfix-info] and above:

+-------------------------------------+-----------------------------+
| IE                                  | Description                 |
+-------------------------------------+-----------------------------+
| templateId                          | The Template ID of the      |
|                                     | template this record        |
|                                     | describes; it is assumed to |
|                                     | be valid within the         |
|                                     | Observation Domain ID of    |
|                                     | the containing IPFIX        |
|                                     | Message, and MUST identify  |
|                                     | a Template that has already |
|                                     | been exported. This         |
|                                     | Information Element MUST be |
|                                     | defined as a Scope Field.   |
| informationElementId                | The Information Element     |
|                                     | identifier of the           |
|                                     | Information Element within  |
|                                     | the specified Template this |
|                                     | record describes. This      |
|                                     | Information Element MUST be |
|                                     | defined as a Scope Field.   |
| privateEnterpriseNumber             | The Private Enterprise      |
|                                     | number of the Information   |
|                                     | Element within the          |
|                                     | specified Template this     |
|                                     | record describes. May be 0  |
|                                     | if this record describes a  |
|                                     | public Information Element. |
|                                     | This Information Element    |
|                                     | MUST be defined as a Scope  |
|                                     | Field.                      |
| informationElementAnonymizationType | The anonymization type of   |
|                                     | the specified Information   |
|                                     | Element.                    |
+-------------------------------------+-----------------------------+

6.3. Recommended Compression Strategy for File Writers

Note that, since any file may be compressed and decompressed with a variety of widely available tools implementing a variety of compression standards (both specified and de facto), compression of IPFIX File data can be accomplished externally. However, compression at the file level may not be particularly resilient to errors; in the worst case, a single bit error in a stream-compressed file may result in the loss of the entire file.

To limit the impact of errors on the recoverability of compressed data, we recommend the use of block compression where possible. However, block-compressed IPFIX Files also have some recovery problems: it is difficult to resynchronize a partially damaged IPFIX Message stream, because the IPFIX beginning-of-message marker (the Version field of the Message Header, which carries the value 10, or 0x00 0x0A) may commonly appear in the body of an IPFIX Message.

Therefore, in applications (e.g., archival storage) in which error resilience is very important, we recommend that File Writers align compression block boundaries with IPFIX Message boundaries, so that each new compression block starts with a new IPFIX Message.
This can be achieved either by manually adjusting the block boundaries (for compression facilities which support this), or by padding out the IPFIX Message Stream with a Data Set described by a Template containing a single 65534-byte paddingOctets Information Element, sized to meet the end of the compression block boundary. Since no Data Set can realistically contain an Information Element of this size, the entire Data Set will be treated as set padding (as in Section 3.3.1 of the IPFIX Protocol [I-D.ietf-ipfix-protocol]). Note that this strategy requires a minimum padding of 4 bytes for the data set header.

[EDITOR'S NOTE: the description of set padding for compression block boundaries above is a little too brief, and requires elaboration.]

7. Examples

Examples are not yet available as the file format has not yet been fully described. A future revision of this document will contain examples.

8. Security Considerations

The IPFIX-based file format itself does not directly introduce security issues. Rather, it is used to store information that may be considered sensitive for privacy or business reasons. The file format must therefore provide appropriate procedures to guarantee the integrity and confidentiality of the stored information.

The underlying protocol used to exchange the information that will be stored using the format proposed in this document must likewise apply appropriate procedures to guarantee the integrity and confidentiality of the exported information. Such issues are addressed in separate documents, specifically in the IPFIX Protocol [I-D.ietf-ipfix-protocol].

9. IANA Considerations

This document requests the addition of the information elements in section 6.1 to the IANA IPFIX Information Element Registry.

10.
Open Issues and Notes

The survey of existing file formats is incomplete, and includes only file formats with which one of the authors has personal experience.

We need to be a bit more concrete with respect to which requirements we cannot meet with XML.

We need to address indexing and searching. Indexing should cover first timestamp in file, last timestamp in file, and a way to note the order of key fields on which the file is sorted.

Review and rework compression block boundary set padding to make it clearer. Add guidelines on stream resynchronization to point out that it is possible to use block compression without padding for alignment.

The present definition tying a File to a Transport Session is inadequate (it essentially says "A File is a single Transport Session, except when it isn't."). We need to define more explicitly how File Writers and File Readers should decide when a Transport Session ends at end-of-file and when it may span multiple files.

We need a new section, Applicability to IPFIX Collection Systems, which notes at least:

1. techniques for using the IPFIX File format in debugging IPFIX Exporting and Collecting Processes, and in storing IPFIX Messages for testing and debugging purposes;

2. the fact that Collecting Processes or Exporting Processes using IPFIX Files for on-disk storage can realize significant code reuse benefits; and

3. suggested Options Templates for the storage of collection infrastructure details, containing at least Exporting Process identifying information.

We need a new section, Applicability to NetFlow V9 Collection Systems, which notes that transformation from the V9 to the IPFIX message format is simple, and that using IPFIX Files instead of V9 PDU files has the advantage of interoperability with future IPFIX-based collection systems, or being a common representation for multi-protocol collection.

11.
Acknowledgements

Thanks to Arno Wagner, Maurizio Molina, Tom Kosnar, and Andreas Kind for technical assistance with the requirements. Thanks to Andrew Johnson for pointing out a set-padding method for compression block boundary alignment.

12. References

12.1. Normative References

[I-D.ietf-ipfix-protocol]
   Claise, B., "Specification of the IPFIX Protocol for the Exchange", draft-ietf-ipfix-protocol-24 (work in progress), November 2006.

[I-D.ietf-ipfix-info]
   Quittek, J., "Information Model for IP Flow Information Export", draft-ietf-ipfix-info-15 (work in progress), February 2007.

[RFC1321]
   Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.

12.2. Informative References

[I-D.ietf-ipfix-biflow]
   Trammell, B. and E. Boschi, "Bidirectional Flow Export using IPFIX", draft-ietf-ipfix-biflow-02 (work in progress), January 2007.

[RFC3917]
   Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

Brian H. Trammell
CERT Network Situational Awareness
Software Engineering Institute
4500 Fifth Avenue
Pittsburgh, Pennsylvania 15213
United States
Phone: +1 412 268 9748
Email: bht@cert.org

Elisa Boschi
Hitachi Europe SAS
Immeuble Le Theleme
1503 Route les Dolines
06560 Valbonne
France
Phone: +33 4 89874100
Email: elisa.boschi@hitachi-eu.com

Lutz Mark
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
10589 Berlin
Germany
Phone: +49 30 3463 7306
Email: mark@fokus.fraunhofer.de

Tanja Zseby
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
10589 Berlin
Germany
Phone: +49 30 3463 7153
Email: zseby@fokus.fraunhofer.de

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).