IPFIX Working Group                                          B. Trammell
Internet-Draft                                                CERT/NetSA
Intended status: Informational                                 E. Boschi
Expires: April 23, 2007                                   Hitachi Europe
                                                                 L. Mark
                                                                T. Zseby
                                                        Fraunhofer FOKUS
                                                        October 20, 2006


                       An IPFIX-Based File Format
                    draft-trammell-ipfix-file-02.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 23, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document describes a file format for the storage of flow data
   based upon the IPFIX message format.  It proposes a set of
   requirements for flat-file, binary flow data file formats, evaluates
   flow storage systems presently in use for their conformance to these



Trammell, et al.         Expires April 23, 2007                 [Page 1]

Internet-Draft                 IPFIX Files                  October 2006


   requirements, then applies the IPFIX message format to these
   requirements to build a new file format.  This IPFIX-based file
   format is designed to facilitate interoperability and reusability
   among a wide variety of flow storage, processing, and analysis tools.


Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Motivation
   4.  Requirements
     4.1.  Record Format Flexibility
     4.2.  Self Description
     4.3.  Data Compression
     4.4.  Indexing and Searching
     4.5.  Data Integrity
     4.6.  Creator Authentication and Confidentiality
     4.7.  Anonymization and Obfuscation
     4.8.  Performance Characteristics
   5.  Survey of Existing Flow and Trace File Formats
     5.1.  NetFlow V5/V7
     5.2.  Argus 2
     5.3.  SiLK
     5.4.  libpcap dumpfile
   6.  IPFIX File Format Description
     6.1.  Recommended Information Elements for IPFIX Files
       6.1.1.  informationElementId
       6.1.2.  informationElementAnonymizationType
       6.1.3.  informationElementSemanticType
       6.1.4.  informationElementStorageType
       6.1.5.  messageMD5Checksum
       6.1.6.  messageScope
       6.1.7.  privateEnterpriseNumber
     6.2.  Recommended Options Templates for IPFIX Files
       6.2.1.  Information Element Semantics Options Template
       6.2.2.  Message Checksum Options Template
       6.2.3.  Template Anonymization Options Template
     6.3.  Recommended Compression Strategy for File Writers
   7.  Examples
   8.  Security Considerations
   9.  IANA Considerations
   10. Open Issues and Notes
   11. Acknowledgements
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements


1.  Introduction

   This document proposes a file format based upon IPFIX.  It begins by
   exploring the motivation for proposing a standardized flow file
   format, and for using IPFIX as the basis for this new file format.
   It then proposes a set of requirements for this file format,
   evaluates existing flow storage file formats for their conformance
   to these requirements, and describes, for each requirement, how the
   IPFIX message format meets it, how a file format based upon IPFIX
   could meet it, or how the message format must be extended to meet
   it.  It closes by proposing an initial specification of the new file
   format and providing examples of IPFIX Files meeting this
   specification.

   The purpose of this revision of the document is to foster discussion
   on the requirements and the initial proposed design of this new file
   format.  It aims to do so without requiring any protocol or message
   format extensions, as such extensions are currently out of scope for
   the IPFIX working group.  Requirements proposed in this document
   that cannot be met without such extensions are out of scope for this
   revision, and may be addressed in other Internet-Drafts.


2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [I-D.ietf-ipfix-protocol] document are
   to be interpreted as defined there.

   IPFIX File:   An IPFIX File is a serialized stream of IPFIX Messages
      stored on a filesystem.  Any IPFIX Message stream that would be
      considered valid when transported over one or more of the specified
      IPFIX transports (SCTP, TCP, or UDP) as defined in the IPFIX
      Protocol draft [I-D.ietf-ipfix-protocol] is considered an IPFIX
      File for purposes of this draft; however, this draft further
      restricts that definition with recommendations on the construction
      of IPFIX Files that meet the requirements identified herein.

   IPFIX File Reader:   An IPFIX File Reader is a Process which reads
      IPFIX Files from a filesystem, and is analogous to an IPFIX
      Collecting Process.  An IPFIX File Reader MUST behave as an IPFIX
      Collecting Process as outlined in the IPFIX Protocol draft
      [I-D.ietf-ipfix-protocol], except as modified by this document.

   IPFIX File Writer:   An IPFIX File Writer is a Process which writes
      IPFIX Files to a filesystem, and is analogous to an IPFIX
      Exporting Process.  An IPFIX File Writer MUST behave as an IPFIX
      Exporting Process as outlined in the IPFIX Protocol draft

      [I-D.ietf-ipfix-protocol], except as modified by this document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


3.  Motivation

   There are a wide variety of applications for the file-based storage
   of IP flow data, across a continuum of time scales.  Tools used in
   the analysis of flow data and creation of analysis products often use
   files as a convenient unit of work, with an ephemeral lifetime.  A
   set of flows relevant to a security investigation may be stored in a
   file for the duration of that investigation, and further exchanged
   among incident handlers via email or within an external incident
   handling workflow application.  Sets of flow data relevant to
   Internet measurement research may be published as files, much as
   libpcap packet trace files are, to provide common data sets for the
   repeatability of research efforts; these files would have lifetimes
   measured in months or years.  Operational flow measurement systems
   also have a need for long-term, archival storage of flow data, either
   as a primary flow data repository, or as a backing tier for online
   storage in a relational database management system (RDBMS).

   The variety of applications of flow data, and the variety of
   presently deployed storage approaches, would seem to indicate the
   need for a standard approach to flow storage with applicability
   across the continuum of time scales over which flow data is stored.
   A storage format based around flat files would best address the
   variety of storage requirements.  While much work has been done on
   structured storage via RDBMS, relational database systems are not a
   good basis for format standardization, because their internal data
   structures are generally private to a single implementation and
   subject to change for internal reasons.  Also, a wide variety of
   operations is available on flat files, and external tools and
   standards can be leveraged to meet file-based flow storage
   requirements.  Further, flow data is often semantically simple and
   is managed in very high volume; an RDBMS-based flow storage system
   would therefore gain little from the advantages of relational
   database technology.

   The simplest way to create a new file format is to serialize some
   internal data model to disk, with either textual or binary
   representation of data elements, and some framing strategy for
   delimiting fields and records.  "Ad-hoc" file formats such as this
   have several important disadvantages.  Most importantly, they impose
   the semantics of the data model from which they are derived on the
   file format; as
   such, they are difficult to extend, describe, and standardize.

   The emergence over the past decade of XML as a new "universal"
   framing format for flat as well as hierarchical data addresses these
   concerns; however, XML is not necessarily ideal as a storage format
   for flow data.  First, flow data, being inherently simple and
   record-oriented, does not benefit from the more advanced semantics
   available with XML.  There is not much to be gained by describing
   each record individually when the records all have the same format,
   or one of a small set of formats.  Second, XML processing introduces
   potentially significant overhead.  While an XML stream should in
   theory be approximately as compressible as any other stream
   representation, the additional cost of compressing, decompressing,
   generating, and parsing XML data outweighs the benefit in this case.

   This leads us to propose the IPFIX message format as the basis for a
   new flow data file format.  The IPFIX working group, in defining the
   IPFIX protocol, has already defined an information model and data
   formatting rules for representation of flow data.  Especially at
   shorter time scales, when a file is a unit of data interchange, the
   filesystem may be viewed as simply another IPFIX message transport
   between processes.  This format is especially well suited to
   representing flow data, as it was designed specifically for flow data
   export; unlike ad-hoc serialization it is easily extensible, and
   unlike XML it is compact.  In addition, IPFIX is an emerging standard
   for the export and collection of flow data; using a common format for
   storage and analysis at the collection side allows implementors to
   use substantially the same information model and data formatting
   implementation for transport as well as storage.


4.  Requirements

   In this section, we outline a proposed set of requirements for any
   persistent storage format for flow data.  First and foremost, a flow
   data file format should support storage across the continuum of time
   scales important to flow storage applications.  Each of the
   requirements enumerated in the sections below is broadly applicable
   to flow storage applications, though each may be more important at
   certain time scales.  For each, we first identify the requirement,
   then explain how the IPFIX message format addresses it, or briefly
   outline the changes that must be made in order for an IPFIX-based
   file format to meet the requirement.

4.1.  Record Format Flexibility

   Due to the wide variety of flow attributes collected by different
   network flow attribute measurement systems, the ideal flow storage
   format will not impose a single data model or a specific record type
   on the flows it stores.  The file format must be flexible and
   extensible; that is, it must support multiple record types definable
   within the file itself, and must be able to support new field types
   for data within the records in a graceful way.

   IPFIX provides extensibility through the use of Templates to describe
   each Data Record, through the use of an IANA Registry to define its
   Information Elements, and through the use of enterprise-specific
   Information Elements.
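
   As an illustration of Template-based extensibility, the following
   Python sketch parses one Template Record into its field specifiers.
   It assumes the Template Record layout of the IPFIX Protocol draft: a
   Template ID and Field Count, then an Information Element ID and Field
   Length for each field, with a 4-octet Private Enterprise Number
   following any ID whose high-order Enterprise bit is set.  The names
   FieldSpec and parse_template_record, and the PEN 12345 in the usage
   lines, are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

# Sketch only: layout assumed from the IPFIX Protocol draft's Template Record.
ENTERPRISE_BIT = 0x8000


class FieldSpec(NamedTuple):
    ie_id: int           # Information Element ID, Enterprise bit cleared
    length: int          # Field Length in octets (65535 = variable length)
    pen: Optional[int]   # Private Enterprise Number; None if IANA-assigned


def parse_template_record(buf: bytes, offset: int = 0) -> Tuple[int, List[FieldSpec], int]:
    """Parse one Template Record starting at `offset`.

    Returns (template_id, fields, offset_of_next_record).
    """
    template_id, field_count = struct.unpack_from("!HH", buf, offset)
    offset += 4
    fields: List[FieldSpec] = []
    for _ in range(field_count):
        raw_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        pen = None
        if raw_id & ENTERPRISE_BIT:
            # Enterprise-specific IE: a 4-octet PEN follows the specifier.
            (pen,) = struct.unpack_from("!I", buf, offset)
            offset += 4
        fields.append(FieldSpec(raw_id & 0x7FFF, length, pen))
    return template_id, fields, offset


# Template 256: sourceIPv4Address (IANA ID 8, 4 octets) plus a
# hypothetical enterprise-specific IE 1 (2 octets) under PEN 12345.
record = (struct.pack("!HH", 256, 2)
          + struct.pack("!HH", 8, 4)
          + struct.pack("!HHI", ENTERPRISE_BIT | 1, 2, 12345))
print(parse_template_record(record))
```

   A reader keeps the parsed fields keyed by Template ID and uses them
   to decode each Data Record that references that Template, so a new
   record type needs only a new Template, not a new file format.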

4.2.  Self Description

   Archived data may be read at a time in the future when any external
   reference to the meaning of the data may be lost.  The ideal flow
   storage format should be self-describing; that is, a process reading
   flow data from storage should be able to properly interpret the
   stored flows without reference to anything other than standard
   sources (e.g., the standards document describing the file format) and
   the stored flow data itself.

   The IPFIX message format is partially self-describing; that is, IPFIX
   Templates containing only IANA-assigned Information Elements can be
   completely interpreted according to the IPFIX Information Model
   without additional external data.

   However, Templates containing enterprise-specific Information
   Elements lack detailed type and semantic information; a Collecting
   Process receiving data described by a Template containing
   enterprise-specific Information Elements it does not understand can
   only treat the data contained
   within those Information Elements as octet arrays.  To be fully self-
   describing, enterprise-specific Information Elements must be
   additionally described via IPFIX Options according to the Information
   Element Semantics Options Template defined below.
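
   The fallback behavior described above might look like the following
   minimal Python sketch.  KNOWN_IANA_IES is a hypothetical and
   deliberately tiny excerpt of the Information Model (the IDs shown are
   the IANA-assigned ones for octetDeltaCount, packetDeltaCount, and
   sourceIPv4Address); any field not found there, including every
   enterprise-specific field, is returned as an uninterpreted octet
   array.  Variable-length fields are not handled.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpt of a reader's Information Model table:
# IANA Information Element ID -> (name, decoding kind).
KNOWN_IANA_IES: Dict[int, Tuple[str, str]] = {
    1: ("octetDeltaCount", "unsigned"),
    2: ("packetDeltaCount", "unsigned"),
    8: ("sourceIPv4Address", "ipv4"),
}


def decode_record(template: List[Tuple[int, int, Optional[int]]], data: bytes) -> list:
    """Decode one Data Record given (ie_id, length, pen) field specifiers.

    Understood IANA fields are decoded by type; anything else, including
    every enterprise-specific field, is kept as an octet array keyed by
    (pen, ie_id).  Variable-length fields are not handled.
    """
    out = []
    offset = 0
    for ie_id, length, pen in template:
        raw = data[offset:offset + length]
        offset += length
        known = KNOWN_IANA_IES.get(ie_id) if pen is None else None
        if known is None:
            out.append(((pen, ie_id), raw))    # no type or semantics known
        elif known[1] == "unsigned":
            # int.from_bytes accepts any width, so reduced-size encoding works.
            out.append((known[0], int.from_bytes(raw, "big")))
        else:
            out.append((known[0], str(ipaddress.IPv4Address(raw))))
    return out
```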

4.3.  Data Compression

   Regardless of the representation format, flow data describing traffic
   on real networks tends to be highly compressible.  Compression tends
   to improve the scalability of flow collection systems, by reducing
   the disk storage and I/O bandwidth requirement for a given workload.
   The ideal flow storage format should support applications which wish
   to leverage this fact by supporting compression of stored data.

   The IPFIX message format has no support for data compression, as the
   IPFIX protocol was designed for speed and simplicity of export.  Of
   course, any flat file is readily compressible using a wide variety of
   external data compression tools, formats, and algorithms; therefore,
   this requirement can be met externally.

   However, a couple of simple optimizations can be made by File Writers
   to increase the integrity and usability of compressed IPFIX data;
   these are outlined in the Recommended Compression Strategy, which
   appears below.
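
   Because an IPFIX File is an ordinary flat file, a File Reader can
   accept externally compressed files with little effort.  The Python
   sketch below assumes gzip as the external compressor and detects it
   by its two-octet magic number; this cannot be confused with an
   uncompressed IPFIX File, whose first Message Header begins with the
   version number 0x000a.  It is an illustration only and is not the
   Recommended Compression Strategy; load_ipfix_stream is a
   hypothetical name.

```python
import gzip

# Assumption: gzip is the external compressor; its files start with 1f 8b.
GZIP_MAGIC = b"\x1f\x8b"


def load_ipfix_stream(raw: bytes) -> bytes:
    """Return the IPFIX Message stream held in `raw` (the contents of a
    file), gunzipping it first if it was compressed externally."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    # An uncompressed IPFIX File starts with version 0x000a.
    return raw
```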

4.4.  Indexing and Searching

   Binary, record-stream-oriented file formats natively support only one
   form of searching: sequential scan in file order.  By choosing the
   order of records in a file carefully (e.g., by time), a file can be
   "indexed" by a single key.  Adding additional indexes to the file can
   speed searches considerably.  The ideal flow storage format will
   support a method for noting that the records in a file are sorted by
   a certain key or set of keys, and for providing index information for
   keys on which the file is not sorted.

   There is presently no support for indexing or sort order notation in
   the IPFIX message format.  If internal indexing is required, it would
   need to be added to an IPFIX-based file format by extension.  This
   revision of this draft does not address this requirement further,
   though it may be addressable without protocol or message format
   changes.
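
   The value of sort order and of index information can be sketched in
   Python as follows.  find_range assumes the records' sort key (for
   example, flow end time) has been read into an ascending list, so a
   time window is located by binary search rather than by a full scan;
   build_index shows the simplest kind of secondary index, mapping
   values of an unsorted key to record positions.  Both names are
   hypothetical, and this draft does not define how such index
   information would be stored in a file.

```python
import bisect
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence


def find_range(sorted_keys: Sequence[int], start: int, end: int) -> range:
    """Positions of records with start <= key < end, found by binary
    search over the file's ascending sort key (e.g. flow end time)."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return range(lo, hi)


def build_index(keys: Iterable[Hashable]) -> Dict[Hashable, List[int]]:
    """Hypothetical secondary index for a key the file is NOT sorted on:
    key value -> positions of the records carrying it."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for pos, key in enumerate(keys):
        index[key].append(pos)
    return dict(index)


end_times = [100, 120, 120, 180, 240]          # sorted: the file's order
print(list(find_range(end_times, 120, 200)))    # records 1, 2 and 3
sources = ["192.0.2.1", "198.51.100.7", "192.0.2.1", "203.0.113.9", "198.51.100.7"]
print(build_index(sources)["192.0.2.1"])        # records 0 and 2
```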

4.5.  Data Integrity

   When storing flow data over long time scales, especially for archival
   purposes, it is important to ensure that hardware or software faults
   do not introduce errors into the data over time.  The ideal flow
   storage format will support the detection and correction of encoding-
   level errors in the data.

   Note that more advanced error correction is almost certainly best
   handled at a layer below that addressed by this document.  Error
   correction is a topic well addressed by the storage industry in
   general (e.g., by RAID and other technologies), and by specifying a
   flow storage format based upon files, we can leverage these features
   to meet this requirement.

   However, the ideal flow storage format will be resilient against
   errors, providing an internal facility for the detection of errors
   and the ability to isolate errors to as few data records as possible.

   Note that this requirement interacts with the choice of data
   compression algorithm.  The use of block compression algorithms can
   serve to isolate errors to a single compression block, unlike stream
   compressors, which may fail to resynchronize after a single bit
   error, invalidating the entire message stream.  See the Recommended
   Compression Strategy below for more on this interaction.
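
   The difference between block and stream compression can be
   demonstrated with a small Python sketch using zlib.  Here each IPFIX
   Message is assumed to be compressed as an independent block (an
   illustrative choice, not a requirement of this draft), so a damaged
   block is reported and skipped while the blocks after it still
   decode.  The function names are hypothetical.

```python
import zlib
from typing import List, Optional


def compress_blocks(messages: List[bytes]) -> List[bytes]:
    """Compress each IPFIX Message as an independent zlib block."""
    return [zlib.compress(m) for m in messages]


def decompress_blocks(blocks: List[bytes]) -> List[Optional[bytes]]:
    """Decompress each block on its own; a damaged block yields None
    and does not affect the blocks that follow it."""
    out: List[Optional[bytes]] = []
    for block in blocks:
        try:
            out.append(zlib.decompress(block))
        except zlib.error:
            out.append(None)
    return out


messages = [bytes([i]) * 200 for i in range(3)]   # stand-ins for Messages
blocks = compress_blocks(messages)
blocks[1] = blocks[1][:4]                          # simulate damage
print([m is not None for m in decompress_blocks(blocks)])   # [True, False, True]
```

   In a real file each block would also need its own length prefix so
   that a reader can locate the next block after a damaged one; had
   the same Messages been compressed as one continuous stream, the
   damage could have prevented recovery of everything after it.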

   The IPFIX message format does not support data integrity assurance.
   It is assumed that advanced error correction will be provided
   externally.  For simple error detection support, checksums may be
   attached to messages via IPFIX Options according to the Message
   Checksum Options Template defined below.
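
   A checksum of the kind carried by the messageMD5Checksum Information
   Element could be computed and verified as in the Python sketch
   below.  It assumes the checksum covers the complete serialized IPFIX
   Message; the exact coverage, and how the value is scoped to a
   Message, are defined by the Message Checksum Options Template, not
   by this sketch.  MD5 detects accidental corruption but offers no
   protection against deliberate modification.  The helper names are
   hypothetical.

```python
import hashlib


def message_md5(message: bytes) -> bytes:
    """16-octet MD5 digest of a serialized IPFIX Message
    (assumed coverage: the whole Message)."""
    return hashlib.md5(message).digest()


def verify_message(message: bytes, stored_checksum: bytes) -> bool:
    """True if the Message still matches the checksum stored for it."""
    return message_md5(message) == stored_checksum


msg = bytes.fromhex("000a0010") + bytes(12)      # header-only Message
checksum = message_md5(msg)
damaged = msg[:-1] + b"\x01"                       # one corrupted octet
print(verify_message(msg, checksum), verify_message(damaged, checksum))  # True False
```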

4.6.  Creator Authentication and Confidentiality

   Storage of flow data across long time scales may also require
   assurance that no unauthorized entity can read or modify the stored
   data.  Asymmetric-key cryptography can be applied to this problem, by
   signing flow data with the private key of the creator, and encrypting
   it with the public keys of those authorized to read it.  The ideal
   flow storage format will support the encryption and signing of flow
   data.

   As with error correction, this problem has been addressed well at a
   layer below that addressed by this document.  Instead of specifying a
   particular choice of encryption technology, we can leverage the fact
   that existing cryptographic technologies work quite well on data
   stored in files to meet this requirement.

   Beyond support for the use of TLS for transport over TCP or SCTP,
   which provides only transient authentication and confidentiality,
   the IPFIX message format does not support this requirement directly.
   It is assumed that this requirement will be met externally.

4.7.  Anonymization and Obfuscation

   To ensure the privacy of individuals and organizations at the
   endpoints of communications represented by flow records, it is often
   necessary to obfuscate or anonymize stored and exported flow data.
   The ideal flow storage format will provide for a notation that a
   given information element on a given record type represents
   anonymized, rather than real, data.

   The IPFIX message format presently has no support for anonymization
   notation.  It should be noted that anonymization is one of the
   requirements given for IPFIX in RFC 3917 [RFC3917].  The decision to
   qualify this requirement with 'MAY' and not 'MUST' in the
   requirements document, and its subsequent lack of specification in
   the current version of the IPFIX protocol, is due to the fact that
   anonymization algorithms are still a research issue, and that there
   currently exist no standardized methods for anonymization.

   Simple anonymization notation may be attached to templates via IPFIX
   Options according to the Template Anonymization Options Template
   defined below.

4.8.  Performance Characteristics

   The ideal standard flow storage format will not have a significant
   negative impact on the performance of the application implementing
   it.  This is a non-functional requirement, but it is important to
   note that a standard that implies a performance penalty is unlikely
   to be widely implemented.

   A static analysis of the IPFIX message format would seem to suggest
   that implementations of it are not particularly prone to slowness;
   indeed, a template-based data representation is more easily subject
   to optimization for common cases than representations that embed
   structural information directly in the data stream (e.g., XML).
   However, a full analysis of the impact of using IPFIX messages
   as a basis for flow data storage on read/write performance will
   require more implementation experience and performance measurement.
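
   One such common-case optimization is sketched below in Python:
   because every record described by a Template has the same layout, a
   reader can build a decoder once per Template and then apply it to
   each record without re-examining the Template.  The sketch assumes
   all fields are fixed-length, decoding 1-, 2-, 4-, and 8-octet fields
   as unsigned integers and any other width as an opaque octet string;
   variable-length fields would need separate handling.
   compile_template and decode_set are hypothetical names.

```python
import struct
from typing import Iterator, List, Tuple

# Unsigned, network-byte-order struct codes for common field widths.
_UNSIGNED = {1: "B", 2: "H", 4: "I", 8: "Q"}


def compile_template(field_lengths: List[int]) -> struct.Struct:
    """Build a decoder once per Template.  Widths without an unsigned
    code become opaque octet strings; variable-length fields (65535)
    are not supported by this sketch."""
    if 65535 in field_lengths:
        raise ValueError("variable-length fields not supported")
    fmt = "!" + "".join(_UNSIGNED.get(n, f"{n}s") for n in field_lengths)
    return struct.Struct(fmt)


def decode_set(decoder: struct.Struct, body: bytes) -> Iterator[Tuple]:
    """Decode every record in a Data Set body with the precompiled
    decoder, ignoring any trailing Set padding."""
    usable = len(body) - len(body) % decoder.size
    return decoder.iter_unpack(body[:usable])
```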


5.  Survey of Existing Flow and Trace File Formats

5.1.  NetFlow V5/V7

   One de facto standard for the storage of flow data collected via
   Cisco NetFlow V5 or V7 is to serialize a stream of "raw" NetFlow
   datagrams into files.  These NetFlow PDU files consist of a
   collection of header-prefixed blocks (corresponding to the datagrams
   as received on the wire) containing fixed-length binary flow records.
   NetFlow V5 and V7 data may be mixed within a given file, as the
   header on each datagram defines the NetFlow version of the records
   following; there is indeed very little difference between the two
   record formats.

   NetFlow V5/V7 PDU files are neither extensible nor self-describing;
   however, their status as a de facto standard means the definition of
   the data format is well-understood.  Indexing, compression, error
   detection and correction, authentication, and confidentiality must be
   handled externally.
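
   The self-framing nature of these files can be illustrated with a
   Python sketch that walks a NetFlow PDU file using only each
   datagram's header.  The layout constants come from Cisco's published
   NetFlow V5 and V7 export formats rather than from this draft: a
   24-octet header beginning with a 2-octet version and a 2-octet
   record count, followed by 48-octet (V5) or 52-octet (V7) records.
   count_records is a hypothetical name.

```python
import struct
from collections import Counter

HEADER_LEN = 24                 # V5 and V7 headers are both 24 octets
RECORD_LEN = {5: 48, 7: 52}     # fixed record length per version


def count_records(pdu_file: bytes) -> Counter:
    """Walk concatenated NetFlow V5/V7 datagrams, using only each
    datagram's version and count fields, and tally records per version."""
    counts: Counter = Counter()
    offset = 0
    while offset < len(pdu_file):
        if offset + HEADER_LEN > len(pdu_file):
            raise ValueError(f"truncated header at offset {offset}")
        version, count = struct.unpack_from("!HH", pdu_file, offset)
        if version not in RECORD_LEN:
            raise ValueError(f"unsupported NetFlow version {version}")
        end = offset + HEADER_LEN + count * RECORD_LEN[version]
        if end > len(pdu_file):
            raise ValueError(f"truncated datagram at offset {offset}")
        counts[version] += count
        offset = end
    return counts
```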

5.2.  Argus 2

   QoSient's Argus (as of version 2.0.6) uses a file format based upon a
   stream of type-and-length prefixed records.  There are two general
   types of records in this stream, management records and flow records.
   Management records export flow collection statistics, much like the
   recommended scoped data records in the IPFIX protocol.  Flow records
   contain information about a single flow each, and are further typed
   based upon the protocol of the flow (e.g., IP, ICMP, ARP).  The Argus
   file format natively supports bidirectional flow export, as each flow
   record contains both forward and reverse counters.

   The Argus tools support a transport protocol that simply encapsulates
   a record stream over a TCP connection.  Transport is collector-
   initiated; that is, a collector establishes a connection to an
   exporter in order to read a record stream.

   Argus files are not self-describing; that is, only the Argus tools
   themselves encapsulate the definition of each of the record types.
   The Argus file format is not extensible without changing the Argus
   implementation.  Argus provides no indexing facility for its file
   format, though records are roughly sorted by record generation time.
   Compression, error correction, authentication, and confidentiality
   are handled externally to the format, and are available as with all
   files.  There is no special support for data obfuscation in the
   format.

5.3.  SiLK

   The CERT/NetSA SiLK tools use a set of fixed-length binary record
   formats.  Each file is prefixed with a header which denotes which
   record format the file is stored in.  These record formats are
   differentiated by the presence or absence of certain fields; in this
   way, each format identifier is essentially a short-hand identifier
   for a template describing the record.  This also implies that only
   one type of record may be stored in any given file.

   As with Argus, SiLK files are not self-describing and are not
   extensible.  SiLK provides no indexing facility, though files are
   generally stored in flow end time order; and when used for archival
   storage, information about sensors and flow times appearing in each
   file is stored in the file path name.  Compression is handled
   internally to the file format, and allows the storage of compressed
   data in a file with uncompressed headers, and a guarantee of
   compression block boundary alignment with record boundaries.  Error
   correction, authentication, and confidentiality can be handled
   externally.  There is no special support for data obfuscation in the
   SiLK file format.

5.4.  libpcap dumpfile

   The libpcap dumpfile format is a packet trace format rather than a
   flow file format, so it does not address any of the requirements
   outlined above.  However, it is used widely in a use case (data
   storage and distribution for network measurement research) similar to
   one addressed by the format proposed in this draft, so we include it
   here.

   libpcap dumpfiles consist of a file header containing information
   common to the whole file (most importantly, the datalink layer, for
   interpretation of the datalink headers on each frame), followed by a
   set of raw captured frame records each prefixed by a frame header
   containing timestamp and length information.  The format is not
   particularly flexible or self-describing, nor does it need to be:
   undecoded frames are about as semantically simple as network traffic
   data can get.
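
   The structure just described can be read with a few lines of Python.
   The sketch assumes the classic microsecond-resolution libpcap format
   (a 24-octet file header ending in the link type, then a 16-octet
   header per frame giving timestamp seconds, microseconds, captured
   length, and original length), detects the file's byte order from its
   magic number, and does not handle the nanosecond-resolution or
   pcapng variants.  parse_pcap is a hypothetical name.

```python
import struct
from typing import List, Tuple


def parse_pcap(data: bytes) -> Tuple[int, List[Tuple[int, int, bytes]]]:
    """Return (link_type, [(ts_sec, ts_usec, frame), ...]) for a classic
    microsecond-resolution libpcap dumpfile held in memory."""
    if data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"                      # written big-endian
    elif data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"                      # written little-endian
    else:
        raise ValueError("not a classic libpcap dumpfile")
    # File header: magic, version major/minor, thiszone, sigfigs, snaplen, link type.
    link_type = struct.unpack_from(endian + "IHHiIII", data, 0)[6]
    frames = []
    offset = 24
    while offset + 16 <= len(data):
        # Frame header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        frames.append((ts_sec, ts_usec, data[offset:offset + incl_len]))
        offset += incl_len
    return link_type, frames
```

   A link type of 1 denotes Ethernet; a reader needs this value to
   decode the datalink header at the start of every frame.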

   However, the simplicity and ubiquity of the libpcap dumpfile format
   has led to its becoming a de facto standard for the distribution of
   packet trace data for Internet measurement applications.  We propose
   the file format described in this draft in part as an analogue to the
   libpcap dumpfile format for flow data.

   Note that libpcap dumpfiles could be used as a storage format for any
   unidirectional, datagram-oriented protocol such as IPFIX or NetFlow,
   simply by storing the captured export session.  However, this has
   several important drawbacks.  First, the additional per-packet
   headers provided by pcap are redundant in the case of IPFIX, as
   length and export time are already available in the IPFIX Message
   Header.  Second, the link, network, and transport layer headers are
   stored in a dumpfile; these are not necessary for the successful
   interpretation of an IPFIX Message, and add additional decode
   overhead.  Third, a file created by capturing an export session may
   require additional processing to reassemble fragmented datagrams in
   the message stream.


6.  IPFIX File Format Description

   An IPFIX file, as defined by this draft and elaborated below, is at
   its core simply an IPFIX Message stream serialized to some
   filesystem.  Any valid serialized IPFIX Message stream MUST be
   accepted by a File Reader as a valid IPFIX file.  In this way, the
   filesystem is simply treated as another IPFIX Transport alongside
   SCTP, TCP, and UDP, although one with unusually high latency, as the
   File Reader and File Writer are not necessarily synchronized in time,
   unlike IPFIX Collecting and Exporting Processes.
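
   Treating a file as a transport reduces reading to walking the
   message stream.  The Python sketch below assumes the 16-octet IPFIX
   Message Header (version 10, Message length, Export Time, Sequence
   Number, Observation Domain ID) and the 4-octet Set Header (Set ID,
   Set length) of the IPFIX Protocol draft, and splits each Message
   into its Sets without interpreting them; Message and read_messages
   are illustrative names.

```python
import struct
from typing import Iterator, List, NamedTuple, Tuple

IPFIX_VERSION = 10
MSG_HEADER_LEN = 16
SET_HEADER_LEN = 4


class Message(NamedTuple):
    export_time: int
    sequence_number: int
    domain_id: int
    sets: List[Tuple[int, bytes]]     # (Set ID, Set contents after header)


def read_messages(stream: bytes) -> Iterator[Message]:
    """Yield each IPFIX Message in a serialized Message stream (an IPFIX File)."""
    offset = 0
    while offset < len(stream):
        if offset + MSG_HEADER_LEN > len(stream):
            raise ValueError(f"truncated Message Header at offset {offset}")
        version, length, export_time, seq, domain = struct.unpack_from("!HHIII", stream, offset)
        end = offset + length
        if version != IPFIX_VERSION or length < MSG_HEADER_LEN or end > len(stream):
            raise ValueError(f"malformed IPFIX Message at offset {offset}")
        sets = []
        pos = offset + MSG_HEADER_LEN
        while pos + SET_HEADER_LEN <= end:
            set_id, set_len = struct.unpack_from("!HH", stream, pos)
            if set_len < SET_HEADER_LEN or pos + set_len > end:
                raise ValueError(f"malformed Set at offset {pos}")
            sets.append((set_id, stream[pos + SET_HEADER_LEN:pos + set_len]))
            pos += set_len
        yield Message(export_time, seq, domain, sets)
        offset = end
```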

   An IPFIX File Reader MUST accept as valid any IPFIX message stream
   that would be considered valid by one or more of the other defined
   IPFIX transport layers.  Practically, this means that the union of
   template management features supported by SCTP, TCP, and UDP MUST be
   supported in IPFIX Files:

   o  Template Sets and Options Template Sets MAY appear in the same
      IPFIX Message as Data Sets, as with TCP and UDP.

   o  Template Sets that define already-defined templates may appear
      multiple times in an IPFIX Message Stream, as they would with UDP
      template retransmission (as described in section 10.3.6 of the
      IPFIX Protocol draft [I-D.ietf-ipfix-protocol]).  In the event of
      a conflict between a resent definition and a previous definition,
      the new template replaces the old, consistent with UDP template
      expiration and ID reuse.

   o  Template Withdrawals (as described in section 8 of the IPFIX
      Protocol draft [I-D.ietf-ipfix-protocol]) may appear and are valid
      as long as the Template to be withdrawn is defined, as in TCP and
      SCTP.  However, as Template IDs may be directly reused as
      described above, Template Withdrawals are completely optional in
      IPFIX Files.
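
   The template management rules above can be sketched as a small
   per-session cache in Python.  The sketch assumes, as in the IPFIX
   Protocol draft, that a Template Withdrawal is a Template Record with
   a Field Count of zero (represented here as an empty field list) and
   that Templates are scoped by Observation Domain ID; TemplateCache is
   a hypothetical name, and fields are kept as simple (Information
   Element ID, length) pairs.

```python
from typing import Dict, List, Optional, Tuple

Field = Tuple[int, int]            # (Information Element ID, Field Length)
TemplateKey = Tuple[int, int]      # (Observation Domain ID, Template ID)


class TemplateCache:
    """Template state for one Transport Session (by default, one File)."""

    def __init__(self) -> None:
        self._templates: Dict[TemplateKey, List[Field]] = {}

    def add(self, domain_id: int, template_id: int, fields: List[Field]) -> None:
        key = (domain_id, template_id)
        if not fields:
            # Field Count 0: a Template Withdrawal, valid only if defined.
            if key not in self._templates:
                raise KeyError(f"withdrawal of undefined Template {key}")
            del self._templates[key]
        else:
            # A resent definition silently replaces the previous one,
            # as with UDP template retransmission and ID reuse.
            self._templates[key] = fields

    def lookup(self, domain_id: int, template_id: int) -> Optional[List[Field]]:
        return self._templates.get((domain_id, template_id))
```

   Creating a fresh cache at the start of each File gives per-File
   Transport Session semantics; a reader configured to treat several
   Files as one session would simply keep the same cache across them.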

   However, for representation simplicity and read performance, File
   Writers SHOULD use the following template and scope management
   strategy:

   o  Template Sets and Options Template Sets SHOULD appear in the file
      before any Data Sets, to ensure all Templates are available before
      any data is read.

   o  Data Records described by Options Templates SHOULD appear in the
      file before any Data Records which depend on the scopes defined by
      those options.

   Practically speaking, this means an IPFIX File SHOULD consist of
   Template Sets, followed by Options, followed by Data Sets.
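
   A File Writer that buffers the Sets for a file before writing could
   apply this ordering as in the following Python sketch.  It assumes
   Sets are held as (Set ID, body) pairs, uses Set IDs 2 and 3 for
   Template Sets and Options Template Sets, and takes the IDs of
   Options Templates as input, since a Data Set's ID alone does not
   reveal whether it carries Options data.  order_sets is a
   hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple

TEMPLATE_SET_ID = 2
OPTIONS_TEMPLATE_SET_ID = 3


def order_sets(sets: Iterable[Tuple[int, bytes]],
               options_template_ids: Set[int]) -> List[Tuple[int, bytes]]:
    """Order buffered (Set ID, body) pairs for writing: Template and
    Options Template Sets, then Options data, then all other Data Sets."""
    def rank(item: Tuple[int, bytes]) -> int:
        set_id = item[0]
        if set_id in (TEMPLATE_SET_ID, OPTIONS_TEMPLATE_SET_ID):
            return 0
        if set_id in options_template_ids:
            return 1        # Data Set described by an Options Template
        return 2
    return sorted(sets, key=rank)   # stable: order within a group is kept
```

   Because Python's sorted() is stable, Sets within each group keep
   their original relative order, which preserves any time ordering of
   the Data Sets.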

   A Transport Session SHOULD be synonymous with a single File.  In
   other words, a File Reader SHOULD interpret the beginning of a File
   as the beginning of a Transport Session, and the end of a File as the
   end of a Transport Session.  This implies that Templates and Options
   are limited in scope to the single File in which they are defined.

   However, depending on the application, File Readers and File Writers
   MAY be flexible with respect to their definition of a Transport
   Session.  A File Reader MAY be configurable to treat a collection of
   Files (e.g., all the files in a directory) as a single Transport
   Session, especially when used for archival purposes.
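
   A minimal Python sketch of this choice follows.  The function name
   transport_sessions and its archival switch are hypothetical.  By
   default each File forms its own Transport Session, so a reader would
   discard Template and Options state at every File boundary; in
   archival mode the whole collection is one Transport Session and that
   state carries across Files.

```python
from typing import List, Sequence


def transport_sessions(files: Sequence[str],
                       archival: bool = False) -> List[List[str]]:
    """Group File names into Transport Sessions (hypothetical helper)."""
    if archival:
        # One session spanning every File, e.g. a whole directory.
        return [list(files)] if files else []
    # Default: one File per Transport Session.
    return [[name] for name in files]
```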

6.1.  Recommended Information Elements for IPFIX Files

   The following information elements are used by the options templates
   below to allow IPFIX message streams to meet the requirements
   outlined above without extension to the message format or protocol.
   IPFIX File Readers and Writers SHOULD support these information
   elements as defined below.

6.1.1.  informationElementId

   Description:   An information element ID, as would appear in an IPFIX
      Template Record.  This element can be used to scope properties to
      a specific information element within a Template.  This IE should
      be encoded with the Enterprise ID bit set to 0, regardless of
      whether the Enterprise ID bit is set in the template to which this
      IE refers.  See the definition of privateEnterpriseNumber below
      for more on the use of this IE to describe vendor-specific IEs.

   Abstract Data Type:   unsigned16

   Data Type Semantics:   identifier

   ElementId:   TBD

   Status:   Proposed

   Reference:   Section 3.4.1 of the IPFIX Protocol draft

6.1.2.  informationElementAnonymizationType

   Description:   A description of the anonymization status of an IPFIX
      information element within a template.  If this field is FALSE,
      the corresponding IE is not anonymized; to the best of the
      Exporting Process's ability to determine, it represents a real
      value.  If this field is TRUE, the corresponding IE is anonymized;
      to the best of the Exporting Process's ability to determine, it
      represents a value that has been transformed to maintain privacy.
      If no informationElementAnonymizationType is specified for an
      information element, it is assumed to be FALSE (not anonymized).

   Abstract Data Type:   boolean

   ElementId:   TBD

   Status:   Proposed

6.1.3.  informationElementSemanticType

   Description:   A description of the semantics of an IPFIX information
      element within a template.  The possible values of this field are
      not yet defined; this is an open issue.

   Abstract Data Type:   octet

   ElementId:   TBD

   Status:   Proposed

6.1.4.  informationElementStorageType

   Description:   A description of the storage type of an IPFIX
      information element within a template.  These correspond to the
      abstract data types defined in section 3.1 of the IPFIX
      Information Model [I-D.ietf-ipfix-info]; see that section for more
      information on the types described below.  This field may take the
      following values:

                     +-------+----------------------+
                     | Value | Description          |
                     +-------+----------------------+
                     | 0x00  | octetArray           |
                     | 0x01  | unsigned8            |
                     | 0x02  | unsigned16           |
                     | 0x03  | unsigned32           |
                     | 0x04  | unsigned64           |
                     | 0x05  | signed8              |
                     | 0x06  | signed16             |
                     | 0x07  | signed32             |
                     | 0x08  | signed64             |
                     | 0x09  | float32              |
                     | 0x0A  | float64              |
                     | 0x0B  | boolean              |
                     | 0x0C  | macAddress           |
                     | 0x0D  | string               |
                     | 0x0E  | dateTimeSeconds      |
                     | 0x0F  | dateTimeMilliseconds |
                     | 0x10  | dateTimeMicroseconds |
                     | 0x11  | dateTimeNanoseconds  |
                     | 0x12  | ipv4Address          |
                     | 0x13  | ipv6Address          |
                     +-------+----------------------+

   Abstract Data Type:   octet

   ElementId:   TBD

   Status:   Proposed

   Reference:   Section 3.1 of the IPFIX Information Model
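
   A File Reader might hold the storage type codes above in a lookup
   table such as the following Python sketch.  The names STORAGE_TYPES,
   STORAGE_TYPE_CODES, and decode_storage_type are illustrative only.
   Returning None for an unassigned value would let a reader fall back
   to treating the Information Element as an opaque octetArray; that
   fallback is an assumption, not a requirement of this document.

```python
from typing import Dict, Optional

# informationElementStorageType values, transcribed from the table above.
STORAGE_TYPES: Dict[int, str] = {
    0x00: "octetArray",
    0x01: "unsigned8",
    0x02: "unsigned16",
    0x03: "unsigned32",
    0x04: "unsigned64",
    0x05: "signed8",
    0x06: "signed16",
    0x07: "signed32",
    0x08: "signed64",
    0x09: "float32",
    0x0A: "float64",
    0x0B: "boolean",
    0x0C: "macAddress",
    0x0D: "string",
    0x0E: "dateTimeSeconds",
    0x0F: "dateTimeMilliseconds",
    0x10: "dateTimeMicroseconds",
    0x11: "dateTimeNanoseconds",
    0x12: "ipv4Address",
    0x13: "ipv6Address",
}

# Reverse map, for File Writers encoding the option value.
STORAGE_TYPE_CODES: Dict[str, int] = {n: c for c, n in STORAGE_TYPES.items()}


def decode_storage_type(value: int) -> Optional[str]:
    """Abstract data type name, or None for an unassigned value."""
    return STORAGE_TYPES.get(value)
```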

6.1.5.  messageMD5Checksum

   Description:   The MD5 checksum of the IPFIX Message containing this
      record.  This IE SHOULD be bound to its containing IPFIX Message
      via an options record and the messageScope IE, as defined below,
      and SHOULD appear only once in a given IPFIX Message.  To
      calculate the value of this IE, first buffer the containing IPFIX
      Message, setting the value of this IE to all zeroes.  Then
      calculate the MD5 checksum of the resulting buffer as defined in
      RFC 1321 [RFC1321], place the resulting value in this IE, and
      export the buffered message.

   Abstract Data Type:   octetArray (16 bytes)

   ElementId:   TBD

   Status:   Proposed

   Reference:   RFC 1321, The MD5 Message-Digest Algorithm [RFC1321]
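
   The zero-then-hash procedure above can be sketched in Python with the
   standard hashlib module.  The function names are hypothetical, and
   checksum_offset is assumed to be the byte offset of the 16-octet
   messageMD5Checksum field within the buffered Message, which the File
   Writer knows from the layout it built.  The verification function is
   an assumed reader-side counterpart that repeats the same steps.

```python
import hashlib


def fill_md5_checksum(message: bytearray, checksum_offset: int) -> bytes:
    """Compute messageMD5Checksum and place it into the buffered Message."""
    end = checksum_offset + 16
    if checksum_offset < 0 or end > len(message):
        raise ValueError("checksum field lies outside the message")
    message[checksum_offset:end] = bytes(16)       # zero the field first
    digest = hashlib.md5(bytes(message)).digest()  # MD5 (RFC 1321)
    message[checksum_offset:end] = digest          # store the result
    return digest


def verify_md5_checksum(message: bytes, checksum_offset: int) -> bool:
    """Reader side: re-zero the field, recompute, and compare."""
    end = checksum_offset + 16
    stored = message[checksum_offset:end]
    zeroed = message[:checksum_offset] + bytes(16) + message[end:]
    return hashlib.md5(zeroed).digest() == stored
```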

6.1.6.  messageScope

   Description:   The presence of this Information Element as scope in
      an Options Template signifies that the options described by the
      Template apply to the IPFIX Message that contains them.  It is
      defined for general purpose message scoping of options, and
      proposed specifically to allow the attachment of a checksum to a
      message via IPFIX Options.  The value of this Information Element
      SHOULD be ignored by the File Reader or the Collecting Process.

   Abstract Data Type:   octet

   ElementId:   TBD

   Status:   Proposed

6.1.7.  privateEnterpriseNumber

   Description:   A private enterprise number used to scope an
      information element ID, as would appear in an IPFIX Template
      Record.  This element can be used to scope properties to a
      specific information element within a Template.  If the Enterprise
      ID bit of the corresponding Information Element is cleared (has
      the value 0), this IE should be set to 0.  The presence of a non-
      zero value in this IE implies that the Enterprise ID bit of the
      corresponding Information Element is set (has the value 1).

   Abstract Data Type:   unsigned32

   Data Type Semantics:   identifier

   ElementId:   TBD

   Status:   Proposed

   Reference:   Section 3.4.1 of the IPFIX Protocol draft
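
   To illustrate how the informationElementId and
   privateEnterpriseNumber scope values relate to a Template, the
   Python sketch below decodes one Field Specifier as laid out in the
   IPFIX Protocol draft: an Enterprise bit and 15-bit Information
   Element identifier, a 16-bit Field Length, and, only when the
   Enterprise bit is set, a 32-bit Enterprise Number, all in network
   byte order.  The function name is hypothetical.  It returns the
   identifier with the Enterprise bit cleared and a
   privateEnterpriseNumber of 0 for public Information Elements, as
   Sections 6.1.1 and 6.1.7 require.

```python
import struct
from typing import Tuple


def field_specifier_to_scope(buf: bytes,
                             offset: int = 0) -> Tuple[int, int, int, int]:
    """Return (informationElementId, privateEnterpriseNumber,
    fieldLength, next_offset) for the Field Specifier at buf[offset:]."""
    raw_id, length = struct.unpack_from("!HH", buf, offset)
    offset += 4
    ie_id = raw_id & 0x7FFF      # Enterprise bit cleared
    pen = 0                      # 0 for public Information Elements
    if raw_id & 0x8000:          # Enterprise bit set: PEN follows
        (pen,) = struct.unpack_from("!I", buf, offset)
        offset += 4
    return ie_id, pen, length, offset
```

   A truncated buffer raises struct.error, which a reader would treat
   as a malformed Template.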

6.2.  Recommended Options Templates for IPFIX Files

   The following options templates allow IPFIX message streams to meet
   the requirements outlined above without extension to the message
   format or protocol.  They are defined in terms of existing
   Information Elements defined in the IPFIX Information Model
   [I-D.ietf-ipfix-info], as well as new Information Elements defined in
   the section above.  IPFIX File Readers and Writers SHOULD support
   these options templates as defined below.

6.2.1.  Information Element Semantics Options Template

   The Information Element Semantics Options Template specifies the
   structure of a Data Record for attaching semantic and storage type
   information to enterprise-specific Information Elements in specified
   Template Records.  Data Records described by this Template SHOULD
   appear for each enterprise-specific Information Element used within a
   File.  Collecting Processes and IPFIX File Readers can use options
   data described by this template to improve handling of unknown
   information elements.  Note that the template MAY be used to describe
   public Information Elements, such as Information Elements that may
   have been added to the IANA registry after the last update of a given
   Collecting Process or File Reader; however, Collecting Processes or
   File Readers MUST NOT allow semantic or storage type information
   contained within these records to override their own specified
   handling of public Information Elements.

   The template SHOULD contain the following Information Elements as
   defined in the IPFIX Information Model [I-D.ietf-ipfix-info] and
   above:

   +--------------------------------+----------------------------------+
   | IE                             | Description                      |
   +--------------------------------+----------------------------------+
   | templateId                     | The Template ID of the template  |
   |                                | this record describes; it is     |
   |                                | assumed to be valid within the   |
   |                                | Observation Domain ID of the     |
   |                                | containing IPFIX Message, and    |
   |                                | MUST identify a Template that    |
   |                                | has already been exported.  This |
   |                                | Information Element MUST be      |
   |                                | defined as a Scope Field.        |
   | informationElementId           | The Information Element          |
   |                                | identifier of the Information    |
   |                                | Element within the specified     |
   |                                | Template this record describes.  |
   |                                | This Information Element MUST be |
   |                                | defined as a Scope Field.        |
   | privateEnterpriseNumber        | The Private Enterprise number of |
   |                                | the Information Element within   |
   |                                | the specified Template this      |
   |                                | record describes.  May be 0 if   |
   |                                | this record describes a public   |
   |                                | Information Element.  This       |
   |                                | Information Element MUST be      |
   |                                | defined as a Scope Field.        |
   | informationElementStorageType  | The storage type of the          |
   |                                | specified Information Element.   |
   | informationElementSemanticType | The semantic type of the         |
   |                                | specified Information Element.   |
   +--------------------------------+----------------------------------+
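
   As a sketch, a File Writer could encode this Options Template with
   the Python function below, which builds an Options Template Set
   (Set ID 3) holding one Options Template Record: Template ID, Field
   Count, and Scope Field Count, followed by the scope Field Specifiers
   and then the remaining ones.  The ElementIds of the new Information
   Elements are TBD in this document, so the 0x7F0x values are
   placeholders only; templateId is assumed to be ElementId 145 as in
   the IPFIX Information Model, and Template ID 300 is arbitrary.

```python
import struct
from typing import List, Tuple

OPTIONS_TEMPLATE_SET_ID = 3
IE_TEMPLATE_ID = 145                  # assumed Information Model ID
IE_INFORMATION_ELEMENT_ID = 0x7F01    # placeholder: ElementId is TBD
IE_PRIVATE_ENTERPRISE_NUMBER = 0x7F02 # placeholder: ElementId is TBD
IE_STORAGE_TYPE = 0x7F03              # placeholder: ElementId is TBD
IE_SEMANTIC_TYPE = 0x7F04             # placeholder: ElementId is TBD


def options_template_set(template_id: int,
                         scope: List[Tuple[int, int]],
                         options: List[Tuple[int, int]]) -> bytes:
    """Encode an Options Template Set with one record; scope and options
    are (ElementId, fieldLength) pairs for public IEs."""
    if template_id < 256:
        raise ValueError("Template IDs 0-255 are reserved")
    if not scope:
        raise ValueError("at least one Scope Field is required")
    fields = scope + options  # Scope Fields come first
    record = struct.pack("!HHH", template_id, len(fields), len(scope))
    for element_id, length in fields:
        record += struct.pack("!HH", element_id, length)
    set_length = 4 + len(record)  # Set Header is 4 octets
    return struct.pack("!HH", OPTIONS_TEMPLATE_SET_ID, set_length) + record


semantics_template = options_template_set(
    300,
    scope=[(IE_TEMPLATE_ID, 2), (IE_INFORMATION_ELEMENT_ID, 2),
           (IE_PRIVATE_ENTERPRISE_NUMBER, 4)],
    options=[(IE_STORAGE_TYPE, 1), (IE_SEMANTIC_TYPE, 1)],
)
```

   With these field lengths, each Data Record described by the
   Template would be 10 octets: 2 for templateId, 2 for
   informationElementId, 4 for privateEnterpriseNumber, and 1 each for
   the storage and semantic types.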

6.2.2.  Message Checksum Options Template

   The Message Checksum Options Template specifies the structure of a
   Data Record for attaching an MD5 message checksum to an IPFIX
   Message.  An MD5 message checksum, as described in Section 6.1.5,
   MAY be used if long-term data integrity is important to the
   application.  The described Data Record MUST appear only once per
   IPFIX Message.

   The template SHOULD contain the following Information Elements as
   defined above:

   +--------------------+----------------------------------------------+
   | IE                 | Description                                  |
   +--------------------+----------------------------------------------+
   | messageScope       | A marker denoting this Option applies to the |
   |                    | whole IPFIX message; content is ignored.     |
   |                    | This Information Element MUST be defined as  |
   |                    | a Scope Field.                               |
   | messageMD5Checksum | The MD5 checksum of the containing IPFIX     |
   |                    | Message.                                     |
   +--------------------+----------------------------------------------+

6.2.3.  Template Anonymization Options Template

   The Template Anonymization Options Template specifies the structure
   of a Data Record for attaching anonymization notation information to
   Information Elements in specified Template Records.  A Data Record
   described by this Template SHOULD appear for each Information Element
   within a Template known by the Exporting Process or File Writer to
   contain anonymized data.

   The template SHOULD contain the following Information Elements as
   defined in the IPFIX Information Model [I-D.ietf-ipfix-info] and
   above:

   +-------------------------------------+-----------------------------+
   | IE                                  | Description                 |
   +-------------------------------------+-----------------------------+
   | templateId                          | The Template ID of the      |
   |                                     | template this record        |
   |                                     | describes; it is assumed to |
   |                                     | be valid within the         |
   |                                     | Observation Domain ID of    |
   |                                     | the containing IPFIX        |
   |                                     | Message, and MUST identify  |
   |                                     | a Template that has already |
   |                                     | been exported.  This        |
   |                                     | Information Element MUST be |
   |                                     | defined as a Scope Field.   |
   | informationElementId                | The Information Element     |
   |                                     | identifier of the           |
   |                                     | Information Element within  |
   |                                     | the specified Template this |
   |                                     | record describes.  This     |
   |                                     | Information Element MUST be |
   |                                     | defined as a Scope Field.   |
   | privateEnterpriseNumber             | The Private Enterprise      |
   |                                     | number of the Information   |
   |                                     | Element within the          |
   |                                     | specified Template this     |
   |                                     | record describes.  May be 0 |
   |                                     | if this record describes a  |
   |                                     | public Information Element. |
   |                                     | This Information Element    |
   |                                     | MUST be defined as a Scope  |
   |                                     | Field.                      |
   | informationElementAnonymizationType | The anonymization type of   |
   |                                     | the specified Information   |
   |                                     | Element.                    |
   +-------------------------------------+-----------------------------+
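
   A File Reader might index decoded records described by this Template
   as in the Python sketch below.  The function names are hypothetical,
   and all records are assumed to come from a single Observation
   Domain.  Records are keyed by their three Scope Fields, and an
   absent record yields FALSE, matching the default in Section 6.1.2.

```python
from typing import Dict, Iterable, Tuple

# Scope key: (templateId, informationElementId, privateEnterpriseNumber)
ScopeKey = Tuple[int, int, int]


def build_anonymization_map(
        records: Iterable[Tuple[int, int, int, bool]]) -> Dict[ScopeKey, bool]:
    """Index (templateId, ieId, pen, anonymized) records by scope."""
    return {(tid, ie, pen): flag for tid, ie, pen, flag in records}


def is_anonymized(amap: Dict[ScopeKey, bool], template_id: int,
                  ie_id: int, pen: int = 0) -> bool:
    # No record for this Information Element means FALSE (not anonymized).
    return amap.get((template_id, ie_id, pen), False)
```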

6.3.  Recommended Compression Strategy for File Writers

   Note that, since any file may be compressed and decompressed with a
   variety of widely available tools implementing a variety of
   compression standards (both specified and de facto), compression of
   IPFIX File data can be accomplished externally.  However, compression
   at the file level may not be particularly resilient to errors; in the
   worst case, a single bit error in a stream-compressed file may result
   in the loss of the entire file.

   To limit the impact of errors on the recoverability of compressed
   data, we recommend the use of block compression where possible.
   However, block-compressed IPFIX Files also have some recovery
   problems: it is difficult to resynchronize a partially damaged IPFIX
   Message stream, because the IPFIX version 1 beginning-of-message
   marker (the Version field of the Message Header, 0x00 0x0A) may
   commonly appear in the body of an IPFIX Message.
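
   The Python sketch below shows why resynchronization is unreliable.
   A hypothetical scanner looks for the Version marker followed by a
   plausible Length field (at least the 16-octet Message Header, and
   fitting in the remaining buffer).  Matching bytes inside a Data
   Record pass the same test, so each hit is only a candidate that a
   reader would still have to confirm, for example by parsing the Sets
   that follow.

```python
import struct
from typing import List

HEADER_LEN = 16  # Version, Length, Export Time, Sequence Number, Domain ID


def candidate_message_offsets(buf: bytes, start: int = 0) -> List[int]:
    """Offsets that *could* begin an IPFIX Message (unconfirmed)."""
    hits = []
    pos = buf.find(b"\x00\x0a", start)
    while pos != -1:
        if pos + 4 <= len(buf):
            (length,) = struct.unpack_from("!H", buf, pos + 2)
            # Plausible only if it covers a header and fits the buffer.
            if HEADER_LEN <= length <= len(buf) - pos:
                hits.append(pos)
        pos = buf.find(b"\x00\x0a", pos + 1)
    return hits
```

   For instance, a damaged block whose surviving record bytes happen to
   read 0x00 0x0A 0x00 0x10 yields a false candidate ahead of the real
   Message Header.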

   Therefore, in applications (e.g. archival storage) in which error
   resilience is very important, we recommend that File Writers align
   compression block boundaries with IPFIX Message boundaries, so that
   each new compression block starts with a new IPFIX Message.  This can
   be achieved either by manually adjusting the block boundaries (for
   compression facilities which support this), or by padding out the
   IPFIX Message Stream with a Data Set described by a Template
   containing a single one-byte paddingOctets Information Element to
   reach a known compression block boundary.  Note that this latter
   strategy requires a minimum padding of 5 bytes (a 4-byte Set header
   followed by at least one byte of padding).
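
   The padding arithmetic can be sketched in Python as follows.  The
   function name is hypothetical, and offset is assumed to be the
   uncompressed File offset at which the padding Set would begin.  If
   the gap to the next block boundary is smaller than the 5-octet
   minimum, the only option is to pad through to the boundary after
   it.

```python
SET_HEADER_LEN = 4                    # Set ID + Set Length
MIN_PADDING_SET = SET_HEADER_LEN + 1  # at least one paddingOctets record


def padding_set_length(offset: int, block_size: int) -> int:
    """Octets of padding Data Set needed to reach a block boundary."""
    if block_size < MIN_PADDING_SET:
        raise ValueError("block size too small to pad")
    gap = (-offset) % block_size      # distance to the next boundary
    if gap == 0:
        return 0                      # already aligned, no Set needed
    if gap < MIN_PADDING_SET:
        gap += block_size             # too small: use the next boundary
    if gap > 0xFFFF:
        raise ValueError("gap exceeds the 16-bit Set Length field")
    return gap
```

   For example, with 4096-octet blocks a writer at offset 4093 cannot
   fit a 3-octet padding Set, so it pads 4099 octets to reach offset
   8192.  Because the padding Set sits inside an IPFIX Message, a writer
   would also need to keep that Message within its own 16-bit Length
   limit.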


7.  Examples

   Examples are not yet available as the file format has not yet been
   fully described.  A future revision of this document will contain
   examples.


8.  Security Considerations

   The IPFIX-based file format itself does not directly introduce
   security issues.  Rather, it is used to store information that may
   be considered sensitive for privacy or business reasons.  The file
   format must therefore provide appropriate procedures to guarantee the
   integrity and confidentiality of the stored information.

   The underlying protocol used to exchange the information that will be
   stored using the format proposed in this document must also apply
   appropriate procedures to guarantee the integrity and confidentiality
   of the exported information.  Such issues are addressed in separate
   documents, specifically in the IPFIX Protocol
   [I-D.ietf-ipfix-protocol].


9.  IANA Considerations

   This document requests the addition of the information elements in
   section 6.1 to the IANA IPFIX Information Element Registry.


10.  Open Issues and Notes

   The survey of existing file formats is incomplete, and includes only
   file formats with which one of the authors has personal experience.
   [bht]

   The set of semantic data types is unspecified. [bht]

   Need to be a bit more concrete with respect to what requirements we
   cannot meet with XML. [bht]

   Need to address indexing and searching, or at least be more explicit
   about why we're not doing so. [bht]


11.  Acknowledgements

   Thanks to Arno Wagner for technical assistance with the requirements.


12.  References

12.1.  Normative References

   [I-D.ietf-ipfix-protocol]
              Claise, B., "Specification of the IPFIX Protocol for the
              Exchange of IP Traffic Flow Information",
              draft-ietf-ipfix-protocol-23 (work in progress),
              October 2006.

   [I-D.ietf-ipfix-info]
              Quittek, J., "Information Model for IP Flow Information
              Export", draft-ietf-ipfix-info-13 (work in progress),
              September 2006.

   [RFC1321]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
              April 1992.

12.2.  Informative References

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.


Authors' Addresses

   Brian H. Trammell
   CERT Network Situational Awareness
   Software Engineering Institute
   4500 Fifth Avenue
   Pittsburgh, Pennsylvania  15213
   United States

   Phone: +1 412 268 9748
   Email: bht@cert.org


   Elisa Boschi
   Hitachi Europe SAS
   Immeuble Le Theleme
   1503 Route les Dolines
   06560 Valbonne
   France

   Phone: +33 4 89874100
   Email: elisa.boschi@hitachi-eu.com


   Lutz Mark
   Fraunhofer Institute for Open Communication Systems
   Kaiserin-Augusta-Allee 31
   10589 Berlin
   Germany

   Phone: +49 30 3463 7306
   Email: mark@fokus.fraunhofer.de


   Tanja Zseby
   Fraunhofer Institute for Open Communication Systems
   Kaiserin-Augusta-Allee 31
   10589 Berlin
   Germany

   Phone: +49 30 3463 7153
   Email: zseby@fokus.fraunhofer.de


Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
