Traffic Engineering Working Group                            Wai Sum Lai
Internet Draft                                                 AT&T Labs
Document:
Category: Informational                                 Blaine Christian
                                                                   UUNET
                                                        Richard W. Tibbs
                                           Oak City Networks & Solutions
                                                   Steven Van den Berghe
                                                   Ghent University/IMEC
                                                           November 2001

        A Framework for Internet Traffic Engineering Measurement

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

In this document, a measurement framework for supporting the traffic engineering of IP-based networks is presented. Uses of traffic measurement in service provider environments are described, and issues related to time scale and read-out period are discussed. Different measurement types are classified, with each being specified as a meaningful combination of a measurement entity and a measurement basis.

Table of Contents

Status of this Memo
1. Abstract
2. Conventions used in this document

Lai, et al Category - Expiration [Page 1]
Internet-Draft Framework for Internet Traffic Measurement Nov 2001

3. Introduction
4. Terminology
   4.1 Route, path
   4.2 Throughput, traffic volume
5. Uses of Traffic Measurement
   5.1 Traffic characterization
   5.2 Network monitoring
   5.3 Traffic control
6. Time Scales for Network Operations
7. Read-Out Periods
8. Measurement Bases
   8.1 Flow-based
   8.2 Interface-based, link-based, node-based
   8.3 Node-pair-based
   8.4 Path-based
9. Measurement Entities
   9.1 Entities related to traffic and performance
   9.2 Entities related to establishment of connection or path
10. Measurement Types
   10.1 Measurement types related to traffic or performance
   10.2 Measurement types related to resource usage
11. Traffic Matrix Statistics
12. Performance Monitoring
13. Security Considerations
14. References
15. Acknowledgments
16. Author's Addresses
Full Copyright Statement

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

3. Introduction

This document describes a framework for Internet traffic engineering measurement, with the objective of providing principles for the development of a set of measurement systems to support the traffic engineering of IP-based networks [1]. A major goal is to provide guidance for establishing protocol-independent and platform-neutral traffic measurement standards to achieve multi-vendor interoperability. It is critical to minimize the possibilities of inconsistencies arising from, e.g., differing statistical definitions, overlapping data collection, processing at different protocol levels, and similar inconsistencies by different vendors or network operators.

The need for a common framework, including detailed definitions for measurements, is motivated by the needs for consistency, precision, and effectiveness of the overall traffic engineering function. Traffic engineering includes measurements, forecasting, planning, dimensioning, control, and performance monitoring. From this perspective, the purpose of this document is to set principles of measurement in place that assure the quality of the other aspects of traffic engineering.

The scope of this document is limited to those aspects of measurement pertaining to intra-domain operations, i.e., within a given autonomous system. However, measurements on its boundary with other domains are included as well. The focus is primarily on traffic engineering in Internet service provider environments.

In this document, uses of traffic measurement in traffic characterization, network monitoring, and traffic control are first described.
Depending on the network operations to be performed in these tasks, three different time scales can be identified, ranging from months, through days or hours, to minutes or less. To support these operations, traffic measurement must be able to capture accurately, within a given confidence interval, the traffic variations and peaks without degrading network performance and without generating an immense amount of data. As one consequence of the need to avoid network performance degradation, specification of a suitable read-out period for each service class for traffic summarization is essential. Other principles such as concise representation of measurements are identified as well.

Traffic measurement can be performed on the basis of flows, interfaces, links, nodes, node-pairs, or paths. Based on these objects, different measurement entities can be defined, such as traffic volume, average holding time, bandwidth availability, throughput, delay, delay variation, packet loss, and resource usage. Using these measured traffic data, in conjunction with other network data such as topological data and router configuration data, traffic matrix and other relevant statistics can be derived for traffic engineering purposes. Traffic measurement also plays a key role in network performance management.

In addition to these capabilities, functions of a measurement system should also include data storage, data processing, statistics generation and reporting. However, these aspects are outside the scope of this document.

As a framework, this document is mainly concerned with a discussion of various technical issues surrounding traffic measurement, particularly in the area of statistical traffic load estimation for traffic engineering purposes.

As far as possible and to avoid duplication of effort, relevant work done in measurements by other standards organizations will be applied or adapted, and references to them will be made. These include, in particular,

. IP Performance Metrics (IPPM) Working Group of the IETF: its framework document [2] and the associated documents on individual metrics [3, 4, 5, 6, 7, 8, 9]
. ITU-T: Recommendation I.380/Y.1540 [10] and Draft Recommendation Y.1541 [11]

4. Terminology

The intent of this section is not to provide definition or description of terms used in this document. Rather, it is to highlight the difference in usage of closely related terms.

4.1 Route, path

A route is any unidirectional sequence of nodes and links, for sending packets from a source node to a destination node. A path refers to an MPLS tunnel, i.e., a label-switched path [12].

It should be pointed out that there are also methods for creating paths with other technologies such as frame relay or ATM. The measurement described in this document may apply to these technologies with suitable adaptation. To simplify description, reference is made to MPLS only in what follows.

4.2 Throughput, traffic volume

Both quantities can be applied to a network, a network segment, or an individual network element.

Throughput of a network, as a measure of delivered performance, refers to the maximum sustainable rate of transferring packets successfully across the network, under given network conditions, e.g., a given traffic mix, while meeting quality of service (QoS) objectives. This usage is consistent with the definition of throughput for a network interconnect device as specified in [13]. For real-time network control, active measurement of throughput by probing may be used to determine the currently available capacity of a network to carry additional traffic. (In an active measurement, test packets are injected into the network. Data collected about these packets are taken as representative of the behavior of the network.)
Traffic volume, as a measure of the traffic carried, characterizes the level of traffic that a network is designed to support. Passive, i.e., in-service non-intrusive, measurement of the traffic volume is usually used to estimate the long-term offered traffic for the purposes of network dimensioning in the capacity-management and network-planning processes (see the Section on Time Scales for Network Operations). A network should be properly dimensioned so that its throughput is adequate to handle the expected traffic volume.

Throughput is expressed in terms of number of data units per time unit. Traffic volume is expressed in data units with reference to a read-out period (see the Section on Read-Out Periods). For transmission systems, the data unit is usually a multiple of either bits or bytes. For processing systems, the data unit is usually a multiple of packets.

5. Uses of Traffic Measurement

Traffic measurement is used to collect traffic data for the following purposes:

. Traffic characterization
. Network monitoring
. Traffic control

5.1 Traffic characterization

. Identifying traffic patterns, particularly traffic peak patterns, and their variations in statistical analysis; this includes developing traffic profiles to capture daily, weekly, or seasonal variations.
. Determining traffic distributions in the network on the basis of flows, interfaces, links, nodes, node-pairs, paths, or destinations.
. Estimating the traffic load according to service classes in different routers and the network.
. Observing trends for traffic growth and forecasting of traffic demands.

For example, traffic engineering measurements are usually used to determine the statistical moments of a traffic flow.
As suggested in [14], given the time series of packet arrivals, a suitable parametric stochastic model based on the mean and variance of the time series can be constructed. This traffic model is then used in the ensuing phases of traffic engineering, such as link dimensioning to meet service objectives.

5.2 Network monitoring

. Determining the operational state of the network, including fault detection.
. Monitoring the continuity and quality of network services, to ensure that QoS/GoS objectives are met for various classes of traffic, to verify the performance of delivered services, or to serve as a means of sectionalizing performance issues seen by a customer. [QoS reflects the performance perceivable by a user of a service, while GoS (grade of service) is used by a service provider for internal design and operation of a network.]
. Evaluating the effectiveness of traffic engineering policies, or triggering certain policy-based actions (such as alarm generation, or path preemption) upon threshold crossing; this may be based on the use of performance history data.
. Verifying peering agreements between service providers by monitoring/measuring the traffic flows over interconnecting links at border routers; this includes the estimation of inter- and intra-network traffic, as well as originating, terminating, and transit traffic that are being exchanged between peers.

An example of using traffic measurements in this area might be monitoring packet loss rates at various points in a network to detect apparent link failure. Another example is monitoring the QoS delivered to external peers by an autonomous system to ensure that peering agreements are met.

5.3 Traffic control

. Adaptively optimizing network performance in response to network events, e.g., rerouting to work around congestion or failures.
. Providing a feedback mechanism in the reverse flow messaging of RSVP-TE or CR-LDP signaling in MPLS to report on actual topology state information such as link bandwidth availability.
. Supporting measurement-based admission control, i.e., by predicting the future demands of the aggregate of existing flows so that admission decisions can be made on new flows.

An example of traffic engineering measurements used to effect a traffic control mechanism is to configure policing mechanisms in response to traffic load and performance measurements. A network operator could selectively throttle low-priority flows to improve near-real-time performance of higher-priority flows, and maintain tighter QoS envelopes. Another example would be to use measurement results for feedback into IGP routing decisions, e.g., for adjusting the link weights based on them.

6. Time Scales for Network Operations

The information collected by traffic measurement can be provided to the end user or application either in real time, or for record (i.e., data retention) in non-real time, depending on the activities to be performed and the network actions to be taken. Traffic control will generally require real-time information. For network planning and capacity management as described below, information may be provided in non-real time after the processing of raw data.

Broadly speaking, the following three time scales can be distinguished, according to the use of observed traffic information for network operations [14].

Network planning

Information that changes on the order of months is used to make traffic forecasts as a basis for network extensions and long-term network configuration. That is, it is used for planning the topology of the network, planning alternative routes to survive failures, or determining where capacity must be augmented in advance of projected traffic growth. Forecasting and planning may also lead to the introduction of new technology and architecture.
Capacity management

Information that changes on the order of days or hours is used to manage the deployed facilities, by taking appropriate maintenance or engineering actions to optimize utilization. For example, new MPLS tunnels may be set up or existing tunnels modified while meeting service level agreements. Also, load balancing may be performed, or traffic may be rerouted for re-optimization after a failure.

Real-time network control

Information that changes on the order of minutes or less is used to adapt to the current network conditions in near real time. Thus, to combat localized congestion, traffic management actions may perform temporary rerouting to redistribute the load. Upon detecting a failure, traffic may be diverted to pre-established, secondary routes until more optimized routes can be arranged.

7. Read-Out Periods

A measurement infrastructure must be able to scale with the size and the speed of a network as it evolves. Hence, it is important to minimize the amount of data to be collected, and to condense the collected data by periodic summarization. This is to prevent network performance from being adversely affected by the unnecessarily excessive loading of router control processors, router memories, transmission facilities, and the administrative support systems.

A measurement interval is the time interval over which measurements are taken. Some traffic data must be collected continuously, while other data may be collected by sampling or on a scheduled basis. For example, peak loads and peak periods can be identified only by continuous measurement as traffic typically fluctuates irregularly during the whole day. If traffic variations are regular and predictable, it may be possible to measure the expected normal load on pre-determined portions of the day. This requires the definition of a busy period.
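As a rough illustration of how a busy period might be identified from continuous measurement, the following Python sketch averages hourly traffic volumes over several days and reports the hour of day with the highest mean volume. The function name busy_hour, the (day, hour, volume) sample layout, and the one-hour granularity are assumptions made only for this example; this document does not prescribe a particular procedure.

```python
from collections import defaultdict


def busy_hour(samples):
    """Return (hour_of_day, mean_volume) for the hour with the highest mean.

    samples: iterable of (day, hour_of_day, volume) tuples, where volume is
    the traffic volume (e.g., in bytes) counted during that hour.
    Ties are resolved in favor of the earliest hour of the day.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _day, hour, volume in samples:
        totals[hour] += volume
        counts[hour] += 1
    if not totals:
        raise ValueError("no samples")
    # Mean volume per hour of day, averaged over all observed days.
    means = {hour: totals[hour] / counts[hour] for hour in totals}
    best = max(sorted(means), key=lambda hour: means[hour])
    return best, means[best]


# Hypothetical data: two days, two measured hours per day.
samples = [(0, 9, 100), (0, 20, 300), (1, 9, 120), (1, 20, 260)]
print(busy_hour(samples))  # (20, 280.0)
```

Averaging over several days, rather than taking a single-day peak, is one simple way to avoid declaring a busy period on the basis of an isolated burst.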
Special studies on selected segments of the network may be conducted on a scheduled basis. Active measurement, with the involvement of the network operator, may be activated manually. For instance, active throughput measurement may be used to identify alternate routes during periods of network congestion.

A measurement interval consists of a sequence of consecutive read-out periods. Summarization is usually done by integrating the raw data over a pre-specified read-out period. The granularity of this period must be suitably chosen. It should be short enough to capture, with acceptable accuracy, the bursty nature of the traffic, i.e., the traffic variations and peaks. Since measurements represent a load for the router, the read-out period should not be so short that router performance is degraded while a voluminous quantity of data is produced. Also, read-out may be started when the measured data exceeds a preset threshold, or when the space allocated for temporarily holding the data in a router is exhausted.

For a multi-service IP-based network, each service typically has its own traffic characteristics and performance objectives. To ensure that service-specific features are reflected in the measurement process, different read-out periods may be needed for different classes of service.

8. Measurement Bases

Measurements can be classified on the basis of where, and at which level, the traffic data are gathered and aggregated. This is similar to the concept of a *population of interest* as specified in ITU-T Recommendation I.380/Y.1540. As defined therein, this refers to a set of packets, possibly relative to a particular pair of source and destination hosts, for the purposes of defining performance parameters. However, measurement bases as used here may not have any association with a source-destination pair.
In this document, customer-based measurements are not considered.

Service providers will make decisions on how to perform the measurements needed, and there are various tradeoffs involved. One option is to obtain the measurements directly from the network elements themselves, e.g., via SNMP (Simple Network Management Protocol). Collecting the measurements on the operational network elements such as routers is sometimes a performance concern. Currently, there are a number of third-party measurement/monitoring products available. Hence, another option is to deploy such equipment, which might have performance advantages but also introduces additional cost.

Regardless of the type of measurement source, either a network element or a third-party product, measurements should be collected, as far as possible, by a measurement source without requiring coordination with other measurement sources. Thus, it is desirable to perform those measurements that do not require the use of specialized monitoring equipment connected to the network at multiple locations.

While each measurement source may act autonomously with regard to taking measurements, a network operator may specify some network-wide policy regarding measurement scheduling. Such policy may be, say, the use of the same time of day, the same measurement interval, or measurement intervals that are multiples of each other (e.g., nested intervals with synchronized boundaries). A schedule therefore should include such time information as the start, the duration, and periodicity of a certain measurement.

The following measurement bases are considered in this document:

. Flow-based
. Interface-based, link-based, node-based
. Node-pair-based
. Path-based

8.1 Flow-based

This is conceptually similar to the call detail record (CDR) in circuit-switched telecommunications networks.
It is primarily used on interfaces at access routers, edge routers, or aggregation routers where traffic originates or terminates, rather than on backbone routers in the core network. Like CDR measurements, flow-based records are used to collect detailed information about a flow. This includes such information as source and destination IP addresses/port numbers, protocol, type of service, timestamps for the start and end of a flow, packet count, octet count, etc.

As a flow is a fine-grained object, measuring every flow that passes through all the edge devices may not be scalable or feasible. Hence, per-flow data are usually used in a special study conducted on a non-continuous schedule and on selected routers only. Sampling of flow-based measurements may also be needed to reduce both the amount of data collected and the associated overhead.

8.2 Interface-based, link-based, node-based

Passive measurement can be taken at each network element. For example, SNMP uses passive monitoring to collect raw data on an interface at an edge or backbone router. These data are stored in MIBs (Management Information Bases) and include counts on packets and octets sent/received, packet discards, and errored packets. While not intended for the core network, RMON (Remote Network Monitoring) can possibly be used in the access link of an Internet service provider to provide managed Internet service to corporate LANs.

To reduce the overhead in managing multiple links between the same ingress and egress points, there is a proposal to aggregate links for network optimization [15]. Component links in such a *bundled link* will have the same routing constraints, resource classes, and attributes. Multiple links are treated as a single IP link. Traffic measurements, such as bandwidth availability and throughput, should consider the measurements for bundled links.
Also, such measurements should be protocol independent and media independent to ensure portability and commonality in the measurements.

8.3 Node-pair-based

Active measurements by probing, as specified in the IPPM framework, can be conducted between each pair of major routing hubs for determining edge-to-edge performance of a core network. This complements the passive measurements of the previous sub-section, which provide local views of the performance of individual network elements.

In telecommunications networks, each established call has an associated node-pair. By maintaining a set of node-pair data registers (usage, peg count, overflow, etc.) in each switch, node-pair-based measurements for traffic statistics such as the load between a given node pair are taken directly. In contrast, in IP-based networks, node-pair-based measurements of this kind cannot currently be taken directly. However, it is possible to infer them from flow-based passive measurements and other network information. A problem with this approach is that flow-based measurement data are voluminous. Another problem that must be accounted for is the routing changes among the multiple routes due to, e.g., a change in the configuration of intradomain routing, or a change in interdomain policies made by another autonomous system. This is further discussed in the Section on Traffic Matrix Statistics.

8.4 Path-based

The ability of MPLS to use fixed preferred paths for routing traffic, so-called route pinning, gives the means to develop path-based measurements. This may enable the development of methodologies for such functions as admission control and performance verification of delivered service.

Like a flow, a path is associated with a pair of nodes. However, a path is a more coarse-grained object than a flow, as paths are usually used to carry aggregated traffic.
In addition, when routing changes occur, the amount of traffic to be carried by a path will either not be affected or be merged with that of another path. Because of these properties, path-based measurements are more scalable and may be used to provide more readily an accurate, network-wide view of the traffic demands. For example, the traffic between a given pair of nodes may be inferred from the aggregate of the traffic carried by all the paths either terminated by or passed through the same node-pair.

9. Measurement Entities

A measurement entity defines what is measured: it is a quantity for which data collection must be performed with a certain measurement. A measurement type can be specified by a (meaningful) combination of a measurement entity with the measurement basis described in the previous section.

9.1 Entities related to traffic and performance

Some of the measurement entities listed below, such as throughput, delay, delay variation, and packet loss, are related to the respective IPPM performance metrics or the I.380/Y.1540 performance parameters.

. Traffic volume (mean and variance, in number of bits, bytes, or packets transferred, as counted over a given time interval), on a per service class basis, at various aggregation levels (IP address prefix, interface, link, node, node-pair, path, network edge, customer, or autonomous system)
  Note: (1) This is a measurement for the traffic carried by a network, a network segment, or an individual network element; it is used to derive the carried load or carried traffic intensity [16]. When measured during the busy period, this entity is normally used to estimate the traffic offered. However, the estimation procedure should take into account such factors as congestion, which may result in decreased carried traffic.
In Lai, et al Category - Expiration [Page 10] Internet-Draft Framework for Internet Traffic Measurement Nov 2001 addition, congestion may lead to user behavior such as reattempt or abandonment, which may affect the actual traffic offered. (2) To reduce uncertainty in traffic estimation, second-order measures may need to be developed. (3) Measurement of traffic volumes over interconnecting links at border routers can be used to estimate the traffic exchange between peers for contract verification. . Average holding time (e.g., flow duration or lifetime, duration of an MPLS path), on a per service class basis Note: (1) This is similar to call holding time in telecommunications networks. Peg count, usage, and call holding time are three busy-hour entities that should be independently measured for both call-dependent and load-dependent engineering. This is important especially when the call busy hour and the load busy hour during a day are non-coincident, due to the hour-to-hour variation of call holding times. (2) The holding time statistics of long-living static paths reflect the effect of network equipment failures, link outages, or scheduled maintenance, and hence may to used to derive information about up-time or service availability. . Available bandwidth of a link or path - useful for load balancing, measurement-based admission control to determine the feasibility of creating a new MPLS tunnel (real-time information can be used for dynamic establishment) . Throughput (in bits per second, bytes per second, or packets per second) Note: (1) This is a measure of the "goodput." That is, the rate at which a given amount of traffic excluding lost, misdelivered, or errored packets, that passes between a set of end points, where end points can be logically or physically defined. The condition of the network, e.g., normal or high load, under which the measurement is taken should be noted. 
(2) The protocol level at which a throughput measurement is taken must be specified, as the packet payload and packet overheads are protocol dependent. (3) The average packet size may be inferred from the bit rate and packet rate measurements. This quantity is useful to gauge router performance, since router operations are typically packet-oriented and small packets are more processing-intensive. . Delay (e.g., cross-router delay from node-based measurement may be used to measure queueing delay within a router; end-to-end one-way or round-trip packet delay can be obtained by node-pair-based measurement) Note: The condition of the network, e.g., normal or high load, under which the measurement is taken should be noted. This is useful to determine if delay objectives are met. . Delay variation Note: There are several methods to measure this quantity as specified in ITU-T and IPPM. (1) In Appendix II of I.380/Y.1540, IP packet delay variation is defined via four alternative methods. Lai, et al Category - Expiration [Page 11] Internet-Draft Framework for Internet Traffic Measurement Nov 2001 The first two methods define an end-to-end two-point delay variation of a given packet, measured between two measurement points (such as ingress and egress), as the difference between the one-way delay of the given packet and some nominal delay. This nominal delay is chosen to be the first packet delay in the first method and the average delay of the population of packets in the second method. The third alternative, interval-based method, measures the percentage of packets with delay variations that fall outside some pre-specified delay variation interval. Finally, the quantile-based method measures the distance (in time units) between pre-selected quantiles, e.g., 99.5 percentile and 0.5 percentile, of the delay variation distribution. This method is tighter than the interval-based method since it bounds the tail of the delay variation distribution. 
In Y.1541 [11], additional considerations and more alternatives of delay variations are described.

(2) In IPPM [8], the concept of a selection function is introduced that allows for the explicit designation of selected packets whose one-way delay values are compared to compute one-way delay variation. For example, a selection function can be defined to select the consecutive packets within a specified interval, or to select the maximum and minimum one-way delays within a specified interval.

. Packet loss

Note: (1) While packet losses due to transmission and/or protocol errors may not be traffic related, unexpected excessive loss may be used as a means of fault detection.

(2) Packet losses due to policing and those due to network congestion should be distinguished. The former result from user violation of the service contract, and the network operator should not be penalized for them. The latter, whether intentional or unintentional, are caused by network conditions such as buffer overflow or a busy router forwarding process, and may not be the user's fault. When policing is done by a network, measurement of non-conforming packets at the edge indicates the extent to which the network is carrying packets of this type (which can potentially be dropped if the network gets congested). Loss of any packets due to congestion, including loss of non-conforming packets, is a useful measure in traffic engineering to account for resource management.

(3) Long-term averages can be measured by the I.380/Y.1540 IP packet loss ratio or by the IPPM Poisson sampling of one-way loss [5]. However, during the convergence times associated with routing updates, the loss may be high enough to cause service unavailability. This effect needs to be captured, and statistics such as loss patterns, burst loss, or severe loss ratio may be useful.

. Resource usage, such as link/router utilization, buffer occupancy (e.g., fraction of arriving packets finding the buffer above a given set of thresholds)

Note: (1) Depending on the architecture of a router, router utilization measurements may include processor and memory (e.g., forwarding tables) utilization for each of the line cards and/or the central unit.

(2) Trigger points may be set when resource usage consistently exceeds a certain threshold.

9.2 Entities related to establishment of connection or path

Where connection admission control is used, a measurement entity for monitoring network performance may be the proportion of connections denied admission. Also, it may be useful to record the requested bandwidth carried in the traffic parameters of the setup request.

Corresponding to the number of call attempts (i.e., peg count) in telecommunications networks, the number of connection requests, the number of flows, etc., may be measured in given read-out periods to characterize the traffic.

To characterize paths, the following measurement entities may be defined: path setup delay, path setup error probability, path setup denial (blocking) probability, path release delay, path disconnect probability, path restoration time.

10. Measurement Types

A measurement matrix can be defined wherein each column represents a measurement basis and each row represents a measurement entity. An entry in this measurement matrix, corresponding to a meaningful and measurable combination of an entity and a basis, defines a particular measurement type.

For each measurement type, there should be a set of measurement points specified to bound the network segment for the purposes of taking measurements. A measurement point may be the physical boundary between a node and an adjacent link, or the logical interface between two protocol layers in a protocol stack.
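
As an illustration only, the measurement matrix described above can be represented as a mapping from each entity (row) to the set of bases (columns) with which it forms a meaningful and measurable combination. The following Python sketch is not part of the framework. The names Basis, Entity, MeasurementType, MATRIX, measurement_types, and is_defined are hypothetical, and the few entries shown are examples rather than a complete matrix.

```python
from enum import Enum
from typing import Dict, List, NamedTuple, Set


class Basis(Enum):
    """Measurement bases (matrix columns)."""
    FLOW = "flow"
    INTERFACE_NODE = "interface/node"
    NODE_PAIR = "node pair"
    PATH = "path"


class Entity(Enum):
    """Measurement entities (matrix rows); a small illustrative subset."""
    TRAFFIC_VOLUME = "traffic volume"
    THROUGHPUT = "throughput"
    DELAY = "delay"


class MeasurementType(NamedTuple):
    """One matrix entry: a meaningful (entity, basis) combination."""
    entity: Entity
    basis: Basis


# Assumption: example entries only. Each row lists the bases for which
# the combination is taken to be meaningful and measurable.
MATRIX: Dict[Entity, Set[Basis]] = {
    Entity.TRAFFIC_VOLUME: {
        Basis.FLOW, Basis.INTERFACE_NODE, Basis.NODE_PAIR, Basis.PATH,
    },
    Entity.THROUGHPUT: {Basis.NODE_PAIR, Basis.PATH},
    Entity.DELAY: {Basis.INTERFACE_NODE, Basis.NODE_PAIR, Basis.PATH},
}


def measurement_types(matrix: Dict[Entity, Set[Basis]]) -> List[MeasurementType]:
    """Enumerate every measurement type defined by the matrix entries."""
    return sorted(
        (MeasurementType(e, b) for e, bases in matrix.items() for b in bases),
        key=lambda t: (t.entity.value, t.basis.value),
    )


def is_defined(matrix: Dict[Entity, Set[Basis]], entity: Entity, basis: Basis) -> bool:
    """Return True if (entity, basis) is an entry of the matrix."""
    return basis in matrix.get(entity, set())
```

In this sketch, is_defined(MATRIX, Entity.DELAY, Basis.NODE_PAIR) holds, whereas throughput on a flow basis is not an entry. One such mapping could be kept for each service class, and each measurement type could additionally carry the set of measurement points that bound its network segment.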
10.1 Measurement types related to traffic or performance

The following measurement matrix illustrates some of the measurement types related to traffic or performance. Potentially, there can be one such matrix for each service class.

Bases:              Flow       Interface,  Node Pair  Path
                               Node
Entities:           (passive)  (passive)   (both)     (both)

Traffic Volume      x(1)       x           x(3)       x(3)
Avg. Hold. Time     x                                 x(3)
Avail. Bandwidth               x                      x(3)
Throughput                                 x(4)       x(4)
Delay                          x(2)        x(4)       x(4)
Delay Variation                x(2)        x(4)       x(4)
Packet Loss                    x           x(5)       x(5)

Notes:

(1) This measurement type can be used to derive flow size statistics.

(2) These are 1-point measurements.

(3) As a starting point, statistics collected by passive measurement through the MPLS traffic engineering MIBs [17, 18, 19] may be used.

(4) Active measurements based on IPPM metrics are currently in use for node-pairs; they may be developed for paths.

(5) Besides active measurements based on IPPM, path loss may possibly be inferred from the difference between ingress and egress traffic statistics at the two endpoints of a path. However, such inference for the cumulative losses between a given node pair over multiple routes may be less useful, since different routes may have different loss characteristics.

10.2 Measurement types related to resource usage

Another measurement matrix can be constructed for resource consumption. This leads to a set of measurement types comprising the different usage, one for each network resource object such as router (processor and memory), link, and buffer, by different classes of traffic:

. control (e.g., routing control) traffic
. signaling traffic
. user traffic from different service classes

Bases:                Node   Link   Buffer
Entities:

Control Util.         x      x      x
Signaling Util.       x      x      x
Service Class Util.   x      x      x

The amount of control and signaling traffic carried by a network is a function of many factors.
To name a few, these include the size and topology of the network, the control and signaling protocols used, the amount of user traffic carried, and the number of failure events. Also, flooding of link-state advertisement (LSA) messages in an Interior Gateway Protocol (IGP, such as OSPF or IS-IS) may cause significant routing control traffic during events such as an LSA storm resulting from failures such as fiber cuts or a failed power supply.

The above utilization measurements for control and signaling traffic are intended to help develop guidelines for the proper dimensioning and apportionment of network resources so that a given level of user traffic can be adequately supported. As the primary focus here is on user traffic measurements, the additional needs and properties of control and signaling traffic measurements are beyond the scope of this document.

11. Traffic Matrix Statistics

An important set of data for traffic engineering is point-to-point or point-to-multipoint demands. This data is needed in the provisioning of intradomain routes and external peering in the existing network, as well as in planning for the placement and sizing of new links, routers, or peers. In current practice, estimates for traffic demands are usually determined from a combination of traffic projections, customer prescriptions, and service level agreements.

Under the existing mode of operation, it is not easy to obtain network-wide traffic demands from the local interface measurements taken by different IP routers. As explained in [20, 21], information from diverse network measurements and various configuration files is needed to infer the traffic volume. Besides raw measurement data, additional information such as topological data and router configuration data is required to obtain a network view.
Furthermore, destination-based routing/forwarding in IGP provides a network operator with primitive and limited control over the routing of traffic flows. This necessitates the association of a time sequence of forwarding tables from different routers to reconstruct the different routes used by the network over time. The above-cited references describe how this auxiliary information, together with flow-based measurements, can be used to determine the traffic volume from an ingress link to a set of egress links by validating and joining the various data sets.

Some shortcomings in today's method of deriving traffic matrix statistics as above include the volume of data from flow-based measurement, the lack of sufficient routing control information, and the need to correlate data from a variety of sources. The routing control offered by MPLS can be used to avoid some of these deficiencies. To take advantage of this capability, path-based passive measurement should be developed. Furthermore, as explained in Section 8.4 on the path-based measurement basis, by aggregating the appropriate set of path-based traffic data, the corresponding node-pair-based traffic data can be obtained. This will facilitate the derivation of traffic matrix statistics, possibly on a per service class basis.

Besides traffic engineering, a major application of MPLS is the support of network-based virtual private networks (VPNs). A VPN can be an enterprise network or a carrier's carrier network. Path-based measurement by a network operator on behalf of the VPN customers facilitates the estimation of the traffic offered by these VPNs.

12. Performance Monitoring

General aspects of measurements required to support the operation, administration, and maintenance of a network are outside the scope of this document (see [22, 23, 24] for a discussion of MPLS OAM). The focus of the measurements here is only on operations related to traffic engineering and network performance management.
A major component of performance management is performance monitoring, i.e., continuous real-time monitoring of the quality or health of the network and its various elements to ensure a sustained, uninterrupted delivery of quality service. This requires the use of measurement, either passive or active, to collect information about the operational state of the network and to track its performance. For a discussion of passive monitoring and the use of synthetic traffic sources in active probing, see [25]. Alarms may be generated when the state of a network element exceeds prescribed thresholds.

Performance degradation can occur as a result of routing instability, congestion, or failure of network components. Periods of congestion may be detected when the resource usage of a network segment consistently exceeds a certain threshold, or when the cross-router delay is unexpectedly high. After the identification of a hot spot, active throughput measurement may be used to seek out alternate routes for congestion bypass. Unexpected excessive loss of packets or throughput drops may be used as a means of fault detection, and may result in restoration activities.

Internet utilities such as ping and traceroute have been useful for diagnosing network problems and debugging performance. Utilities with similar functions would be essential for path-oriented operations such as those in MPLS. This would include the capability to list, at any time, (1) for a given path, all the nodes traversed by it, and (2) for a given node, all the paths originating from it, transiting through it, and/or terminating on it. A proposal for route tracing is described in [26].

13. Security Considerations

The principles and concepts related to Internet traffic measurement as discussed in this document do not by themselves affect the security of the Internet.
However, it is assumed that any measurement systems developed or deployed by a service provider are responsible for providing sufficient data integrity and confidentiality. It is also assumed that a service provider will take proper precautions to ensure that access to its measurement systems and all associated data is secure. Methods for meeting these security requirements are not addressed in this document.

14. References

1 D.O. Awduche, A. Chiu, A. Elwalid, I. Widjaja, and X. Xiao, "Overview and Principles of Internet Traffic Engineering," Internet-Draft, Work in Progress, October 2001.
2 V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, "Framework for IP Performance Metrics," RFC 2330, May 1998.
3 J. Mahdavi and V. Paxson, "IPPM Metrics for Measuring Connectivity," RFC 2678, September 1999.
4 G. Almes, S. Kalidindi, and M. Zekauskas, "A One-way Delay Metric for IPPM," RFC 2679, September 1999.
5 G. Almes, S. Kalidindi, and M. Zekauskas, "A One-way Packet Loss Metric for IPPM," RFC 2680, September 1999.
6 G. Almes, S. Kalidindi, and M. Zekauskas, "A Round-trip Delay Metric for IPPM," RFC 2681, September 1999.
7 M. Mathis and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics," RFC 3148, July 2001.
8 C. Demichelis and P. Chimento, "IP Packet Delay Variation Metric for IPPM," Internet-Draft, Work in Progress, February 2001.
9 V. Raisanen and G. Grotefeld, "Network performance measurement for periodic streams," Internet-Draft, Work in Progress, January 2001.
10 ITU-T Recommendation I.380/Y.1540, "Internet Protocol Data Communication Service -- IP Packet Transfer and Availability Performance Parameters," February 1999.
11 ITU-T Draft Recommendation Y.1541, "Network Performance Objectives for IP-Based Services," October 2001.
12 E. Rosen, A. Viswanathan, and R. Callon, "Multiprotocol Label Switching Architecture," RFC 3031, January 2001.
13 S. Bradner (Editor), "Benchmarking Terminology for Network Interconnection Devices," RFC 1242, July 1991.
14 G. Ash, "Traffic Engineering & QoS Methods for IP-, ATM-, & TDM-Based Multiservice Networks," Internet-Draft, Work in Progress, October 2001.
15 K. Kompella, Y. Rekhter, and L. Berger, "Link Bundling in MPLS Traffic Engineering," Internet-Draft, Work in Progress, February 2001.
16 W.S. Lai, "Traffic Measurement for Dimensioning and Control of IP Networks," Internet Performance and Control of Network Systems II Conference, SPIE Proceedings, Vol. 4523, Denver, Colorado, 21-22 August 2001, pp. 359-367.
17 C. Srinivasan, A. Viswanathan, and T.D. Nadeau, "MPLS Label Switch Router Management Information Base Using SMIv2," Internet-Draft, Work in Progress, January 2001.
18 C. Srinivasan, A. Viswanathan, and T.D. Nadeau, "Multiprotocol Label Switching (MPLS) Traffic Engineering Management Information Base," Internet-Draft, Work in Progress, August 2001.
19 K. Kompella, "A Traffic Engineering MIB," Internet-Draft, Work in Progress, October 2001.
20 A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, "Deriving Traffic Demands for Operational IP Networks: Methodology and Experience," Proc. ACM SIGCOMM 2000, Stockholm, Sweden.
21 A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford, "NetScope: Traffic Engineering for IP Networks," IEEE Network, March/April 2000.
22 N. Harrison, P. Willis, S. Davari, E. Cuevas, B. Mack-Crane, E. Franze, H. Ohta, T. So, S. Goldfless, and F. Chen, "Requirements for OAM in MPLS Networks," Internet-Draft, Work in Progress, May 2001.
23 ITU-T Draft Recommendation Y.1710, "Requirements for OAM Functionality for MPLS Networks," May 2001.
24 ITU-T Draft Recommendation Y.1711, "OAM Mechanisms for MPLS Networks," May 2001.
25 R.G. Cole, R. Dietz, C. Kalbfleisch, and D. Romascanu, "A Framework for Synthetic Sources for Performance Monitoring," Internet-Draft, Work in Progress, May 2001.
26 R. Bonica, K. Kompella, and D. Meyer, "Tracing Requirements for Generic Tunnels," Internet-Draft, Work in Progress, February 2001.

15. Acknowledgments

The support of Gerald Ash on this work and his comments are much appreciated. Thanks also go to Robert Cole, Enrique Cuevas, Alfred Morton, Moshe Segal, and the Tequila project for their inputs.

16. Author's Addresses

Wai Sum Lai
AT&T Labs
Room D5-3D18
200 Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1 732-420-3712
Email: wlai@att.com

Blaine Christian
UUNET
Room D1-2-737
22001 Loudoun County Parkway
Ashburn, VA 20147, USA
Phone: +1 703-206-5600
Email: Blaine@uu.net

Richard W. Tibbs
Oak City Networks & Solutions
P.O. Box 10292
Raleigh, NC 27605, USA
Phone: +1 919-510-9551
Email: drtibbs@oakcitysolutions.com

Steven Van den Berghe
Ghent University/IMEC
St. Pietersnieuwsstraat 41
B-9000 Ghent, Belgium
Phone: ++32 9 267 35 86
Email: steven.vandenberghe@intec.rug.ac.be

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.